Given the miner code, I plan to parallelise the part where the miner searches for the number. I currently do research in Parallel and Distributed Computing, so I'm comfortable with both OpenCL and CUDA and can implement it in whichever you prefer.
The challenge is to write the kernel code so that it is optimised in terms of memory.
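
To give a rough idea of the approach, here is a minimal CPU-side sketch in C++ of how I would split the search across workers. It is an illustration under assumptions, not the final kernel: `toy_hash` (64-bit FNV-1a) stands in for whatever hash your miner actually uses, and `meets_target`, `search_nonce` and the "hash below target" rule are hypothetical names and rules that I made up for the example. Each worker walks the candidate numbers with a stride equal to the worker count, which is the same indexing a GPU kernel would use per thread. All working state stays in local variables, and the only shared memory is a single atomic holding the best result so far.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// ASSUMPTION: stand-in hash (64-bit FNV-1a) over the block data followed by
// the 4 nonce bytes. The real miner's hash function would replace this.
std::uint64_t toy_hash(const std::string& data, std::uint32_t nonce) {
    std::uint64_t h = 14695981039346656037ULL;          // FNV offset basis
    for (unsigned char c : data) { h ^= c; h *= 1099511628211ULL; }
    for (int i = 0; i < 4; ++i) {
        h ^= (nonce >> (8 * i)) & 0xFFu;                // little-endian nonce bytes
        h *= 1099511628211ULL;                          // FNV prime
    }
    return h;
}

// ASSUMPTION: hypothetical difficulty rule, accept when hash < target.
bool meets_target(std::uint64_t h, std::uint64_t target) { return h < target; }

// Searches nonces in [0, limit) with num_threads workers.
// Returns the smallest accepted nonce, or limit if none is accepted.
std::uint32_t search_nonce(const std::string& data, std::uint64_t target,
                           std::uint32_t limit, unsigned num_threads) {
    assert(num_threads > 0);
    std::atomic<std::uint32_t> best{limit};             // only shared state
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Worker t checks t, t + N, t + 2N, ... (64-bit counter avoids wrap-around).
            for (std::uint64_t n = t; n < limit; n += num_threads) {
                // Stop once another worker has already found a smaller nonce.
                if (n >= best.load(std::memory_order_relaxed)) break;
                auto nonce = static_cast<std::uint32_t>(n);
                if (meets_target(toy_hash(data, nonce), target)) {
                    // Atomic "min": keep the smallest accepted nonce.
                    std::uint32_t cur = best.load();
                    while (nonce < cur && !best.compare_exchange_weak(cur, nonce)) {}
                    break;
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return best.load();
}
```

Calling `search_nonce("block", 1ULL << 52, 1u << 20, 4)` would search about a million candidates with four workers. In an OpenCL or CUDA version, the loop body would become the kernel, the worker index would come from the global thread ID, and the block data would sit in constant or shared memory so that each thread reads it cheaply.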