Hello all, I currently take advantage of a lot of Matlab's GPU-enabled functions (matrix operations, FFTs, etc.), which provide great speed advantages over their CPU counterparts. I thought GPU matrix operations were fast…until I discovered a CPU/GPU/MEX-CUDA comparison for running Conway's Game of Life. On my machine, the MEX-CUDA version was ~50X faster than the GPU version. Or, using the CPU as the baseline:
- CPU – 1X
- GPU – 7X
- MEX-CUDA – 350X
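For context, the plain-gpuArray Life step I was timing looks roughly like the following. This is my own minimal sketch using conv2 to count neighbors, not the demo's actual code, and the grid size, seeding density, and step count are arbitrary:

```matlab
% One Game-of-Life generation per loop iteration, entirely on the GPU.
N      = 1024;
board  = gpuArray(rand(N) > 0.75);                 % random initial population
kernel = gpuArray(single([1 1 1; 1 0 1; 1 1 1]));  % counts the 8 neighbors

for step = 1:100
    neighbors = conv2(single(board), kernel, 'same');
    % A cell is alive next generation with exactly 3 neighbors,
    % or with 2 neighbors if it is already alive.
    board = (neighbors == 3) | (board & neighbors == 2);
end
result = gather(board);                            % copy the final grid back to the host
```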
So, with those kinds of speed gains, I am finally feeling motivated to learn some CUDA. However, what is not clear to me is: when is there a significant speed advantage in writing one's own CUDA kernel? Two narrower questions:
- I choose the GPU over the CPU when I can frame an operation as a binary array operation (matrix additions and the like, NOT sorts). If I meet this criterion, is there a second criterion that tells me I should write my own CUDA code, rather than doing everything inside Matlab and leveraging the built-in GPU-enabled functions via, e.g., arrayfun? (See the first sketch after this list.)
- I assume that the GPU-enabled functions in Matlab, like fft, interp1 (with a linear interpolant), exp, etc., are already as accelerated as they get: I could not write a faster fft myself. Instead, it must be other problems that can be framed as binary array operations (like the stencil update from the Life example) that would require special treatment. Is this true? (See the timing sketch after this list.)
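To make the first question concrete, here is a minimal sketch of the "everything inside Matlab" approach I mean: arrayfun applied to gpuArrays fuses a purely elementwise function into a single GPU kernel launch. The function and array sizes below are made up for illustration:

```matlab
% Elementwise operation fused into one GPU kernel via arrayfun.
x = gpuArray.rand(1e7, 1);
y = gpuArray.rand(1e7, 1);

saturate = @(a, b) tanh(3*a + b.^2);   % purely elementwise, so it qualifies
z = arrayfun(saturate, x, y);          % one fused kernel instead of several

z = gather(z);                         % copy the result back to host memory
```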
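And for the second question, one way to check my assumption (rather than taking it on faith) is to time the built-ins directly with timeit/gputimeit; the transform size here is an arbitrary choice:

```matlab
% Compare CPU and GPU fft on the same data. gputimeit synchronizes the
% device so the GPU timing is fair.
n    = 2^22;
xCpu = rand(n, 1);
xGpu = gpuArray(xCpu);

tCpu = timeit(@() fft(xCpu));
tGpu = gputimeit(@() fft(xGpu));

fprintf('CPU: %.4f s, GPU: %.4f s, speedup: %.1fx\n', tCpu, tGpu, tCpu/tGpu);
```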
Finally, if someone has a nice starting place for CUDA with Matlab, I would appreciate a link; I know a bit of C, but the example files in the Life tutorial, such as pctdemo_life_mex_shmem.cu, are a little outside my current skill set.
Cheers, Dan
Best Answer