MATLAB: CUDA_ERROR_UNKNOWN when using floats instead of double precision

cuda, Parallel Computing Toolbox

Hello,
I have two otherwise identical .cu files: one declares its variables as float, the other as double.
The double-precision version works perfectly when the kernel is called, whereas the float version does not. I get an error when I gather() the output variables:
Error using gpuArray/gather
An unexpected error occurred during CUDA execution. The CUDA error was: CUDA_ERROR_UNKNOWN
From the documentation, it appears that feval() automatically casts my input arrays to the correct type. I have also tried converting each input/output array to single precision myself using single(), but I get a similar error.
Here are the two kernel signatures:
__global__ void SegForceNBodyCUDA(double const *SoA,
double const a, double const MU, double const NU,
int const S,
double *f0x, double *f0y, double *f0z,
double *f1x, double *f1y, double *f1z);
or
__global__ void SegForceNBodyCUDA(float const *SoA,
float const a, float const MU, float const NU,
int const S,
float *f0x, float *f0y, float *f0z,
float *f1x, float *f1y, float *f1z);
Both .cu files compile correctly without errors/warnings.
Please advise.
Thank you,
Francesco

Best Answer

Thanks for sending the code.
I’ve done some initial investigation and it looks like you have an illegal memory access somewhere. Here is what cuda-memcheck reports:
Running CUDA Single Precision, Optimised...
warning: Cuda API error detected: cuModuleGetGlobal_v2 returned (0x1f4)
warning: Cuda API error detected: cuModuleGetGlobal_v2 returned (0x1f4)
[Launch of CUDA Kernel 102 (SegForceNBodyCUDA<<<(4,1,1),(256,1,1)>>>) on Device 0]
Memcheck detected an illegal access to address (@local)0xfff830
Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 102, grid 103, block (0,0,0), thread (5,0,0), device 0, sm 12, warp 2, lane 5]
0x0000000010052d98 in SegForceNBodyCUDA(float const*, float, float, float, int, float*, float*, float*, float*, float*, float*) ()
I couldn't see anything obviously wrong in the kernel, but it's quite a lot of code. The most likely culprit is a read or write past the end of an input/output array. However, the "@local" tag on the illegal address may indicate a problem with how data is handled inside the kernel itself (i.e. in thread-local memory). I don't think there is any problem with the way MATLAB calls the kernel.
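To illustrate the first suspect (indexing past the end of an array), here is a minimal standalone sketch. It is not taken from the code in question: the kernel scaleGuarded, the CUDA_CHECK macro and the value S = 1000 are hypothetical stand-ins. The launch shape (4 blocks of 256 threads) mirrors the one in the cuda-memcheck log, so 24 of the 1024 threads have no element to work on and must be stopped by the guard.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel: one thread per element, S elements in total.
__global__ void scaleGuarded(float const *in, float const a,
                             int const S, float *out)
{
    int const idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= S) {
        return;  // surplus threads must not touch memory past the end
    }
    out[idx] = a * in[idx];
}

int main()
{
    int const S = 1000;  // assumed size, deliberately not a multiple of 256
    int const threads = 256;
    int const blocks = (S + threads - 1) / threads;  // rounds up to 4 blocks
    size_t const bytes = S * sizeof(float);

    std::vector<float> hIn(S), hOut(S);
    for (int i = 0; i < S; ++i) {
        hIn[i] = static_cast<float>(i);
    }

    float *dIn = nullptr;
    float *dOut = nullptr;
    CUDA_CHECK(cudaMalloc(&dIn, bytes));
    CUDA_CHECK(cudaMalloc(&dOut, bytes));
    CUDA_CHECK(cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice));

    scaleGuarded<<<blocks, threads>>>(dIn, 2.0f, S, dOut);
    CUDA_CHECK(cudaGetLastError());       // reports launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while running

    CUDA_CHECK(cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost));
    std::printf("last element: %.1f\n", hOut[S - 1]);

    CUDA_CHECK(cudaFree(dIn));
    CUDA_CHECK(cudaFree(dOut));
    return 0;
}
```

Removing the idx >= S check and running the program under cuda-memcheck should flag out-of-bounds accesses from the surplus threads. A guard like this only covers indexing into the global input/output arrays; it would not explain an "@local" address, which points at arrays or spilled variables private to a thread.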
Best of luck debugging this. These types of problems can be tricky to isolate.
Ben