You can get that error message if the CUDA runtime used by Parallel Computing Toolbox does not match the version of nvcc that you compile with. If you're using R2010b, you need to use CUDA 3.1; for R2011a, you can use CUDA 3.2. I was able to compile and use the following trivial kernel:
// simple.cu
__global__ void fcn( double * out ) {
    int * x = (int *) malloc( 1024 );
    // x[0] is never initialized, so the value copied out is arbitrary.
    out[0] = x[0];
    free( x );
}
By compiling like so (in-kernel malloc needs compute capability 2.0, hence -arch compute_20):
$ /usr/local/cuda32/cuda/bin/nvcc -arch compute_20 -ptx simple.cu
and then using within MATLAB R2011a like so:
>> k = parallel.gpu.CUDAKernel( 'simple.ptx' );
>> gather(k.feval(0))
ans =
1.768515945000000e+09
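A minimal sketch of a variant that checks the allocation and writes a known value before reading it, so the result no longer depends on uninitialized memory. The file name simple_checked.cu and the values 42 and -1 are illustrative assumptions, not part of the original test:

```cuda
// simple_checked.cu (hypothetical variant of simple.cu)
__global__ void fcn( double * out ) {
    // Device-side malloc returns a null pointer when the device heap is exhausted.
    int * x = (int *) malloc( 1024 );
    if ( x == 0 ) {
        out[0] = -1.0;   // illustrative sentinel: allocation failed
        return;
    }
    x[0] = 42;           // illustrative known value, written before it is read
    out[0] = x[0];
    free( x );
}
```

Compiled and loaded the same way as simple.cu, the result then depends only on whether the allocation succeeded.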