MATLAB: GPU CUDA kernel malloc error

compute capability 2.1, cuda, gpu, Parallel Computing Toolbox

Hello, I have a GeForce 425M card with compute capability 2.1. I wrote a kernel that calls malloc inside the kernel. At first the PTX file did not compile. Then I set the nvcc architecture option to sm_21 (nvcc -I "D:\…VC\include" -arch=sm_21 -use_fast_math -ptx SR2.cu), and with that it compiled successfully. I was just wondering why I need to specify that. After that I tried to create the kernel in MATLAB:
ckernel=parallel.gpu.CUDAKernel('SR2.ptx', 'SR2.cu');
But I get the error:
??? Error using ==> parallel.gpu.CUDAKernel
An error occurred during PTX compilation of <image>.
The information log was:
: Considering profile 'compute_20' for gpu='sm_21' in
'cuModuleLoadDataEx_2a9
The error log was:
The CUDA error code was: CUDA_ERROR_INVALID_IMAGE.
Before I modified the kernel to use malloc (and without passing -arch=sm_21 to nvcc), I was able to run it from MATLAB without any problem.
I think there is some configuration problem with CUDA. I hope someone has an idea how to solve this.
Thanks,
Gaszton
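On why the -arch option is needed: malloc and free in device code are only supported on devices of compute capability 2.0 and higher, and the nvcc of the CUDA 3.x era defaulted to a compute capability 1.0 target when no -arch option was given. That is most likely why the kernel only compiled once a 2.x target (-arch=sm_21 here, or -arch compute_20 as in the answer) was specified. The sketch below is a hedged illustration, assuming a current CUDA toolkit rather than the 3.x versions discussed in this thread: it rejects pre-2.0 device targets at compile time via __CUDA_ARCH__ and checks the device's compute capability on the host. The helper names supportsDeviceMalloc and deviceZeroSupportsMalloc are hypothetical.

```cuda
#include <cuda_runtime.h>

// Device-side malloc/free need compute capability 2.0 or higher; stop the
// build early if this file is compiled for an older virtual architecture.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
#error "in-kernel malloc needs -arch compute_20 (sm_20/sm_21) or newer"
#endif

// Variant of the answer's kernel that writes a defined value before reading.
__global__ void fcn(double *out) {
    int *x = (int *) malloc(1024);          // 1024 bytes from the device heap
    if (x == nullptr) { out[0] = -1.0; return; }  // heap exhausted
    x[0] = 42;
    out[0] = x[0];
    free(x);
}

// Hypothetical helper: true if a device with this compute capability
// supports malloc/free in device code.
bool supportsDeviceMalloc(int major, int minor) {
    (void) minor;  // every 2.x or newer device qualifies
    return major >= 2;
}

// Hypothetical helper: query device 0 on the host before launching fcn.
bool deviceZeroSupportsMalloc() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    return supportsDeviceMalloc(prop.major, prop.minor);
}
```

On the GeForce 425M from the question, cudaGetDeviceProperties reports major 2 and minor 1, so deviceZeroSupportsMalloc() would return true there.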

Best Answer

You can get that error message if there is a mismatch between the CUDA runtime used by Parallel Computing Toolbox and the version of nvcc you're using. If you're using R2010b, you need to use CUDA 3.1; for R2011a, you can use CUDA 3.2. I was able to compile and use the following trivial kernel:
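// simple.cu
// Allocates memory from the device heap inside the kernel (needs compute capability 2.x)
__global__ void fcn( double * out ) {
    int * x = (int *) malloc( 1024 );   // 1024 bytes from the device heap
    if ( x == NULL ) return;            // malloc returns NULL if the heap is exhausted
    out[0] = x[0];                      // x[0] is never written, so this value is arbitrary
    free( x );
}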
By compiling like so:
$ /usr/local/cuda32/cuda/bin/nvcc -arch compute_20 -ptx simple.cu
and then using it within MATLAB R2011a like so:
>> k = parallel.gpu.CUDAKernel( 'simple.ptx' );
>> gather(k.feval(0))
ans =
1.768515945000000e+09
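The version rule above can also be checked from a small host program. Running nvcc --version prints the toolkit version directly; programmatically, cudaRuntimeGetVersion reports the runtime version encoded as 1000 × major + 10 × minor (so 3020 is CUDA 3.2). The following is a minimal sketch, assuming it is built with the same nvcc you use for -ptx so that the linked runtime normally matches that toolkit. The release table contains only the two pairs stated in the answer, and all function names are hypothetical.

```cuda
#include <cstring>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000*major + 10*minor (e.g. 3020 == CUDA 3.2).
int cudaMajor(int v) { return v / 1000; }
int cudaMinor(int v) { return (v % 1000) / 10; }

// Hypothetical lookup built only from the answer: the toolkit nvcc should
// come from for a given MATLAB release; 0 means "not covered here".
int toolkitForRelease(const char *release) {
    if (std::strcmp(release, "R2010b") == 0) return 3010;  // CUDA 3.1
    if (std::strcmp(release, "R2011a") == 0) return 3020;  // CUDA 3.2
    return 0;
}

// True if an nvcc toolkit version matches the one the release expects.
bool nvccMatchesRelease(int nvccVersion, const char *release) {
    int wanted = toolkitForRelease(release);
    return wanted != 0
        && cudaMajor(nvccVersion) == cudaMajor(wanted)
        && cudaMinor(nvccVersion) == cudaMinor(wanted);
}

// Reads the version of the CUDA runtime this program is linked against.
bool linkedRuntimeVersion(int *version) {
    return cudaRuntimeGetVersion(version) == cudaSuccess;
}
```

For example, a toolkit reporting 3020 passes nvccMatchesRelease(3020, "R2011a") but fails the check for "R2010b", which is the kind of mismatch that can produce CUDA_ERROR_INVALID_IMAGE.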