Hello, I wrote a simple CUDA kernel to measure the global memory transfer speed to the CUDA processors:
__global__ void SR2add(float* dataout, const float* datain, int size)
{
    int mindex = blockIdx.x * blockDim.x + threadIdx.x;
    if (mindex >= size) return;
    dataout[mindex] = datain[mindex];
}
The MATLAB function I wrote for it:
function GPU_MemBandTest()
    import parallel.gpu.GPUArray
    xsize = 1024;
    ysize = 768;
    vectorsize = xsize*ysize;
    threadpblock = 1024;
    k = parallel.gpu.CUDAKernel('MemBandTest.ptx', 'MemBandTest.cu');
    k.ThreadBlockSize = [threadpblock,1,1];
    k.GridSize = [ceil(vectorsize/threadpblock),1];
    ddatain = parallel.gpu.GPUArray.zeros(vectorsize,1,'single');
    dataout = rand(vectorsize,1,'single');
    ddataout = GPUArray(dataout);
    tic
    for i = 1:1000
        [ddataout] = feval(k,ddataout,ddatain,vectorsize);
    end
    time = toc;
    disp(['ms time= ' num2str(time)])
    disp([num2str(vectorsize*4/(time*10^6)) 'GB/s'])
end
The output was ms time= 0.73629 and 4.2724GB/s. I would like to ask:
1; Am I doing the measurement correctly?
2; Is there anything I can do to speed up this simple code, or is this an expected result for this kernel in MATLAB?
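For reference, the reported figure can be reproduced by hand. This is only a sketch of the arithmetic in the script above, assuming the 0.73629 returned by toc is the total in seconds for all 1000 launches (so it is numerically the time per launch in milliseconds) and that each single-precision element is 4 bytes:

```latex
\begin{align*}
\text{bytes per launch} &= 1024 \times 768 \times 4 = 3\,145\,728\ \text{B} \\
\text{time per launch}  &= \frac{0.73629\ \text{s}}{1000} = 0.73629\ \text{ms} \\
\text{bandwidth}        &= \frac{3\,145\,728\ \text{B}}{0.73629 \times 10^{-3}\ \text{s}}
                          \approx 4.2724 \times 10^{9}\ \text{B/s} = 4.2724\ \text{GB/s}
\end{align*}
```

Note that this formula counts each element once. The kernel reads every element from datain and writes it to dataout, so counting both directions would give twice this value.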
I have MATLAB R2011a, CUDA Toolkit 3.2, and a GT 425M device with the newest driver installed for it.
If I use float* datain instead of const float* datain, the execution time goes up to 2.4 ms.
3; What could be the explanation for this?
Thanks to anyone who helps,
Gaszton
Best Answer