MATLAB: CUDAKernel/feval error, cannot find ‘single’ input.

cudakernel, feval, gpuarray

I am using the set of code found here: http://www.cs.nyu.edu/~deigen/rain/
Here is the function that I am stuck on:
function patches = im2col_gpu(X, psize)
global config data;
data.gpu.float_t = 'double';
ker = get_kernel('im2col', 'im2col.cu');
im_ni = size(X, 1);
im_nj = size(X, 2);
nimgs = numel(X) / (im_ni * im_nj);
p_ni = psize(1);
p_nj = psize(2);
npatches = nimgs * (im_ni - p_ni + 1) * (im_nj - p_nj + 1);
ker = set_grid(ker, npatches);
patches = gpuArray.zeros(p_ni * p_nj, npatches, data.gpu.float_t);
patches = feval(ker.ker, X, patches, ...
im_ni, im_nj, nimgs, ...
p_ni, p_nj, npatches);
The final statement, the feval call, is the one that raises the error.
The code is designed to take an input image and return the image after inpainting the dirt that the network finds.
When I try to run the dirt_example file, an error is raised inside im2col_gpu.m:
Error using parallel.gpu.CUDAKernel/feval
The type of the array supplied for argument 1 does not match the kernel prototype. The kernel prototype specifies
that the input must be an array of 'double' data but the input supplied was an array of type 'single'.
Error in im2col_gpu (line 19)
patches = feval(ker.ker, X, patches, ...
Error in denoise_image (line 48)
bpatches(:,c,:) = reshape(im2col_g(bx(:, :, c), [pi, pj]), ...
Error in dirt_example (line 13)
restored = denoise_image(original, net);
So essentially, feval is raising an error because the kernel expects double data and is receiving single. However, when I debug this function, I find that the input variables are:
ker.ker = kernel struct
X = <1783x16 gpuArray>
patches = <256x1768 gpuArray>
The remaining input variables (im_ni, im_nj, nimgs, p_ni, p_nj, and npatches) are all doubles. There is no input variable that is a single.
This is the first time that I have worked with CUDA, so I hope I am overlooking something very simple. Please let me know if there is any other relevant information that I can provide. Thanks for your time.

Best Answer

I presume that if you ask
classUnderlying(patches)
it will probably tell you that it is single. After all, you have just declared it with the type data.gpu.float_t, which is probably single.
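The error message names argument 1, which is X in the feval call, so it may be worth checking X as well as patches. A minimal sketch, assuming you stop at a breakpoint on the feval line inside im2col_gpu so that X, patches, ker and data are in scope, and that ker.ker is the parallel.gpu.CUDAKernel object named in the error:

```matlab
% Run these at a breakpoint on the feval line in im2col_gpu.
% class() on a gpuArray only reports 'gpuArray'; classUnderlying
% reports the element type actually stored on the GPU.
disp(classUnderlying(X));        % element type of kernel argument 1
disp(classUnderlying(patches));  % element type of kernel argument 2
disp(data.gpu.float_t);          % type the MATLAB code is configured for

% The types the compiled kernel prototype expects, one entry per argument.
disp(ker.ker.ArgumentTypes);
```

Any array whose underlying class differs from the matching ArgumentTypes entry would produce the error shown in the question.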
Maybe you should change your kernel to take single-precision data? Single precision is typically faster on a GPU and is much more common for image processing.
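If changing the kernel is not convenient, another option might be to cast the input to the type the kernel was built for, just before the feval call. This is only a sketch: it reuses the variable names from the question and assumes data.gpu.float_t matches the compiled kernel prototype (double, according to the error message). It gives up the speed advantage of single precision.

```matlab
% Place just before the feval call in im2col_gpu.
% Cast X on the GPU only if its element type differs from the configured one.
if ~strcmp(classUnderlying(X), data.gpu.float_t)
    X = cast(X, data.gpu.float_t);
end
```

With X, patches and the kernel prototype all using the same type, the type check in feval should no longer fail for these two arguments.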