MATLAB: Neural networks – CUDAKernel/setConstantMemory – the data supplied is too big for constant 'hintsD'

Tags: Deep Learning Toolbox, gpu, MATLAB, memory, neural network, Parallel Computing Toolbox

On R2015a with the Parallel Computing Toolbox and Neural Network Toolbox, using the following code with an Nvidia GeForce GTX 980 Ti GPU:
net1 = feedforwardnet(20);
net1.trainFcn = 'trainscg';
x = inputs(1:4284,2:2000)'; % if I reduce this to 2:1900, it will work
t = double(targets'); % casting to double for GPU
t = t(:,1:4284);
% preparing for GPU
xg = nndata2gpu(x);
tg = nndata2gpu(t);
net1.input.processFcns = {'mapminmax'};
net1.output.processFcns = {'mapminmax'};
net2 = configure(net1,x,t); % Configure with MATLAB arrays
net2 = train(net2,xg,tg);
As you can see, this is not a big dataset. When I run this, it generates this error:
Error using parallel.gpu.CUDAKernel/setConstantMemory
The data supplied is too big for constant 'hintsD'.

Error in nnGPU.codeHints (line 33)
setConstantMemory(hints.yKernel,'hintsD',hints.double);

Error in nncalc.setup2 (line 13)
calcHints = calcMode.codeHints(calcHints);

Error in nncalc.setup (line 17)
[calcLib,calcNet] = nncalc.setup2(calcMode,calcNet,calcData,calcHints);

Error in network/train (line 357)
[calcLib,calcNet,net,resourceText] = nncalc.setup(calcMode,net,data);
gpuDevice shows:
Name: 'GeForce GTX 980 Ti'
Index: 1
ComputeCapability: '5.2'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 6.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 6.4425e+09
AvailableMemory: 5.1520e+09
MultiprocessorCount: 22
ClockRateKHz: 1139500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
As noted in the code above, if I reduce x marginally, it runs.
I don't understand why data of this size would generate a memory error.
Am I missing a step in preparing this for the GPU?
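For scale, the raw training data is tiny compared with the card's free memory (sizes follow the code above):
xBytes = 1999 * 4284 * 8   % ~68.5 MB of doubles for x
xBytes / 5.1520e9          % ~1.3% of the AvailableMemory gpuDevice reports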

Best Answer

I was able to reproduce your issue. The best solution is to do the GPU training a different way, using the 'useGPU' flag of train. That path does not push the data hints into GPU constant memory in the same way, so it sidesteps the issue. Your example code would look like this:
net1 = feedforwardnet(20);
net1.trainFcn = 'trainscg';
x = inputs(1:4284,2:2000)';
t = double(targets'); % casting to double for GPU
t = t(:,1:4284);
net1.input.processFcns = {'mapminmax'};
net1.output.processFcns = {'mapminmax'};
net1 = train(net1,x,t,'useGPU','yes');
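For context: every CUDA device reserves only 64 KB of constant memory, regardless of how much global memory the card has, and gpuDevice does not report that figure. The nnGPU calculation mode evidently packs precomputed per-feature hints ('hintsD') into that space, so the limit is hit once the input dimension grows past roughly 2,000 features, no matter how much of the 5 GB of global memory is free. A back-of-envelope sketch (the per-feature hint layout is my assumption, not the toolbox's documented behavior):
constMemBytes = 64 * 1024;      % CUDA constant memory limit on all devices
maxDoubles = constMemBytes / 8  % only 8192 doubles fit
maxDoubles / 1999               % ~4.1 hint doubles per feature for 2:2000 (fails)
maxDoubles / 1899               % ~4.3 hint doubles per feature for 2:1900 (works)
If you want to confirm that training really ran on the GPU, train also accepts a 'showResources' option:
net1 = train(net1,x,t,'useGPU','yes','showResources','yes');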