MATLAB: GpuArray sparse memory usage

gpuarray, Parallel Computing Toolbox

I have a GPU with about 2 GB of available memory:
CUDADevice with properties:
Name: 'Quadro K1100M'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 6.5000
ToolkitVersion: 6.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
AvailableMemory: 2.0154e+09
MultiprocessorCount: 2
ClockRateKHz: 705500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
However, I'd like to load a sparse array onto it (R2015a, which supports sparse gpuArray):
whos('pxe')
Name Size Bytes Class Attributes
pxe 5282400x5282400 1182580904 double sparse, complex
However, I get an error when I try to copy it to the GPU:
gpxe = gpuArray(pxe);
Error using gpuArray
An unexpected error occurred on the device. The error code was: UNKNOWN_ERROR.
I'm not sure what the problem is here. Smaller sparse arrays copy over fine, and this one should still be well within the memory limits: going by the size reported by whos, it would take up less than 60% of the available GPU memory. Is there some kind of hidden maximum size, or are we not allowed to actually use most of the GPU memory?
Edit: I tried smaller arrays, loading several of them into GPU memory one after another:
Trial>> gpu = gpuDevice;
Trial>> mem1 = gpu.FreeMemory;
Trial>> gpxe = gpuArray(pxet.');
Trial>> mem2 = gpu.FreeMemory;
Trial>> gpye = gpuArray(pyet.');
Trial>> mem3 = gpu.FreeMemory;
Trial>> gpxi = gpuArray(pxit.');
Trial>> mem4 = gpu.FreeMemory;
Trial>> gpyi = gpuArray(pyit.');
Trial>> mem5 = gpu.FreeMemory;
The sizes of these arrays as reported on the CPU are:
whos('pxet','pyet','pxit','pyit')
Name Size Bytes Class Attributes
pxet 211600x211600 47266024 double sparse, complex
pxit 211600x211600 47266024 double sparse, complex
pyet 211600x211600 47266024 double sparse, complex
pyit 211600x211600 47266024 double sparse, complex
GPU memory consumed by each successive transfer:
Trial>> mem1-mem2
ans =
147456000
Trial>> mem2-mem3
ans =
39059456
Trial>> mem3-mem4
ans =
39059456
Trial>> mem4-mem5
ans =
39059456
So the very first transfer preallocates a huge chunk of memory, and subsequent ones take up less space than whos says they should? It seems to me that I need enough GPU memory to fit an initial preallocation that's about 3 times as big as it needs to be.

Best Answer

The first time you use any of the GPU support within MATLAB, a series of libraries is loaded, and these consume memory on the GPU. That is why your first transfer appears to cost 147456000 bytes while each later one costs 39059456. Sparse gpuArray also uses a different representation from the CPU (CSR layout, with 4-byte integers for the indices), which explains why a given sparse matrix consumes a different number of bytes on the GPU than on the CPU. Converting between the two formats requires additional temporary storage on the GPU, which almost certainly explains why you cannot create the large sparse matrix there.
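As a rough check of the layout explanation, the arithmetic can be sketched from the numbers in the question alone, with no GPU required. The byte counts below are assumptions rather than documented toolbox behaviour: 16 bytes per complex double value, 8-byte indices and column pointers on the CPU, and 4-byte indices and row pointers for CSR on the GPU. The variable names are made up for illustration, and whos reports allocated storage, so the nonzero count is only an estimate.

```matlab
% Estimate the GPU footprint of the sparse matrices from the whos output.
% Assumed layouts (not taken from the toolbox documentation):
%   CPU: nnz*(16-byte value + 8-byte row index) + (n+1) 8-byte column pointers
%   GPU: nnz*(16-byte value + 4-byte column index) + (n+1) 4-byte row pointers
n        = [211600, 5282400];        % matrix dimensions: pxet, pxe
cpuBytes = [47266024, 1182580904];   % Bytes column reported by whos

valBytes = 16;                       % complex double: real + imaginary part

% Invert the assumed CPU formula to estimate the number of stored nonzeros.
nnzEst = (cpuBytes - 8*(n + 1)) ./ (valBytes + 8);

% Apply the assumed GPU (CSR, 4-byte index) formula.
gpuBytes = nnzEst .* (valBytes + 4) + 4*(n + 1);

for k = 1:numel(n)
    fprintf('n = %d: about %d nonzeros, about %d bytes on the GPU\n', ...
            n(k), nnzEst(k), gpuBytes(k));
end
```

Under these assumptions the small matrices come out at 38824084 bytes each, within about 1% of the 39059456 bytes measured per transfer in the question, and the large matrix comes out at roughly 0.97 GB. That figure covers only the final stored matrix. It does not include the libraries loaded on first use or the temporary storage needed for the format conversion, so the peak requirement during gpuArray(pxe) is higher.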