I have a GPU with about 2 GB of available memory:
```matlab
CUDADevice with properties:

                      Name: 'Quadro K1100M'
                     Index: 1
         ComputeCapability: '3.0'
            SupportsDouble: 1
             DriverVersion: 6.5000
            ToolkitVersion: 6.5000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 2.1475e+09
           AvailableMemory: 2.0154e+09
       MultiprocessorCount: 2
              ClockRateKHz: 705500
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1
```
However, I'd like to load a sparse array into it (R2015a, which supports sparse gpuArray):
```matlab
>> whos('pxe')
  Name         Size                    Bytes  Class     Attributes

  pxe      5282400x5282400        1182580904  double    sparse, complex
```
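As a sanity check on that `Bytes` figure: assuming MATLAB stores sparse matrices in compressed sparse column (CSC) form with 8-byte row indices, `8*(ncols+1)` bytes of column pointers, and 16 bytes per complex double value (the layout is my assumption, not something stated in the post), the reported byte count implies an integer nonzero count, and the array should indeed fit in well under 60% of the available device memory:

```python
# Sanity check on the whos() numbers. Assumption (mine): MATLAB stores sparse
# matrices in CSC form, with 8-byte row indices, 8*(ncols+1) bytes of column
# pointers, and 16 bytes per complex double value.
n = 5282400                # pxe is n-by-n
reported_bytes = 1182580904
available = 2.0154e9       # AvailableMemory from gpuDevice

col_ptrs = 8 * (n + 1)             # column pointer array
bytes_per_nnz = 16 + 8             # complex value + row index
nnz = (reported_bytes - col_ptrs) // bytes_per_nnz
print(nnz)                                               # -> 47513404 nonzeros
print(col_ptrs + bytes_per_nnz * nnz == reported_bytes)  # -> True, layout fits exactly
print(reported_bytes / available)                        # -> ~0.59, under 60% as stated
```

The byte count decomposing exactly under this layout suggests the host-side size really is what `whos` reports, so the failure is not a host bookkeeping issue.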
I get an error upon trying to copy it to the GPU, though:
```matlab
>> gpxe = gpuArray(pxe);
Error using gpuArray
An unexpected error occurred on the device. The error code was: UNKNOWN_ERROR.
```
I'm not sure what the problem is here. Trying it with smaller sparse arrays works, but I'm still well within the memory limits. Is there some kind of hidden maximum size, or are we simply not allowed to use most of the GPU memory? This array would theoretically take up less than 60% of GPU memory.
Edit: trying smaller arrays and loading several of them into GPU memory:
```matlab
Trial>> gpu = gpuDevice;
Trial>> mem1 = gpu.FreeMemory;
Trial>> gpxe = gpuArray(pxet.');
Trial>> mem2 = gpu.FreeMemory;
Trial>> gpye = gpuArray(pyet.');
Trial>> mem3 = gpu.FreeMemory;
Trial>> gpxi = gpuArray(pxit.');
Trial>> mem4 = gpu.FreeMemory;
Trial>> gpyi = gpuArray(pyit.');
Trial>> mem5 = gpu.FreeMemory;
```
Sizes of these arrays are theoretically:
```matlab
>> whos('pxet','pyet','pxit','pyit')
  Name        Size                 Bytes  Class     Attributes

  pxet      211600x211600       47266024  double    sparse, complex
  pxit      211600x211600       47266024  double    sparse, complex
  pyet      211600x211600       47266024  double    sparse, complex
  pyit      211600x211600       47266024  double    sparse, complex
```
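Applying the same CSC accounting (again, the layout is my assumption) to these smaller matrices, the reported byte count again decomposes exactly, giving the nonzero count per matrix:

```python
# Same CSC accounting applied to the smaller test matrices: each is
# 211600-by-211600 complex double sparse with a reported 47266024 bytes.
n_small = 211600
reported_small = 47266024

col_ptrs = 8 * (n_small + 1)                     # column pointer array
nnz_small = (reported_small - col_ptrs) // 24    # 16-byte complex value + 8-byte row index
print(nnz_small)                                 # -> 1898884 nonzeros per matrix
print(col_ptrs + 24 * nnz_small == reported_small)  # -> True
```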
Sequential memory footprints on the GPU:
```matlab
Trial>> mem1-mem2
ans =
   147456000
Trial>> mem2-mem3
ans =
    39059456
Trial>> mem3-mem4
ans =
    39059456
Trial>> mem4-mem5
ans =
    39059456
```
So the very first transfer grabs a huge chunk of memory, and the subsequent ones take up less space than expected? It seems like I need enough free GPU memory to cover that initial allocation, which is about three times as large as the array itself.
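One way to read those deltas, under a hedged assumption of mine (that the device holds the matrix in CSR form with 32-bit integer indices; the post does not establish this): the repeated ~39 MB delta is close to what one CSR copy would need, which would make the extra cost of the first transfer a fixed one-time allocation of roughly 103 MB rather than every array costing three times its size:

```python
# Rough model of the observed FreeMemory deltas. Assumption (mine): on the
# device the sparse matrix is held in CSR form with 32-bit integer indices,
# i.e. 16 bytes per complex value + 4 bytes per column index + 4*(nrows+1)
# bytes of row pointers.
n = 211600
nnz = 1898884                      # nonzeros per small matrix (implied by the whos bytes)

csr_estimate = 16 * nnz + 4 * nnz + 4 * (n + 1)
print(csr_estimate)                # -> 38824084, close to the measured 39059456

first_delta = 147456000            # mem1 - mem2
steady_delta = 39059456            # mem2 - mem3, mem3 - mem4, mem4 - mem5
print(first_delta - steady_delta)  # -> 108396544 bytes, ~103 MB of one-time overhead
```

The small gap between the estimate and the measured 39059456 bytes would then be allocator padding or alignment, though that part is speculation on my end.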
Best Answer