I have trained networks (trainNetwork()) on my GPU with MATLAB R2018b for over a year without any issues.
Since upgrading to MATLAB R2020b, I've only been able to train small networks. The same script that ran flawlessly in R2018b with an arbitrarily large number of units (e.g., n = 2000) now works in R2020b only up to about n = 50 and crashes for n > 100.
The reported error is typically:
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_FAILED

Error using trainNetwork (line 183)
Unexpected error calling cuDNN: CUDNN_STATUS_EXECUTION_FAILED.

Error in RNNprediction (line 170)
net = trainNetwork({traind.x}, {traind.y}, layers, options);
The crash happens between the 2nd and 5th training iteration. When it does, I have to restart MATLAB before I can do any training at all, since reset(gpuDevice) also fails and returns:
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_FAILED

Error using parallel.gpu.CUDADevice/reset
An unexpected error occurred during CUDA execution. The CUDA error was:
all CUDA-capable devices are busy or unavailable
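For completeness, this is roughly the recovery sequence I try after the crash before giving up and restarting MATLAB (the device index 1 matches the gpuDevice output below):

```matlab
% Attempted recovery after the CUDA launch failure (sketch)
g = gpuDevice(1);   % re-select the GPU (Index: 1)
reset(g);           % this is the call that fails with
                    % "all CUDA-capable devices are busy or unavailable"
```

Only a full MATLAB restart brings the device back.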
Training of the same network runs smoothly on CPU (although very slowly).
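The only difference between the failing run and the working one is the execution environment in trainingOptions; everything else (layers, data, solver) is identical. A sketch, with the solver name and other option values illustrative rather than my exact settings:

```matlab
% GPU run: crashes after a few iterations for n > 100
options = trainingOptions('adam', ...
    'ExecutionEnvironment', 'gpu', ...
    'MaxEpochs', 100);

% CPU run: completes without errors, but very slowly
options = trainingOptions('adam', ...
    'ExecutionEnvironment', 'cpu', ...
    'MaxEpochs', 100);

net = trainNetwork({traind.x}, {traind.y}, layers, options);
```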
NOTE: I have already increased the WDDM TDR delay to 60 seconds, but nothing changed. I have also tried disabling TDR altogether, with no success.
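For reference, these are the TDR registry values I set (under the standard GraphicsDrivers key; a sketch of my .reg edit, applied with a reboot afterwards):

```reg
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
; Increase the timeout-detection delay to 60 seconds
"TdrDelay"=dword:0000003c
; Alternatively, disable TDR entirely (also tried, no effect)
"TdrLevel"=dword:00000000
```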
Here are some CUDA properties:
>> gpuDevice

ans =

  CUDADevice with properties:

                      Name: 'GeForce RTX 2070'
                     Index: 1
         ComputeCapability: '7.5'
            SupportsDouble: 1
             DriverVersion: 10.2000
            ToolkitVersion: 10.2000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 8.5899e+09
       MultiprocessorCount: 36
              ClockRateKHz: 1620000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1