MATLAB: Why do I get CUDA execution errors when training the network on a GPU?


Why do I get the following error when training my neural network:
An unexpected error occurred during CUDA execution. The CUDA error was: all CUDA-capable devices are busy or unavailable
The error occurs only when training on a GPU, not on the CPU.

Best Answer

The most likely cause is a kernel execution timeout.
To confirm this, try running a few simple gpuArray commands, such as:
A = gpuArray(rand(10))
B = A+1
If the above runs without any warnings or errors, the GPU itself is accessible, so the failure during training is likely due to kernel timeouts.
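As a further check, the GPU device object reports whether the operating system imposes a kernel execution timeout on the card. The following is a minimal sketch, assuming Parallel Computing Toolbox is installed and the device can be selected; the printed messages are illustrative wording, not MATLAB output.

```matlab
% Sketch: inspect the selected GPU for an OS-imposed kernel timeout.
% Assumes Parallel Computing Toolbox and a supported, selectable GPU.
if gpuDeviceCount > 0
    dev = gpuDevice;   % currently selected device (selects the default if none)
    fprintf('Device: %s\n', dev.Name);
    if dev.KernelExecutionTimeout
        % The OS may terminate kernels that run too long on this device.
        disp('A kernel execution timeout is imposed on this device.');
    else
        disp('No kernel execution timeout is imposed on this device.');
    end
    fprintf('Compute mode: %s\n', dev.ComputeMode);
else
    disp('No supported GPU was detected.');
end
```

If KernelExecutionTimeout is true, long-running training kernels on that card can be interrupted by the operating system, which is consistent with the timeout explanation.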
Some possible workarounds:
  1. Scale down the problem so that it does not time out (e.g. use a smaller network or less data), or use a different card that is not subject to the timeout.
  2. Some GPUs can be switched to compute-only mode (TCC, Tesla Compute Cluster), but others cannot. Check whether your GPU allows changing to that mode.
  3. Modify the Windows registry to increase the TDR (Timeout Detection and Recovery) delay value, as explained in the web page below:
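Before editing the registry for the third workaround, it can help to check whether a TDR delay has already been set. The following is a read-only sketch, assuming Windows and assuming the TdrDelay value under the GraphicsDrivers key as described in Microsoft's TDR documentation; the answer above does not name the key, so treat the key and value names as assumptions to verify against that documentation.

```matlab
% Sketch: read the current TDR delay from the Windows registry (read-only).
% Assumption: the value is TdrDelay under the GraphicsDrivers key, as
% described in Microsoft's TDR documentation; not stated in the answer.
if ispc
    key = 'SYSTEM\CurrentControlSet\Control\GraphicsDrivers';
    try
        tdrDelay = winqueryreg('HKEY_LOCAL_MACHINE', key, 'TdrDelay');
        fprintf('TdrDelay is set to %d seconds.\n', tdrDelay);
    catch
        % winqueryreg throws an error when the value does not exist.
        disp('TdrDelay is not set; Windows uses its default delay.');
    end
else
    disp('TDR registry settings apply only to Windows.');
end
```

This only reads the value. Changing it requires administrator rights and usually a restart before the new delay takes effect.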