MATLAB: Are calculations on a GPU using the Parallel Computing Toolbox in MATLAB R2017b executed with double precision?

I am using the Parallel Computing Toolbox in MATLAB R2017b, and I want to know if I can perform calculations on my GPU with double precision.

Best Answer

Yes. MATLAB performs calculations on your GPU in double precision by default: a gpuArray created from double data (MATLAB's default numeric class) is computed in double precision, while single data stays single. You can use GPUs with MATLAB through Parallel Computing Toolbox, which supports *CUDA-enabled NVIDIA GPUs* with compute capability 2.0 or higher. Note that in a future release, support for GPU devices of compute capability 2.x will be removed, and compute capability 3.0 will be required. See the following link for further information: https://mathworks.com/discovery/matlab-gpu.html
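The device requirements above can be checked from the command line. The following is a minimal sketch, assuming Parallel Computing Toolbox is installed; the variable name hasGpu is only illustrative.

```matlab
% Query the currently selected GPU, if one is available.
hasGpu = gpuDeviceCount > 0;
if hasGpu
    d = gpuDevice;   % selects and returns the default GPU device
    fprintf('%s: compute capability %s, SupportsDouble = %d\n', ...
        d.Name, d.ComputeCapability, d.SupportsDouble);
else
    disp('No supported GPU device found.');
end
```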
Please be aware that compute capability describes the architecture and feature set of a GPU card rather than its actual performance in double or single precision. For performance figures, you can refer to this Wikipedia page (<https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units>), which lists the double and single precision performance of pretty much every NVIDIA card out there.
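If you would rather measure the double vs single precision gap on your own card, a rough timing comparison can be sketched as follows. The matrix size n = 2000 is an arbitrary assumption, and the timings depend entirely on the hardware.

```matlab
% Rough timing of a matrix product in double vs single precision on the GPU.
tDouble = NaN;
tSingle = NaN;
if gpuDeviceCount > 0
    n  = 2000;                            % arbitrary size, chosen for illustration
    Ad = rand(n, 'gpuArray');             % double precision (default)
    As = rand(n, 'single', 'gpuArray');   % single precision
    tDouble = gputimeit(@() Ad * Ad);     % gputimeit synchronizes the GPU for timing
    tSingle = gputimeit(@() As * As);
    fprintf('double: %.4f s, single: %.4f s\n', tDouble, tSingle);
end
```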
The following code is a short "GPU double precision" example. Matrices of random numbers are created in the CPU workspace and transferred to the GPU for computation, and the result is then transferred back to the CPU:
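rng(42)                  % fix the seed so the run is reproducible
A = randn(5);            % double precision matrices in CPU memory
B = randn(5);
Agpu = gpuArray(A);      % copy to GPU memory; the double class is preserved
Bgpu = gpuArray(B);
% GPU loop
for i = 1:10
    Bgpu = Bgpu + Agpu * Agpu;
end
Bdouble = gather(Bgpu);  % copy the result back to CPU memory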
The resulting data types can be identified by using the command "whos":
>> whos
  Name         Size            Bytes  Class       Attributes
  A            5x5               200  double                
  Agpu         5x5                 4  gpuArray              
  B            5x5               200  double                
  Bdouble      5x5               200  double                
  Bgpu         5x5                 4  gpuArray              
  i            1x1                 8  double 
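Note that "whos" reports the class of the GPU variables as gpuArray, not the class of the elements they hold. To see the element class, classUnderlying can be used. The sketch below assumes a supported GPU is present; Sgpu and the cell array underlying are illustrative names added for contrast and bookkeeping.

```matlab
% Inspect the element class stored inside gpuArray variables.
underlying = {};
if gpuDeviceCount > 0
    Agpu = gpuArray(randn(5));           % double input
    Sgpu = gpuArray(single(randn(5)));   % single input, for contrast
    underlying = {classUnderlying(Agpu), ...
                  classUnderlying(Agpu * Agpu), ...
                  classUnderlying(Sgpu * Sgpu)};
    disp(underlying)
end
```

The first two entries should be reported as double and the last as single, since the result of a gpuArray operation keeps the precision of its inputs.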
One way to double-check the precision is to compare the same matrix computation performed on the CPU and on the GPU. The two will produce the same results to within a double precision tolerance. According to the IEEE 754 standard, double precision has 52 bits for the mantissa, which gives a relative rounding granularity of eps = 2^-52 ≈ 2.2 * 10^-16. Small differences are expected because the CPU and GPU may not perform the floating-point operations in the same order. Note that rounding errors of this relative size arise in every addition and multiplication of the calculation, and accumulate in the total delta.
% CPU loop
for i = 1:10
    B = B + A * A;
end
% difference
delta = B - Bdouble
delta =
   1.0e-14 *
         0         0         0         0    0.2665
   -0.1776         0   -0.3553         0         0
         0   -0.2387         0         0         0
         0         0         0    0.0222         0
         0   -0.0888         0         0         0
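Rather than inspecting delta by eye, the comparison can be turned into an explicit tolerance test. This is only a sketch: the factor of 10 in the tolerance is an arbitrary assumption, not a rigorous error bound, and Bcpu, tol and maxErr are illustrative names.

```matlab
% eps is the spacing between 1 and the next larger double.
assert(eps == 2^-52);

rng(42)
A = randn(5);
B = randn(5);

% Reference result on the CPU.
Bcpu = B;
for i = 1:10
    Bcpu = Bcpu + A * A;
end

% Hypothetical tolerance: a few spacings at the magnitude of the largest entry.
tol = 10 * eps(max(abs(Bcpu(:))));

if gpuDeviceCount > 0
    Agpu = gpuArray(A);
    Bgpu = gpuArray(B);
    for i = 1:10
        Bgpu = Bgpu + Agpu * Agpu;
    end
    maxErr = max(abs(Bcpu(:) - gather(Bgpu(:))));
    fprintf('max |delta| = %.3g, tolerance = %.3g, within tolerance: %d\n', ...
        maxErr, tol, maxErr <= tol);
end
```

Scaling the tolerance with eps(x) instead of using the bare eps accounts for the fact that the spacing between doubles grows with the magnitude of the values being compared.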