MATLAB: Why is Titan V training performance so poor?


I wanted to speed up my neural network training, so I upgraded from a GTX 1080 to a Titan V, expecting a large increase in performance due to the improved architecture, memory speed, and so on.
Well, the 1080 is crushing the Titan V.
I'm doing transfer learning on AlexNet, training on the same pool of images with identical settings:
opts = trainingOptions('sgdm','InitialLearnRate',0.001, 'Plots', 'training-progress', 'MiniBatchSize', 512)
The Titan moves at approximately 164 seconds per iteration while the 1080 cruises along at 62 seconds per iteration.
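For reference, the full setup looks roughly like this (the image folder name is a placeholder; images are organised into one subfolder per label and already sized for alexnet's 227x227 input):
imds = imageDatastore('myImages', 'IncludeSubfolders', true, 'LabelSource', 'foldernames');
net = alexnet;
numClasses = numel(categories(imds.Labels));
layers = [net.Layers(1:end-3)             % keep the pretrained layers
          fullyConnectedLayer(numClasses) % new layers for this task
          softmaxLayer
          classificationLayer];
opts = trainingOptions('sgdm', 'InitialLearnRate', 0.001, 'Plots', 'training-progress', 'MiniBatchSize', 512);
trainedNet = trainNetwork(imds, layers, opts);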
I'm flabbergasted that a GPU that is outclassed in every way somehow manages to win by such a large margin.
Does anyone have a similar experience or any explanation for why this might be happening?
Thanks in advance.
L.

Best Answer

I am posting here the same information with which I responded to your tech support request. Perhaps others will find this useful.
In my tests of transfer learning AlexNet, the Titan V was 5x faster than the K20c, 2x faster than the GTX 1080 (the same card as yours, and faster still than the 970) and 1.3x faster than the Titan XP. This was running R2017b with Updates 2 and 4.
GeForce cards on Windows in WDDM mode are significantly affected by the OS's supervisory interference, particularly in the speed of memory allocation. This makes functionality that performs a lot of memory allocation much slower than on Linux. The Titan V, which is very new and does not yet have fully optimised drivers, seems to be particularly affected.
The solution is to put the Titan V into TCC mode. You will need to drive your graphics from another GPU or on-board graphics. Go to C:\Program Files\NVIDIA Corporation\NVSMI and run
nvidia-smi
to find out which GPU is your Titan V. Let us say it is GPU 1. Then type
nvidia-smi -i 1 -dm 1
and reboot.
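After rebooting you can confirm the change without leaving MATLAB. This is just a sketch, assuming nvidia-smi is on your system PATH (otherwise run it from the NVSMI folder):
[status, out] = system('nvidia-smi --query-gpu=index,name,driver_model.current --format=csv');
if status == 0
    disp(out)   % the Titan V's row should now read "TCC" instead of "WDDM"
end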
In my own experiments the Titan V was still slower for transfer learning of AlexNet on Windows than on Linux, but my Windows machine has a much slower CPU, so that probably accounts for the difference. It may also be, as I say, that the Windows driver is not yet fully optimised; it is early days for the Titan V drivers.
An alternative workaround is to reduce the amount of raw allocation that happens during training. You can either reduce the MiniBatchSize, or use a special feature command to increase the amount of memory MATLAB is allowed to reserve:
>> feature('GpuAllocPoolSizeKb', intmax('int32'))
This has the side-effect of making MATLAB more likely to conflict with other applications using the GPU, but you can experiment with different pool sizes to find a balance. In WDDM mode you should see a considerable increase in performance due to the reduction in raw memory allocations, although in my experiments it didn't quite reach the performance of using TCC mode instead.
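If you want to experiment, a minimal sketch of a pool-size sweep might look like the following. The pool sizes here are illustrative, not recommendations, and runTrainingOnce is a hypothetical wrapper around your own trainNetwork call:
poolSizesKb = [0 2^20 2^24 double(intmax('int32'))];
for p = poolSizesKb
    feature('GpuAllocPoolSizeKb', p);   % set the allocation pool size
    reset(gpuDevice);                   % clear the device between runs
    t = tic;
    runTrainingOnce();                  % hypothetical helper: one short training run
    fprintf('Pool %g KB: %.1f seconds\n', p, toc(t));
end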
It's worth elaborating: you cannot judge how well a card will perform based on raw computing power alone. All GPU algorithms need a combination of GPU compute, memory I/O and CPU compute to function. GPUBench gives a reasonable indication of expected FLOPS for different kinds of algorithm, and Deep Learning is another kind of algorithm again.
MathWorks does not generally give hardware advice, so it is up to the customer to decide whether the Titan V is cost effective. Some things to take into consideration are:
  1. The Titan cards (V and XP) can be put into TCC mode whereas the 970 and 1080 cannot.
  2. The Titan cards support Remote Desktop when the card is not driving the display; the 970 and 1080 do not.
  3. The Titan V has Tensor Cores, which means that when MATLAB supports half-precision Deep Learning, its performance will increase greatly over the Pascal and Maxwell architectures.
  4. The Titan V has excellent double-precision performance, unlike any other GeForce card. This means you can use it for other MATLAB functionality, such as system modelling, that requires the accuracy of double precision (see the short sketch after this list).
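To illustrate point 4, here is a minimal sketch comparing single- and double-precision throughput on the GPU with gputimeit. On most GeForce cards the double-precision run is many times slower; on the Titan V the gap should be small:
A = rand(4096, 'gpuArray');     % gpuArray of doubles by default
B = single(A);                  % single-precision copy
td = gputimeit(@() A*A);        % double-precision matrix multiply
ts = gputimeit(@() B*B);        % single-precision matrix multiply
fprintf('double: %.4f s, single: %.4f s, ratio %.1fx\n', td, ts, td/ts);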
Hope this helps.