I wanted to speed up my neural network training, so I upgraded from a GTX 1080 to a Titan V, expecting a large increase in performance due to the improved architecture, memory speed, etc.
Well, the 1080 is crushing the Titan V.
I am doing transfer learning on AlexNet, training both cards on the same pool of images with identical settings:
opts = trainingOptions('sgdm','InitialLearnRate',0.001, 'Plots', 'training-progress', 'MiniBatchSize', 512)
The Titan V takes approximately 164 seconds per iteration, while the 1080 is cruising at 62 seconds per iteration.
I'm flabbergasted that a GPU that is outclassed in every way somehow manages to win by such a large margin.
Does anyone have a similar experience or any explanation for why this might be happening?
Best Answer

I am posting here the same information with which I responded to your tech support request. Perhaps others will find this useful.
In my tests of transfer learning AlexNet, the Titan V was 5x faster than the K20c, 2x faster than the GTX1080 (same series but faster than the 970) and 1.3x faster than the Titan XP. This was running R2017b Updates 2 and 4.
GeForce cards on Windows in WDDM mode are significantly affected by the OS's supervisory interference, particularly when it comes to the speed of memory allocation. This makes them much slower for certain functionality that requires a lot of memory allocation than on Linux. The Titan V, which is very new and does not yet have fully optimised drivers, seems to be particularly affected by this.
The solution is to put the Titan V into TCC mode. You will need to drive your graphics from another GPU or on-board graphics. Go to C:\Program Files\NVIDIA Corporation\NVSMI and run
nvidia-smi
to find out which GPU is your Titan V. Let us say it is GPU 1. Then type
nvidia-smi -i 1 -dm 1
and reboot.
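After the reboot, it may help to check from inside MATLAB which devices are visible. The following is only a sketch: it assumes Parallel Computing Toolbox is installed, and it reads the KernelExecutionTimeout property as a rough hint (a card that is not driving a display normally reports no kernel timeout), not as a definitive TCC indicator. Note that MATLAB device indices start at 1 whereas nvidia-smi counts from 0, and the ordering may not match between the two.

```matlab
% List every CUDA device MATLAB can see, without selecting any of them.
nDevices = gpuDeviceCount;
for idx = 1:nDevices
    d = parallel.gpu.GPUDevice.getDevice(idx);
    fprintf('%d: %s, %.1f GB, kernel timeout: %d\n', ...
        idx, d.Name, d.TotalMemory / 2^30, d.KernelExecutionTimeout);
end
```

Once you know the Titan V's index in MATLAB, gpuDevice(index) selects it for subsequent training.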
In my own experiments I found that the Titan V was still slower on Windows for transfer learning of AlexNet than on Linux, but I do have a much slower CPU in my Windows machine, so it's probably just because of that. It may also be, as I say, that the Windows driver is not yet fully optimised - it is early days for the Titan V drivers.
An alternative work-around is to reduce the amount of raw allocation that is happening during training. You can either reduce the MiniBatchSize, or you can use a special Feature command to increase the amount of memory MATLAB is allowed to reserve:
>> feature('GpuAllocPoolSizeKb', intmax('int32'))
This has the side-effect of making MATLAB more likely to conflict with other applications using the GPU, but you can experiment with different pool sizes to find a balance. In WDDM mode you should see a considerable increase in performance due to the reduction in raw memory allocations, although in my experiments it didn't quite reach the performance of using TCC mode instead.
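As a sketch of the first option, the training options from the question could be rerun with a smaller mini-batch. The value 128 is an arbitrary illustration, not a recommendation; the other settings are copied from the question.

```matlab
% Same options as in the question, but with a smaller mini-batch so that
% each iteration needs less GPU memory to be allocated.
opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.001, ...
    'Plots', 'training-progress', ...
    'MiniBatchSize', 128);   % was 512; 128 is an illustrative guess
```

Because a smaller mini-batch means more iterations per epoch, compare time per epoch (or images per second) rather than seconds per iteration when judging whether this helped.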
It's worth elaborating - you cannot judge how well a card will perform based entirely on raw computing power - all GPU algorithms require a combination of GPU compute, memory I/O and CPU compute to function. GPUBench gives a reasonable indication of expected FLOPs for different kinds of algorithm, and Deep Learning is another kind of algorithm again.
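To illustrate the difference between compute-bound and memory-bound work, here is a rough sketch using gputimeit. It is not a substitute for GPUBench, and the matrix size N is an arbitrary assumption; it only shows that the same card can rank differently depending on which resource an operation stresses.

```matlab
% Rough comparison of a compute-bound and a memory-bound operation
% on the currently selected GPU. N is an arbitrary illustrative size.
N = 4096;
A = rand(N, 'single', 'gpuArray');
B = rand(N, 'single', 'gpuArray');

% Matrix multiply: about 2*N^3 floating-point operations (compute-bound).
tMul = gputimeit(@() A * B);
fprintf('Matrix multiply: %.1f GFLOPS\n', 2 * N^3 / tMul / 1e9);

% Element-wise add: reads two arrays and writes one (memory-bound).
tAdd = gputimeit(@() A + B);
fprintf('Element-wise add: %.1f GB/s\n', 3 * N^2 * 4 / tAdd / 1e9);
```

Neither number captures the cost of memory allocation, which is the factor affected by WDDM mode as described above.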
MathWorks does not generally give hardware advice, so it is up to the customer to decide whether the Titan V is cost effective. Some things to take into consideration are:
  1. The Titan cards (V and XP) can be put into TCC mode whereas the 970 and 1080 cannot.
  2. The Titan cards support Remote Desktop when the card is not driving the display; the 970 and 1080 do not.
  3. The Titan V has Tensor Cores, which means that when MATLAB supports half precision Deep Learning, its performance will greatly increase over the Pascal and Maxwell architectures.
  4. The Titan V has excellent double-precision performance, unlike any other GeForce card. This means you can use it for other MATLAB functionality such as system modelling that requires the accuracy of double precision.
Hope this helps.
