MATLAB: Is more GPU necessary for MATLAB compiled apps?

Hello, I have a question. I have tried running a compiled MATLAB app (using the MATLAB Compiler Runtime; I only have the .exe of the compiled app and the MCR) in three different environments:
-Physical Server-
  • HP Proliant BL685C G7
  • 4 CPUs (8 cores each), 32 cores total, AMD Opteron 3 GHz
  • 96 GB RAM
  • Windows 2012 Server
With this hardware configuration, my application finished in approximately 72 hours, apparently using 4 to 5 workers.
-Workstation-
  • HP Z840 WORKSTATION
  • 2 Intel Xeon E5-2667 v3 (8 cores each, 3.2 GHz), 16 cores total
  • 32 GB DDR4-2133 RAM (4 x 8 GB)
  • NVIDIA Quadro K620, 2 GB
  • 128 GB SATA SSD
  • Windows 8.1 Pro 64-bit, downgraded to Windows 7 Pro 64-bit
With this hardware configuration, my application finished in approximately 11 hours, using 12 workers.
-Virtual Server-
  • Microsoft Windows Server 2012 (64-bit)
  • 10 vCPU
  • 32 GB RAM
  • Mounted in a hypervisor.
With this hardware configuration, the process finished in approximately 72 hours, apparently using 12 workers.
I suspect the difference comes from the workstation's NVIDIA GPU, because I read that MATLAB is a 3D-based application and that the graphics card is responsible for this behavior.
After the calculation in my compiled app starts, CPU usage on all cores goes up to 100%, but after about 5 minutes it drops: only 1 CPU keeps working, at about 10%, while memory usage is around 70%.
Is something else causing this, or is my analysis correct and I need more GPUs in the other servers? I could only test further on the virtual server, because I no longer have access to the other machines.
Thank you for the attention and sorry for my bad English 😛
P.S. By "workers" I mean some background processes that show up in Task Manager; how many there are depends on the number of virtual cores. I don't really know how to check this properly.

Best Answer

If the functions write to disk extensively, the SSD might be the cause of the speedup.
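One way to test the disk hypothesis without access to the app's source is to time a plain sequential write on each machine. The following is only a minimal sketch in Python (not part of the original app): the function name `write_throughput_mb_s` and the 64 MB test size are illustrative assumptions, and a single sequential write is a rough proxy, not a model of the app's real I/O pattern.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=64, chunk_mb=4):
    """Write about total_mb of random data to a temp file in `directory`; return MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = total_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            # Ask the OS to push the data to the device, not just its cache.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (n_chunks * chunk_mb) / elapsed


if __name__ == "__main__":
    # Assumption: the temp directory is on the same drive the app writes to.
    # Otherwise, pass the app's actual output directory instead.
    rate = write_throughput_mb_s(tempfile.gettempdir())
    print(f"Sequential write: {rate:.1f} MB/s")
```

If the workstation's SSD reports a much higher rate than the storage behind the virtual server, disk I/O becomes a plausible contributor; similar numbers would point elsewhere.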
The Opteron has 64 + 64 KB (data + instructions) of L1 cache and 1 MB of L2 cache, while the Xeon has 2 MB of L2 cache and 15 MB of L3 cache. If the processed data chunks fit into the cache, processing can be accelerated remarkably.
As you can see, there could be several reasons for the difference in processing speed.
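Regarding the P.S. in the question: a rough way to check how many processes the compiled app actually starts is to count entries per image name in the output of the Windows `tasklist` command. This is a hedged sketch: `myapp.exe` is a hypothetical placeholder for the real executable name, and the process count is only an approximation of the number of workers, since it is an assumption that each worker runs as its own process under that name.

```python
import csv
import io
import subprocess
import sys
from collections import Counter


def count_by_image_name(tasklist_csv):
    """Count processes per image name from `tasklist /FO CSV /NH` output."""
    counts = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row:  # skip blank lines
            counts[row[0].lower()] += 1
    return counts


if __name__ == "__main__":
    if sys.platform == "win32":
        # /FO CSV selects CSV output, /NH suppresses the header row.
        output = subprocess.run(
            ["tasklist", "/FO", "CSV", "/NH"],
            capture_output=True, text=True, check=True,
        ).stdout
        counts = count_by_image_name(output)
        app_name = "myapp.exe"  # hypothetical: replace with the real .exe name
        print(f"{app_name}: {counts[app_name]} process(es)")
    else:
        print("tasklist is only available on Windows")
```

Running this a few times during the first minutes and again after CPU usage drops would show whether the number of processes changes between the two phases.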