MATLAB: Hard restart problems with 2018b, 2020a, Ubuntu 18.04 and 20.04

crash linux

I've been having a very frustrating issue when running computationally intensive tasks in MATLAB on my workstation. Occasionally, the system undergoes a complete hard restart with no warning, error message, or log entry. I've tested the system extensively, pulling and swapping memory and running "stress test" programs, but nothing other than MATLAB reproduces the issue. Ubuntu records no system crash, and nothing appears wrong with my cooling or power supply. I've upgraded and downgraded both Ubuntu and MATLAB, trying combinations of 18.04 and 20.04 with R2020a and R2018b.
The issue will arise with something as simple (but not easy) as:
A = rand(20000,10000);
x = rand(10000,1);
y = A*x;
x_ls = A\y;
I can run the above just fine on my home computer, which is comparatively much "weaker". If this were a memory issue, MATLAB would usually tell me that I'm out of memory rather than crashing the whole system.
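For scale, here is a rough back-of-the-envelope estimate of the memory the reproducer needs. It assumes 8-byte doubles and, as a loose guess rather than a documented figure, that the solver works on about one extra copy of A. Under those assumptions the footprint is a few GiB, well within the 32 GB listed below, which fits the view that this is not a plain out-of-memory problem.

```matlab
% Rough memory estimate for the reproducer (a double takes 8 bytes).
m = 20000;
n = 10000;
bytesPerDouble = 8;

bytesA = m * n * bytesPerDouble;   % storage for A alone
gibA = bytesA / 2^30;              % convert bytes to GiB

fprintf('A alone:                 about %.2f GiB\n', gibA);

% Assumption (not from the MATLAB documentation): the least-squares solve
% needs roughly one additional working copy of A.
fprintf('A plus one working copy: about %.2f GiB\n', 2 * gibA);
```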
It is repeatable in the sense that if the problem happens, I can start MATLAB again, call the same command, and it crashes again. However, it is not consistent, in the sense that I can run some other intensive programs that take hours to finish without triggering the issue.
Any thoughts? Is this some occult hardware issue that somehow only MATLAB is stressing?
Hardware details in case it helps:
EVGA 850W PSU
Asus Tuf X299 Mark ii MB
i7 7820X
2X Corsair Vengeance LPX 16GB
Nvidia P4000
Samsung SSD

Best Answer

I was able to solve this problem by updating the BIOS on my motherboard.
What led me to do this was a long rabbit hole of troubleshooting (MathWorks tech support was very helpful). I eventually concluded it was a power management issue: heavy computation draws more power, and if the system can't provide it, this can apparently cause hard restarts. I thought I had narrowed it down to either a faulty power supply or a faulty motherboard and was ready to pull the PSU. But while going through my BIOS settings looking for anything related to power, I decided to check for updates, and installing the update solved the problem without replacing any hardware. Evidently there was a bug in the version I had that was fixed in the update!
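If you want to compare your installed firmware against the latest release on the motherboard vendor's site, a minimal sketch like the following prints the board and BIOS details from within MATLAB. It is Linux only and assumes the kernel exposes the DMI entries under /sys/class/dmi/id, which is common on Ubuntu but not guaranteed on every system.

```matlab
% Minimal sketch, Linux only: print board and BIOS details exposed via sysfs.
% Assumption: the kernel provides these DMI files under /sys/class/dmi/id.
fields = {'board_name', 'bios_vendor', 'bios_version', 'bios_date'};

for k = 1:numel(fields)
    % Read each entry with cat; status is nonzero if the file is missing.
    [status, out] = system(['cat /sys/class/dmi/id/' fields{k}]);
    if status == 0
        fprintf('%-12s %s\n', fields{k}, strtrim(out));
    else
        fprintf('%-12s (not available)\n', fields{k});
    end
end
```

This only reports what is installed; the update itself still has to be done through the motherboard vendor's own procedure.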