We have a GTX 580 with 3 GB of RAM in a Linux machine (Ubuntu Lucid with a Natty backported kernel) running MATLAB R2011b, and I find myself fighting seemingly random crashes due to memory allocation on the GPU. The first thing I noticed is that overwriting a variable defined on the GPU does not always give back all the RAM the old variable held, minus the size of the new data, so I have to clear the variable instead of overwriting it. Is there a collection of best practices for avoiding wasted GPU memory in cases like this?
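For reference, here is a minimal sketch of the clear-before-reallocate pattern I mean, using the Parallel Computing Toolbox `gpuDevice` handle to watch free device memory (the array sizes are just placeholders):

```matlab
% Release a GPU variable explicitly before allocating its replacement,
% and query the device to confirm the memory actually came back.
g = gpuDevice;                        % handle to the current GPU
A = gpuArray(zeros(4096, 'single')); % example allocation
fprintf('Free with A resident: %d bytes\n', g.FreeMemory);

clear A                               % release the old buffer first...
B = gpuArray(zeros(2048, 'single')); % ...then allocate the replacement
fprintf('Free after clear+realloc: %d bytes\n', g.FreeMemory);
```

Overwriting `A` directly would require both buffers to coexist briefly, which seems to be where the fragmentation/waste comes from.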
I also find that a calculation that has been running for hours, and that has completed successfully before, will sometimes crash with an "unexpected error"; this seems to correlate with running close to maximum memory capacity. Since the program has completed before, I am left assuming that some other program interfered with memory allocation on the GPU and killed my task. Is there a way to prevent this from happening? Maybe running the server headless, or putting in a second, smaller video card to drive the display?
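One defensive step I have been considering (a sketch, not something I have validated; the `needed` figure is a made-up placeholder for the job's footprint) is to check free device memory before launching the long run, and to reset the device if too little remains:

```matlab
% Before a long run, verify enough free device memory remains; if a
% previous task (or the display) has consumed it, reset the device.
needed = 2.5e9;        % bytes this job needs -- placeholder value
g = gpuDevice;         % querying the handle also validates the device
if g.FreeMemory < needed
    reset(g);          % discards all gpuArray data held on the device
end
```

Note that `reset` destroys every `gpuArray` currently on the device, so it only makes sense between independent runs; if `reset` is unavailable in a given release, restarting MATLAB frees the device memory as well.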
Thanks