MATLAB: Flaky GPU memory issues

gpu, out of memory

We have a GTX 580 with 3 GB of RAM in a Linux machine (Ubuntu Lucid with the backported Natty kernel) running MATLAB R2011b, and I keep fighting seemingly random crashes caused by GPU memory allocation. The first thing I noticed is that overwriting a variable defined on the GPU does not always give back all the memory the old variable held (minus the size of the new data), so I have to clear the variable instead of overwriting it. Is there a collection of best practices for avoiding this kind of wasted memory?
I also find that a calculation that has been running for hours, and that has completed successfully before, will sometimes crash with an "unexpected error". This seems to correlate with running close to the card's maximum memory capacity. Since the program has completed before, I assume that some other program interfered with GPU memory allocation and killed my task. Is there a way to prevent this? Maybe by running the server headless, or by adding a second, smaller video card to drive the display?
Thanks

Best Answer

In your first observation about overwriting variables on the GPU, I presume you're using the output of "gpuDevice" to check the amount of free memory on the GPU. You're quite right that overwriting an array may not necessarily cause the old memory to be freed immediately; however, it will be freed automatically if necessary to prevent running out of memory.
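A minimal sketch of how to observe this from MATLAB, assuming the Parallel Computing Toolbox and a supported GPU: it reads the FreeMemory property of the object returned by gpuDevice after allocating, overwriting, and then clearing a gpuArray. The array size is an arbitrary choice, the device object is re-queried before each reading to be safe, and the numbers you see will depend on the release and driver. In newer releases FreeMemory is deprecated in favor of AvailableMemory.

```matlab
% Sketch: watch free GPU memory while a gpuArray is allocated,
% overwritten, and then cleared. n is an arbitrary, hypothetical size.
n = 4000;                          % 4000x4000 doubles = 128e6 bytes (~122 MiB)

g = gpuDevice;                     % currently selected GPU
fprintf('Start:           %8.1f MB free\n', g.FreeMemory / 2^20);

A = gpuArray(rand(n));             % first allocation on the GPU
g = gpuDevice;
fprintf('After allocate:  %8.1f MB free\n', g.FreeMemory / 2^20);

A = gpuArray(rand(n));             % overwrite: the old buffer may not be released yet
g = gpuDevice;
fprintf('After overwrite: %8.1f MB free\n', g.FreeMemory / 2^20);

clear A                            % drop the variable explicitly
g = gpuDevice;
fprintf('After clear:     %8.1f MB free\n', g.FreeMemory / 2^20);
```

If the "After overwrite" figure is lower than "After allocate", the old buffer has not yet been returned; per the answer above, it should still be reclaimed automatically when a later allocation needs the space.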
It's not clear what the 'unexpected error' might be; this is not something I've seen here at The MathWorks on our test machines. Do these errors show up in similar places each time? That is, does a particular gpuArray operation seem to cause them?
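One way to check whether the failures cluster around a particular operation is to wrap the long-running computation in try/catch and log the error identifier, where it was raised, and the free GPU memory at the moment of failure. This is a sketch only: the matrix multiply in the try block is a placeholder standing in for the real calculation, while the logging uses the standard MException fields (identifier, message, stack).

```matlab
% Sketch: record where, and with how much free GPU memory, a long
% computation fails. The body of the try block is a placeholder workload.
try
    A = gpuArray(rand(2000));      % placeholder: replace with the real computation
    B = A * A';
    result = gather(B);            % copy the result back to host memory
catch err
    fprintf('Error identifier: %s\n', err.identifier);
    fprintf('Error message:    %s\n', err.message);
    if ~isempty(err.stack)
        fprintf('Raised in %s at line %d\n', err.stack(1).name, err.stack(1).line);
    end
    % Querying the device may itself fail if the GPU is in a bad state.
    g = gpuDevice;
    fprintf('Free GPU memory:  %.1f MB\n', g.FreeMemory / 2^20);
    rethrow(err);                  % keep the original failure visible
end
```

Collecting the identifier and stack location over several failed runs would show whether the same gpuArray operation is involved each time.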
One final thing to note: like CPU memory, GPU memory can become fragmented over time, and it's possible that this might cause you to run out of GPU memory earlier than you might otherwise anticipate. However, I would not normally expect this to result in 'unexpected errors' - rather, I'd expect to see failed allocations.