[GIS] gdal_array.SaveArray() leaves dataset open in Python

gdal numpy python raster

When I use gdal_array.SaveArray() to create a raster, the newly created dataset appears to stay open in Python, preventing other processes from working with it. For instance, consider the following (super minimal) code:

>>> import os
>>> import numpy as np
>>> from osgeo import gdal, gdal_array
>>> a = np.arange(300).reshape((3, 10, 10))
>>> gdal_array.SaveArray(a, "test.tif")
<osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x0458B968> >
>>> os.rename("test.tif", "test2.tif")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
WindowsError: [Error 32] The process cannot access the file because it is being used by another process

I am similarly prevented from moving, renaming, or deleting the file directly from Windows Explorer, with the message "The action can't be completed because the file is open in python.exe". More importantly, I can't open the file in some visualization/processing programs that I use. Once I exit Python, the file is "released" and I can manipulate it to my heart's content.

It doesn't seem like the file has a name associated with it, so I can't close it as I would a raster that I had specifically opened:

>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'a', 'gdal', 'gdal_array', 'np', 'os']

What causes this behavior? Is there a way to call SaveArray() such that it doesn't keep the file opened after writing it? Or, a way to close the file from within Python?

In case it's important, my Python bindings are from gdal 1.11.1 for Python 2.7.8 on Windows 7.

Best Answer

Edit based on the comments below:

Assigning the result of the gdal_array.SaveArray(a, "test.tif") call to a variable gives you the returned osgeo.gdal.Dataset object, which can then be closed as described in the gotchas quoted further down. Using the example from the question, this should work:

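    import os
    import numpy as np
    from osgeo import gdal_array
    a = np.arange(300).reshape((3, 10, 10))
    ds = gdal_array.SaveArray(a, "test.tif")
    ds = None  # drop the reference so GDAL flushes and closes the file
    os.rename("test.tif", "test2.tif")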
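If you save arrays often, the same pattern can be wrapped in a small helper. This is only a sketch: `save_array_and_close` is a hypothetical name, not part of GDAL, and it assumes that `SaveArray()` hands back `None` when the copy fails with GDAL exceptions disabled. The import sits inside the function so the helper can be defined even where GDAL is not installed.

```python
def save_array_and_close(array, filename, driver_name="GTiff"):
    """Write `array` to `filename`, then release the dataset so the file is closed.

    Hypothetical helper, not part of GDAL.
    """
    # Imported here so the helper can be defined without GDAL installed.
    from osgeo import gdal_array

    ds = gdal_array.SaveArray(array, filename, format=driver_name)
    if ds is None:
        # Assumption: a failed copy yields None when GDAL exceptions are off.
        raise RuntimeError("SaveArray() could not create %r" % filename)

    ds.FlushCache()  # ask GDAL to write any pending blocks
    ds = None        # drop the only reference; the file is closed here
```

With this in place, `save_array_and_close(a, "test.tif")` followed by `os.rename("test.tif", "test2.tif")` should behave like the four-line version above, because the dataset never escapes the function.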

Check out the Python gotchas documentation: https://trac.osgeo.org/gdal/wiki/PythonGotchas

Specifically:

Saving and closing datasets/datasources

To save and close GDAL raster datasets or OGR vector datasources, the object needs to be dereferenced, such as by setting it to None or a different value, or by deleting the object. If there is more than one copy of the dataset or datasource object, then each copy needs to be dereferenced.

For example, creating and saving a raster dataset:

>>> from osgeo import gdal
>>> driver = gdal.GetDriverByName('GTiff')
>>> dst_ds = driver.Create('new.tif', 10, 15)
>>> band = dst_ds.GetRasterBand(1)
>>> arr = band.ReadAsArray()  # raster values are all zero
>>> arr[2, 4:] = 50  # modify some data
>>> band.WriteArray(arr)  # raster file still unmodified
>>> band = None  # dereference band to avoid gotcha described previously
>>> dst_ds = None  # save, close

The last dereference to the raster dataset writes the data modifications and closes the raster file. WriteArray(arr) does not write the array to disk, unless the GDAL block cache is full (typically 40 MB).

With some drivers, raster datasets can be intermittently saved without closing using FlushCache(). Similarly, vector datasets can be saved using SyncToDisk(). However, neither of these methods guarantees that the data are written to disk, so the preferred method is to deallocate as shown above.
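The point about every copy needing to be dereferenced is easy to trip over, so here is a GDAL-free sketch of it. `FakeDataset` is a made-up stand-in, not a GDAL class, and the behaviour shown relies on CPython's reference counting, which finalizes an object as soon as its last reference goes away.

```python
class FakeDataset(object):
    """Made-up stand-in for a GDAL dataset: it 'closes' when it is finalized."""

    def __init__(self, events):
        self._events = events

    def __del__(self):
        # Runs when the last reference disappears (CPython reference counting).
        self._events.append("closed")


events = []
ds = FakeDataset(events)
alias = ds                     # a second reference to the same object

ds = None                      # one reference dropped, one still alive
open_after_first = (events == [])

alias = None                   # last reference dropped, finalizer runs
closed_after_second = (events == ["closed"])

print(open_after_first, closed_after_second)
```

In the interactive interpreter there is one more hidden reference to watch for: the result of the last displayed expression is stored in `_`, which is probably what kept the file open in the question, since the Dataset returned by SaveArray() was echoed at the prompt without being assigned.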