[GIS] Converting coordinate and raster values of each pixel to csv of large raster using Python

big datagdalpythonrraster

There should be a simple way to convert a large raster to a csv or dataframe by reading the raster in as chunks, converting raster chunk to table of just three columns (x, y, and values), and outputting to csv or dataframe. I normally use something like the following in R but it only works for rasters much smaller than I'd like and I'd prefer to use python.

ras = readGDAL("Agroforestry_crop_setnull.tif")
rasdf= as.data.frame(ras, row.names=NULL, optional=FALSE, xy=FALSE, na.rm=TRUE)

I have read about GDALs ReadAsArray and gdal2xyz, I do not know how to use ReadAsArray to get coordinates of pixels in the 3 column table I desire, although it does allow reading the rasters as chunks. gdal2xyz does not provide the raster values of the pixels and does not allow reading as chunks from what I have seen.

In the past I have made use of ArcPy, breaking up the rasters into pieces, converting to points, and adding the coordinates to the attribute fields, then exporting these tables. I am hoping there is a better way, similar to the R solution I posted but for python and big rasters.

Is there a suggestion on reading big rasters as chunks in the csv format I desire -3 columns (X,Y, value)?

Best Answer

You could try the SAGA GIS feature 'Export Grid to XYZ'. Despite the name this also supports multiband images (e.g. you could export a 'rendered GeoTiff' to get XYZRGBA, as well as getting XYZ from a 1-band DEM). You can also choose whether or not to drop NODATA rows.

I don't have any massive rasters to hand to test this, but you could try that to see if the memory usage and performance is more acceptable than gdal2xyz.

Failing that, if you need to write your own code, a python library to look at is RasterIO. This is a wrapper around GDAL/Numpy and provides a higher level interface. It also allows the windowed-reading you refer to.

Related Question