[GIS] Converting coordinate and raster values of each pixel to csv of large raster using Python

big datagdalpythonrraster

There should be a simple way to convert a large raster to a csv or dataframe by reading the raster in as chunks, converting raster chunk to table of just three columns (x, y, and values), and outputting to csv or dataframe. I normally use something like the following in R but it only works for rasters much smaller than I'd like and I'd prefer to use python.

ras = readGDAL("Agroforestry_crop_setnull.tif")
rasdf= as.data.frame(ras, row.names=NULL, optional=FALSE, xy=FALSE, na.rm=TRUE)

I have read about GDALs ReadAsArray and gdal2xyz, I do not know how to use ReadAsArray to get coordinates of pixels in the 3 column table I desire, although it does allow reading the rasters as chunks. gdal2xyz does not provide the raster values of the pixels and does not allow reading as chunks from what I have seen.

In the past I have made use of ArcPy, breaking up the rasters into pieces, converting to points, and adding the coordinates to the attribute fields, then exporting these tables. I am hoping there is a better way, similar to the R solution I posted but for python and big rasters.

Is there a suggestion on reading big rasters as chunks in the csv format I desire -3 columns (X,Y, value)?

Best Answer

You could try the SAGA GIS feature 'Export Grid to XYZ'. Despite the name this also supports multiband images (e.g. you could export a 'rendered GeoTiff' to get XYZRGBA, as well as getting XYZ from a 1-band DEM). You can also choose whether or not to drop NODATA rows.

I don't have any massive rasters to hand to test this, but you could try that to see if the memory usage and performance is more acceptable than gdal2xyz.

Failing that, if you need to write your own code, a python library to look at is RasterIO. This is a wrapper around GDAL/Numpy and provides a higher level interface. It also allows the windowed-reading you refer to.

Related Solutions

[GIS] Extracting values from rasters at location of points using R

Assuming that presencias and variables share the same projection, this should be an easy task. I recommend you to add these lines of code after your read.table() statement in order to convert presencias dataframe to a SpatialPointsDataFrame object (just refine the names of the columns containing x and y coordinates if they differ from my example).

coordinates(presencias) <- c("x", "y")

To provide a reproducible example, I try to open up the scope of my answer a little more. First of all, download and unzip this ESRI shapefile with more or less important locations in Germany. These will serve as point data later on. You will also need packages dismo, rgdal and raster for this short example, so make sure that these libraries (and all their dependencies) are installed on your local hard drive.

Let's start with loading the required packages.

library(dismo)
library(rgdal)
library(raster)

Next, you should generate a sample RasterLayer. In our case, we will make use of the gmap() function from the dismo package in order to obtain a physical map of Germany.

germany.mrc <- gmap("Germany")

You can now import your point shapefile via readOGR from R's rgdal package. Make sure to adjust the data source name (dsn = ...). The whole projection stuff is obsolete in your particular case. However, it has to be done in our example in order to successfully overlay our point data with the Germany RasterLayer.

# Import SpatialPointsDataFrame
germany.places <- readOGR(dsn = "/path/to/shapefile", layer = "places")
# Define shapefile's current CRS
projection(germany.places) <- CRS("+proj=lonlat +ellps=WGS84")
# Reproject to RasterLayer's CRS
germany.places.mrc <- spTransform(germany.places, CRS(projection(germany.mrc)))

To reduce the huge size of our point data, we will draw a random sample of ten locations in Germany. This should suffice for our purposes.

set.seed(35)
germany.places.mrc.sample <- germany.places.mrc[sample(nrow(germany.places.mrc), 10), ]

Now that the preparation stuff is finished, we could just start to extract the values of those particular pixels our ten randomly sampled points lie within.

data <- data.frame(coordinates(germany.places.mrc.sample),
                   germany.places.mrc.sample$name, 
                   extract(germany.mrc, germany.places.mrc.sample))
names(data) <- c("x", "y", "name", "value")

In order to merge the point coordinates with the extracted pixel values, we just need to set up a dataframe containing the coordinates of our SpatialPointsDataFrame. That's it!

data
           x       y          name value
1  1073490.3 6513446 Veitsteinbach   208
2  1269100.8 6156690   Assenhausen   231
3  1336757.5 6246284    Frauenwahl   195
4   828579.9 6634122      Altenhof   189
5  1571418.1 6662558         Wohla   151
6  1192299.4 6864087     Flechtorf   170
7   976270.0 6362050    Hilsenhain   208
8  1117416.4 6092146      Nestbaum   175
9  1274192.0 6344490 Wappeltshofen   236
10  878488.2 6839843        Leeden   208

[GIS] Selecting raster values at random locations in a raster in python/arcgis

If I understand you correctly I think you can solve it like this (their are comments in the code that explain what is going on):

import numpy, arcpy, random

#Establish the extent which your random samples can be within
rangeX = (100, 2500000) # Enter the actual range in x values of your rasters * 100 in order to get coordinates with decimals
rangeY = (100, 2500000) # Enter the actual range in y values of your rasters * 100 in order to get coordinates with decimals
qty = 1000  # Enter in the number greater than random points you need


#Generate random x,y coordinates
randPoints = []
while len(randPoints) < qty:
    x = random.randrange(*rangeX)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
    y = random.randrange(*rangeY)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
    randPoints.append((x,y))

#Create dictionary of key and lists, list will house tuples of (x,y,z)
#Enter in actual classified values for dictionary keys
valueDict = {'Class1' : [],
             'Class2' : [],
             'Class3' : [],
             'Class4' : []}

######Get Rasters bands as well as cell height, width, origin info to be able to get
######index of x,y location in the numpy array
arcpy.env.workspace = inPath + '\\aster.img'
bands = arcpy.ListRasters()
Ras = arcpy.Raster(inPath + '\\aster.img')
originX = Ras.extent.upperLeft.X
originY = Ras.extent.upperLeft.Y
pixelWidth = Ras.meanCellWidth
pixelHeight = Ras.meanCellHeight

#Create a list that houses each raster array
bandsList = []
for i in bands:
    bandsList.append(arcpy.RasterToNumPyArray(i).astype(numpy.float32))

#loop over all of the random point locations and collect raster values at their
#locations if the dictionary entry for that value is not full populate it
#with a tuple of (x,y,z), keep going until each class is full
for i in randPoints:
    X = i[0]
    Y = i[1]
    xOffset = int((X-originX)/pixelWidth)
    yOffset = int(abs(Y-originY)/pixelHeight)
    for j in range(0,len(bands)):
       sampleValue = bandsList[j][yOffset, xOffset]
       for key in valueDict.keys():
           if sampleValue == key:
               if len(valueDict[key]) < 10:
                   valueDict[key].append((X, Y, sampleValue))
                   break
               else:
                   continue

This is a variation of a script that I have used to extract raster values at random x,y locations, so it may need some tweaking but I think the major elements are their to get the job done for you.

Best Answer

Related Solutions

[GIS] Extracting values from rasters at location of points using R

[GIS] Selecting raster values at random locations in a raster in python/arcgis

Related Question