[GIS] GDAL and Python: How to get coordinates for all cells having a specific value

gdalpythonraster

I've got an Arc/Info Binary Grid—specifically, an ArcGIS flow accumulation raster—and I'd like to identify all cells having a specific value (or in a range of values). Ultimately, I'd like a shapefile of points representing these cells.

I can use QGIS to open the hdr.adf and get this result, the workflow is:

QGIS > Raster menu > Raster Calculator (mark all points with target value)
QGIS > Raster menu > Polygonize
QGIS > Vector menu > Geometry submenu > Polygon centroids
Edit the centroids to delete the unwanted poly centroids (those = 0)

This approach "does the job", but it doesn't appeal to me because it creates 2 files I have to delete, then I have to remove unwanted record(s) from the shapefile of centroids (i.e. those = 0).

An existing question approaches this subject, but it's tailored for ArcGIS/ArcPy, and I'd like to stay in the FOSS space.

Does anyone have an existing GDAL/Python recipe/script that interrogates a raster's cell values, and when a target value—or a value in a target range—is found, a record is added to a shapefile? This would not only avoid the UI interaction, but it would create a clean result in a single pass.

I took a shot at it by working against one of Chris Garrard's presentations, but raster work isn't in my wheelhouse and I don't want to clutter the question with my weak code.

Should anyone want the exact dataset to play with, I put it here as a .zip.

[Edit Notes] Leaving this behind for posterity. See comment exchanges with om_henners. Basically the x/y (row/column) values were flipped. The original answer had this line:

(y_index, x_index) = np.nonzero(a == 1000)

inverted, like this:

(x_index, y_index) = np.nonzero(a == 1000)

When I first encountered the issue illustrated in the screenshot, I wondered if I implemented the geometry incorrectly, and I experimented by flipping the x/y coordinate values in this line:

point.SetPoint(0, x, y)

..as..

point.SetPoint(0, y, x)

However that didn't work. And I didn't think to try flipping the values in om_henners' Numpy expression, believing wrongly that flipping them at either line was equivalent. I think the real issue relates to the x_size and y_size values, respectively 30 and -30, which are applied when the row and column indices are used to calculate point coordinates for the cells.

[Original Edit]

@om_henners, I'm trying your solution, in concert with a couple recipies for making point shapefiles using ogr (invisibleroads.com, Chris Garrard), but I'm having an issue where the points are appearing as if reflected across a line passing through 315/135-degrees.

Light blue points: created by my QGIS approach, above

Purple points: created by the GDAL/OGR py code, below

enter image description here

[Solved]

This Python code implements the complete solution as proposed by @om_henners. I've tested it and it works. Thanks man!

from osgeo import gdal
import numpy as np
import osgeo.ogr
import osgeo.osr

path = "D:/GIS/greeneCty/Greene_DEM/GreeneDEM30m/flowacc_gree/hdr.adf"
print "\nOpening: " + path + "\n"

r = gdal.Open(path)
band = r.GetRasterBand(1)

(upper_left_x, x_size, x_rotation, upper_left_y, y_rotation, y_size) = r.GetGeoTransform()

a = band.ReadAsArray().astype(np.float)

# This evaluation makes x/y arrays for all cell values in a range.
# I knew how many points I should get for ==1000 and wanted to test it.
(y_index, x_index) = np.nonzero((a > 999) & (a < 1001))

# This evaluation makes x/y arrays for all cells having the fixed value, 1000.
#(y_index, x_index) = np.nonzero(a == 1000)

# DEBUG: take a look at the arrays..
#print repr((y_index, x_index))

# Init the shapefile stuff..
srs = osgeo.osr.SpatialReference()
#srs.ImportFromProj4('+proj=utm +zone=15 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs')
srs.ImportFromWkt(r.GetProjection())

driver = osgeo.ogr.GetDriverByName('ESRI Shapefile')
shapeData = driver.CreateDataSource('D:/GIS/01_tutorials/flow_acc/ogr_pts.shp')

layer = shapeData.CreateLayer('ogr_pts', srs, osgeo.ogr.wkbPoint)
layerDefinition = layer.GetLayerDefn()

# Iterate over the Numpy points..
i = 0
for x_coord in x_index:
    x = x_index[i] * x_size + upper_left_x + (x_size / 2) #add half the cell size
    y = y_index[i] * y_size + upper_left_y + (y_size / 2) #to centre the point

    # DEBUG: take a look at the coords..
    #print "Coords: " + str(x) + ", " + str(y)

    point = osgeo.ogr.Geometry(osgeo.ogr.wkbPoint)
    point.SetPoint(0, x, y)

    feature = osgeo.ogr.Feature(layerDefinition)
    feature.SetGeometry(point)
    feature.SetFID(i)

    layer.CreateFeature(feature)

    i += 1

shapeData.Destroy()

print "done! " + str(i) + " points found!"

Best Answer

In GDAL you can import the raster as a numpy array.

from osgeo import gdal
import numpy as np

r = gdal.Open("path/to/raster")
band = r.GetRasterBand(1) #bands start at one
a = band.ReadAsArray().astype(np.float)

Then using numpy it is a simple matter to get the indexes of an array matching a boolan expression:

(y_index, x_index) = np.nonzero(a > threshold)
#To demonstate this compare a.shape to band.XSize and band.YSize

From the raster geotransform we can get information such as the upper left x and y coordinates and the cell sizes.

(upper_left_x, x_size, x_rotation, upper_left_y, y_rotation, y_size) = r.GetGeoTransform()

The upper left cell corresponds to a[0, 0]. The Y size will always be negative, so using the x and y indices you can calculate the coordinates of each cell based on the indexes.

x_coords = x_index * x_size + upper_left_x + (x_size / 2) #add half the cell size
y_coords = y_index * y_size + upper_left_y + (y_size / 2) #to centre the point

From here it's a simple enough matter to create a shapefile using OGR. For some sample code see this question for how to generate a new dataset with point information.

Related Solutions

[GIS] Image processing using Python, GDAL and Scikit-Image

Firstly, welcome to the site!

Numpy arrays don't have a concept of coordinate systems inbuilt into the array. For a 2D raster they are indexed by column and row.

Note I'm making the assumption that you're reading a raster format that is supported by GDAL.

In Python the best way to import spatial raster data is with the rasterio package. The raw data imported by rasterio is still a numpy array without access to coordinate systems, but rasterio also gives you access to an affine method on the source array which you can use to transform raster columns and rows to projected coordinates. For example:

import rasterio

# The best way to open a raster with rasterio is through the context manager
# so that it closes automatically

with rasterio.open(path_to_raster) as source:

    data = source.read(1) # Read raster band 1 as a numpy array
    affine = source.affine

# ... do some work with scikit-image and get an array of local maxima locations
# e.g.
# maxima = numpy.array([[0, 0], [1, 1], [2, 2]])
# Also note that convention in a numy array for a 2d array is rows (y), columns (x)

for point in maxima: #Loop over each pair of coordinates
    column = point[1]
    row = point[0]
    x, y = affine * (column, row)
    print x, y

# Or you can do it all at once:

columns = maxima[:, 1]
rows = maxima[:, 0]

xs, ys = affine * (columns, rows)

And from there you can write your results our to a text file however you like (I'd suggest taking a look at the inbuilt csv module for example).

[GIS] Rasterizing Shapefile with attribute value as pixel value with GDAL in Python

Because your Area field contains floating point values, you need to account for that when creating the raster. As it stands, you are creating a raster that can only hold a single value:

target_ds = gdal.GetDriverByName('GTiff').Create(raster_fn, x_res, y_res, len(set(field_vals)), gdal.GDT_Byte)

Instead, change gdal.GDT_Byte to gdal.GDT_Float32.

Best Answer

Related Solutions

[GIS] Image processing using Python, GDAL and Scikit-Image

[GIS] Rasterizing Shapefile with attribute value as pixel value with GDAL in Python

Related Question