OGR – Filter Vector Layer by List of Feature IDs

gdalogrpython

I often use GDAL/OGR via their python bindings.

I know of the very useful ogr.Layer.SetAttributeFilter() command. However I am hoping to filter by feature ID's, rather than attributes. Also, I don't want to filter by just one value, but rather by a set of values. Ideally I could pass a list of feature ID's to the filter — is there some method that can allow me to do this (either through SetAttributeFilter or otherwise)?

It occurred to me that I could copy the feature ID's to their own attribute field, but it would be nice to avoid having to do this.

CLARIFICATION: I am hoping to apply filters for read-only processes which would involve sorting, selecting & exporting/rasterizing the layer geometries. I primarily use ESRI Shapefile format.

Best Answer

For an ESRI Shapefile:

This can be accomplished using the ogr.layer.SetAttributeFilter() method. Feature ID's in a shapefile are stored in the FID field. This field cannot be accessed via OGR's GetField method (it is not a 'proper' attribute field in that it is not stored in the .dbf file), however it is apparently accessible through SQL queries. I was able to do this as follows:

## Open shapefile geometry
driver = ogr.GetDriverByName('ESRI Shapefile')
datasource = driver.Open(shapefile_path, 0)
layer = datasource.GetLayer()

## Store ID's of desired features in a list
fid_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

## Pass ID's to a SQL query as a tuple, i.e. "(1, 2, 3, ...)"
layer.SetAttributeFilter("FID IN {}".format(tuple(fid_list)))

## The filter is now active & we can work with the layer as we like!

## For example, we can rasterize the layer & only the features
## that we specified in 'fid_list' will be included in the output

## Initialize output raster
dataset_out = gdal.GetDriverByName('GTiff').Create(
                   output_path, cols, rows, 1, gdal.GDT_Byte)
dataset_out.SetGeoTransform(gt)
dataset_out.SetProjection(srs)

## Rasterize filtered layer
gdal.RasterizeLayer(output_path, [1], layer,
    options=["ATTRIBUTE={}".format(field_name), "ALL_TOUCHED=TRUE"])

NOTE: the FID field of a shapefile can change under some circumstances. For example, if a feature is deleted, the feature ID's will be automatically renumbered so that the numbering is sequential and without gaps. So, this approach would probably not be advisable unless the file is on your local network & you do not intend to do any editing while the filter is in place (ideally with the shapefile opened in read-only mode).

Of course, you could just create a new static attribute field which could hold the FID values or some other unique identifier. The approach detailed above would still work in this case.

Related Solutions

[GIS] How to sort features in a layer using OGR/Python

I followed the second approach now and this seems to work, comments welcome:

def sortlayer(l, fd):
    # fids are unique, fids may be sorted or unsorted, fids may be consecutive or have gaps
    # don't care about semantics, don't touch fids and their order, reuse fids
    fids = []
    vals = []
    l.ResetReading()
    for f in l:
        fid = f.GetFID()
        fids.append(fid)
        vals.append((f.GetField(fd), fid))
    vals.sort()
    # index as dict: {newfid: oldfid, ...}
    ix = {fids[i]: vals[i][1] for i in xrange(len(fids))}

    # swap features around in groups/rings
    for fidstart in ix.keys():
        if fidstart not in ix: continue
        ftmp = l.GetFeature(fidstart)
        fiddst = fidstart
        while True:
            fidsrc = ix.pop(fiddst)
            if fidsrc == fidstart: break
            f = l.GetFeature(fidsrc)
            f.SetFID(fiddst)
            l.SetFeature(f)
            fiddst = fidsrc
        ftmp.SetFID(fiddst)
        l.SetFeature(ftmp)

Python GDAL/OGR – Handling NoneType Feature in Python GDAL/OGR

It is a pure Python problem and not an GDAL/OGR problem. But if you want to program in Python,you need to understand the difference between an iterator and an iterable (many, many references on the Web, but one of the best is Loop Like A Native by Ned Batchelder). These are fundamental concepts in Python.

"The important operation on an iterable is iter(), which will return an iterator. And the only operation available on an iterator is next(), which either returns the next value, or raises StopIteration, a special exception that means the iteration is finished." (from Ned Batchelder)

An iterable produces an iterator and an iterator produces a stream of values.

So what's the difference between the for loop and pointsLayer.GetNextFeature() ?

the for loops = iterator and pointsLayer= iterable

for feature in iterable:
   statements

so with your example:

from osgeo import ogr
source = ogr.Open('yourpointshapefile.shp')
pointsLayer = source.GetLayer()
for feature in pointsLayer:
    geom =feature.GetGeometryRef()
    xy = geom.GetPoint()
    print xy

(272022.68669955182, 155404.12013724342, 0.0)
(272904.99338241993, 152881.6706538822, 0.0)
.....
(272796.14718989708, 152075.00336062044, 0.0)

But we can create another type of iterator with iter() and next():

source = ogr.Open('yourpointshapefile.shp')
pointsLayer = source.GetLayer()
iterator = iter(pointsLayer)
# first feature in pointsLayer
feature = iterator.next()
geom = feature.GetGeometryRef()
xy = geom.GetPoint()
print xy
(272022.68669955182, 155404.12013724342, 0.0)
# second feature in pointsLayer
feature = iterator.next()
geom = feature.GetGeometryRef()
xy = geom.GetPoint()
print xy
(272904.99338241993, 152881.6706538822, 0.0)
....
# end raises StopIteration error to signal that iteration is complete
feature = iterator.next()
Traceback (most recent call last):
...
StopIteration

Unlike the for loop that handles the StopIteration error, you need here another control structure (while loop,if,...)

Thus, what is pointsLayer.GetNextFeature() ?

It is an iterator and you can replace iterator = iter(pointsLayer) and iterator.next() by

feature = pointsLayers.GetNextFeature()
geom = feature.GetGeometryRef()
xy = geom.GetPoint()
.....
feature = pointsLayers.GetNextFeature()
....

I hope that I have been able to explain why you should not mix the for loops (iterator) with .GetNextFeature() (another iterator).

Best Answer

Related Solutions

[GIS] How to sort features in a layer using OGR/Python

Python GDAL/OGR – Handling NoneType Feature in Python GDAL/OGR

Related Question