Python – How Rasterstats Zonal_Statistics Returns Values Not Present in Array

pythonrasterstatszonal statistics

I'm truly stumped on why I'm getting incorrect results with zonal_statistics. I'm passing a geometry and the raster in the normal fashion that's worked for me before, with 'data' as the raster and 'gdf' a geopandas dataframe:

zonal_stats(gdf.geometry, data.read(1), affine=data.transform, stats="count", categorical=True, nodata=data.nodata)

My array has values from 0 to 584 (np.max(data.read(1)) returns 584). But regardless of what geometry I pass, it always returns the number 64537, which is obviously nowhere present in my array.

Where is it getting this number from?

Here I will provide steps to reproduce:
The files (example_shapefiles.geojson and raster_file.tiff) can be downloaded from here. Here is the snippet to load in the files:

import geopandas as gdf
import rasterio
from rasterstats import zonal_stats

gdf=gpd.read_file('example_shapefiles.geojson')
data=rasterio.open('raster_file.tiff')

The shapefiles initially have EPSG:

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

I convert to the same EPSG as the raster with:

gdf=gdf.to_crs(data.crs)

Now they both have EPSG:

<Projected CRS: EPSG:5070>
Name: NAD83 / Conus Albers
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: unnamed
- method: Albers Equal Area
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

Now running zonal_statistics with

zonal_stats(gdf.geometry, data.read(1), affine=data.transform, stats="count", categorical=True, nodata=data.nodata)

The results is

[{64537: 24603, 'count': 24603},
 {64537: 9846, 'count': 9846},
 {64537: 4540, 'count': 4540},
 {64537: 3747, 'count': 3747},
 {64537: 1919, 'count': 1919},
 {64537: 1497, 'count': 1497},
 {64537: 1304, 'count': 1304},
 {64537: 1255, 'count': 1255},
 {64537: 1112, 'count': 1112},
 {64537: 1062, 'count': 1062},
 {64537: 1053, 'count': 1053}]

I don't know what else I might be doing wrong. I try to figure things out on my own I'm pretty new to GIS and I just can't tell what I'm doing wrong.

Best Answer

Your polygons don't intersect your raster:

import geopandas as gpd
import rasterio
from shapely.geometry import box

gdf = gpd.read_file('example_shapefiles.geojson')
data = rasterio.open('raster_file.tiff')

raster_bbox = box(*data.bounds)

for feat in gdf.geometry:
    print(feat.intersects(raster_bbox))
False
False
False
False
False
False
False
False
False
False
False

The datasets have to overlap and be provided in the same projection

Related Question