It wasn't clear to me what you wanted initially. You basically need to create a raster mask from your polygon with the same resolution as your tiff.
You should check out rasterstats. If you're willing to install the package, it's as simple as feeding the raster and the polygon with "raster_out" flag set to True. Then you get exactly your pixels within the polygon.
If you are averse to installing the package for some reason, the relevant portions are
with Raster(raster, affine, nodata, band_num) as rast:
features_iter = read_features(vectors, layer)
for i, feat in enumerate(features_iter):
geom = shape(feat['geometry'])
if 'Point' in geom.type:
geom = boxify_points(geom, rast)
geom_bounds = tuple(geom.bounds)
fsrc = rast.read(bounds=geom_bounds)
# create ndarray of rasterized geometry
rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched)
"geom" is a shapely geometry object. "fsrc" is the initial raster you created (extent_data). The function "rasterize_geom" is as follows:
from rasterio import features
def rasterize_geom(geom, like, all_touched=False):
geoms = [(geom, 1)]
rv_array = features.rasterize(
geoms,
out_shape=like.shape,
transform=like.affine,
fill=0,
all_touched=all_touched)
return rv_array
If you're trying to compute statistics, you can use rasterstats, which is built on rasterio. Your question didn't quite specify which pixels you're looking for, but rasterstats will load only the portions of the raster you need and create a mask for pixels outside that polygon. I haven't tried it, but I see from digging around the code that you can supply a shapefile of points as well as polygons.
If you're in a conda environment, install with
conda install -c conda-forge rasterstats
To write to GeoJSON:
dataframe.to_file("output.json", driver="GeoJSON")
To write to GeoPackage:
dataframe.to_file("output.gpkg", driver="GPKG")
Documentation is here, though somewhat sparse.
Best Answer
Install and use the optional pyogrio I/O engine to read the data... that will be a lot faster. Adding the
use_arrow=True
parameter as well will give another big performance improvement, but then you'll also have to install thepyarrow
library:You can also change the defaults globally
geopandas.options.io_engine = "pyogrio"
os.environ["PYOGRIO_USE_ARROW"] = 1
Some timings using a 360 MB .gpkg with polygon data (on windows):