[GIS] Crop a raster using rasterio and geopandas

geojsongeopandasosgeorasterio

I am cropping a set of historic aerial photographs. These photographs have large black areas in the edges (0 value). However there is also valid data with a 0 value. The workflow I am using is:

  1. Load raster with rasterio
  2. Polygonize the raster with rasterio.features.shapes()
  3. Identify polygons where value = 0 and size > 5000 square meters
  4. Mask the original images with polygons, perform an inverted mask

Here is my current code for masking a single image:

import rasterio
from rasterio import features
from rasterio import mask
import json
import geopandas as gpd

results = []
final_results = []

with rasterio.open(r"C:\1927_oahu\tif\_Line1_6to8_0.tif") as src:
    src_meta = src.meta
    src_affine = src_meta.get("transform")

    band = src.read(1)

    for geometry, raster_value in features.shapes(band, transform=src_affine):
        if raster_value == 0:
        result = {'properties': {'raster_value': raster_value}, 'geometry': geometry}
        results.append(result)

gpd_results = gpd.GeoDataFrame.from_features(results)

gpd_results["area"] = gpd_results["geometry"].area

gpd_results_filtered = gpd_results[gpd_results["area"] > 5000]

gpd_filtered_json_str = gpd_results_filtered.to_json()

gpd_filtered_json_dict = json.loads(gpd_filtered_json_str)

for k, v in gpd_filtered_json_dict.iteritems():
    if k == "features":
        for items in v:
            #final_results = {"coordinates": (items.get("geometry").get("coordinates"))}
            final_results = {"geometry": (items.get("geometry").get("coordinates"))}

masked_band, masked_transform = mask.mask(src, final_results, invert=True)

src_meta.update(dtype=rasterio.uint8, nodata=0)
with rasterio.open(os.path.join(r"C:\1927_oahu_output", "out.tif"), 'w', **src_meta) as dst:
    dst.write_band(1, masked_band.astype(rasterio.uint8))

When I run this code I receive the following error: AttributeError: 'str' object has no attribute 'get'

The documentation for rasterio.mask states: Polygons are GeoJSON-like dicts specifying the boundaries of features in the raster to be kept. All data outside of specified polygons will be set to nodata.

I am assuming that I am giving rasterio.mask the wrong type of "GeoJSON-like dict". I have tried to reformat the dict several ways without success. Does anyone know the correct way to convert GeoJSON to a "GeoJSON-like dict"?

Or can anyone provide the correct format of a "GeoJSON-like dict"?

I am new to rasterio and geopandas.

Best Answer

The issue is resolved. The issue was I misread the documentation. On a second read, the rasterio.mask documentation clearly states that polygons should be a list of GeoJSON-like dicts. I found the following snippet of code to generate those lists from this answer:

geoms = [feature["geometry"] for feature in shapefile]

Here is the the full code that is working:

import rasterio
from rasterio import features
from rasterio import mask
import json
import geopandas as gpd
import os

results = []
final_results = []

with rasterio.open(r"C:\1927_oahu\tif\_Line1_6to8_0.tif") as src:
    src_meta = src.meta.copy()
    src_affine = src_meta.get("transform")

    band = src.read(1)

    for geometry, raster_value in features.shapes(band, transform=src_affine):
        if raster_value == 1:
            result = {'properties': {'raster_value': raster_value}, 'geometry': geometry}
            results.append(result)

        gpd_results = gpd.GeoDataFrame.from_features(results)

        gpd_results["area"] = gpd_results["geometry"].area

        gpd_results_filtered = gpd_results[gpd_results["area"] > 5000]

        gpd_filtered_json_str = gpd_results_filtered.to_json()

        gpd_filtered_json_dict = json.loads(gpd_filtered_json_str)

        final_results = [feature["geometry"] for feature in gpd_filtered_json_dict["features"]]

        masked_band, masked_transform = mask.mask(src, final_results, invert=True)

        masked_band[masked_band > 255] = 0

        src_meta.update(dtype=rasterio.uint8, height=int(masked_band.shape[1]), width=int(masked_band.shape[2]), nodata=0, transform=masked_transform, compress='lzw')

        with rasterio.open(r"C:\1927_oahu\output\_Line1_6to8_0.tif"), 'w', **src_meta) as dst:
                dst.write(masked_band.astype(rasterio.uint8))   
Related Question