Python – Splitting GeoTIFF into Multiple Cells with Rasterio

I'd like to create training data for a machine learning pipeline. To do this, I want to take a large raster and split it into multiple equal-sized cells.

I have a very basic loop for generating the pixel-wise coordinates. How do I crop a cell out of the GeoTIFF at each of them?

def splitImageIntoCells(img, filename, squareDim):
    numberOfCellsWide = img.shape[1] // squareDim
    numberOfCellsHigh = img.shape[0] // squareDim
    x, y = 0, 0
    for hc in range(numberOfCellsHigh):
        y = hc * squareDim
        for wc in range(numberOfCellsWide):
            x = wc * squareDim
            # Need some method from Rasterio here
            # to crop at the given x and y...

Best Answer

I came up with the following four functions:

import rasterio
from rasterio.mask import mask
from shapely import geometry

# Takes an open Rasterio dataset and splits it into squares of squareDim x squareDim pixels.
# Because of the integer division, partial cells along the right and bottom edges are skipped.
def splitImageIntoCells(img, filename, squareDim):
    numberOfCellsWide = img.shape[1] // squareDim
    numberOfCellsHigh = img.shape[0] // squareDim
    x, y = 0, 0
    count = 0
    for hc in range(numberOfCellsHigh):
        y = hc * squareDim
        for wc in range(numberOfCellsWide):
            x = wc * squareDim
            geom = getTileGeom(img.transform, x, y, squareDim)
            getCellFromGeom(img, geom, filename, count)
            count = count + 1

# Generate a bounding box in map coordinates from the pixel-wise coordinates,
# using the original dataset's transform property
def getTileGeom(transform, x, y, squareDim):
    # The transform goes on the left: transform * (col, row) -> (x, y) in the dataset's CRS
    corner1 = transform * (x, y)
    corner2 = transform * (x + squareDim, y + squareDim)
    return geometry.box(corner1[0], corner1[1],
                        corner2[0], corner2[1])

# Crop the dataset using the generated box and write it out as a GeoTIFF
def getCellFromGeom(img, geom, filename, count):
    crop, cropTransform = mask(img, [geom], crop=True)
    writeImageAsGeoTIFF(crop,
                        cropTransform,
                        img.meta,
                        img.crs,
                        filename+"_"+str(count))

# Write the passed-in array as a GeoTIFF
def writeImageAsGeoTIFF(img, transform, metadata, crs, filename):
    # img is a (bands, rows, cols) array, as returned by mask()
    metadata = dict(metadata)  # work on a copy so the caller's dict is not modified
    metadata.update({"driver":"GTiff",
                     "height":img.shape[1],
                     "width":img.shape[2],
                     "transform": transform,
                     "crs": crs})
    with rasterio.open(filename+".tif", "w", **metadata) as dest:
        dest.write(img)
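
For anyone wondering what the transform multiplication in getTileGeom actually does, here is a minimal sketch of the same arithmetic in plain Python. It assumes the usual six affine coefficients (a, b, c, d, e, f), which should match the first six values of a Rasterio dataset's transform. The names pixel_to_map and tile_bounds and the example coefficients are made up for illustration.

```python
def pixel_to_map(coeffs, col, row):
    """Apply affine coefficients (a, b, c, d, e, f) to a pixel position."""
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


def tile_bounds(coeffs, col, row, square_dim):
    """Return (minx, miny, maxx, maxy) for the tile whose upper-left pixel is (col, row)."""
    x0, y0 = pixel_to_map(coeffs, col, row)
    x1, y1 = pixel_to_map(coeffs, col + square_dim, row + square_dim)
    # In a north-up raster e is negative, so sort the corners explicitly
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


# Hypothetical raster: 10 m pixels, upper-left corner at (500000, 4100000), north-up
coeffs = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)
bounds = tile_bounds(coeffs, 1000, 0, 1000)
print(bounds)  # (510000.0, 4090000.0, 520000.0, 4100000.0)
```

The y values decrease as the row index grows, which is why the second corner ends up with the smaller y.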

To use the four functions above, open the raster with rasterio.open and pass the open dataset (not an array read from it) to the first function:

splitImageIntoCells(myReadInImage, "my_file_name", 1000)