Python – Splitting GeoTIFF into Multiple Cells with Rasterio

geotiff-tiff, machine learning, python, raster, rasterio

I'd like to create training data for a machine learning pipeline. To do this, I want to take a large raster and split it into multiple equal-sized cells.

I have a very basic loop for generating the pixel-wise coordinates. How do I generate a cell from these when splitting up a GeoTIFF?

def splitImageIntoCells(img, filename, squareDim):
    numberOfCellsWide = img.shape[1] // squareDim
    numberOfCellsHigh = img.shape[0] // squareDim
    x, y = 0, 0
    for hc in range(numberOfCellsHigh):
        y = hc * squareDim
        for wc in range(numberOfCellsWide):
            x = wc * squareDim
            # Need some method from Rasterio here
            # to crop at the given x and y...

Best Answer

I came up with the following four functions:

import rasterio
from rasterio.mask import mask
from shapely import geometry

# Takes a Rasterio dataset and splits it into squares of dimensions squareDim * squareDim
def splitImageIntoCells(img, filename, squareDim):
    numberOfCellsWide = img.shape[1] // squareDim
    numberOfCellsHigh = img.shape[0] // squareDim
    x, y = 0, 0
    count = 0
    for hc in range(numberOfCellsHigh):
        y = hc * squareDim
        for wc in range(numberOfCellsWide):
            x = wc * squareDim
            geom = getTileGeom(img.transform, x, y, squareDim)
            getCellFromGeom(img, geom, filename, count)
            count = count + 1

# Generate a bounding box from the pixel-wise coordinates using the original dataset's transform property
def getTileGeom(transform, x, y, squareDim):
    # Apply the affine transform to (col, row); the transform goes on the left
    corner1 = transform * (x, y)
    corner2 = transform * (x + squareDim, y + squareDim)
    return geometry.box(corner1[0], corner1[1],
                        corner2[0], corner2[1])

# Crop the dataset using the generated box and write it out as a GeoTIFF
def getCellFromGeom(img, geom, filename, count):
    crop, cropTransform = mask(img, [geom], crop=True)
    writeImageAsGeoTIFF(crop,
                        cropTransform,
                        img.meta,
                        img.crs,
                        filename+"_"+str(count))

# Write the passed in dataset as a GeoTIFF
def writeImageAsGeoTIFF(img, transform, metadata, crs, filename):
    metadata.update({"driver":"GTiff",
                     "height":img.shape[1],
                     "width":img.shape[2],
                     "transform": transform,
                     "crs": crs})
    with rasterio.open(filename+".tif", "w", **metadata) as dest:
        dest.write(img)

To use these, open an image with Rasterio and pass the open dataset (here myReadInImage) to the first function, like so:

splitImageIntoCells(myReadInImage, "my_file_name", 1000)
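
To make the coordinate step in getTileGeom concrete, here is a minimal pure-Python sketch of the arithmetic an affine transform performs when it is applied to a pixel coordinate. It assumes the six coefficients are given in the (a, b, c, d, e, f) order used by the affine package; the helper names and the example raster (10 m pixels, upper-left corner at 500000, 4100000) are made up for illustration and are not part of Rasterio.

```python
from typing import Tuple

# Affine coefficients in (a, b, c, d, e, f) order, so that:
#   x_world = a * col + b * row + c
#   y_world = d * col + e * row + f
Coeffs = Tuple[float, float, float, float, float, float]


def pixel_to_world(coeffs: Coeffs, col: float, row: float) -> Tuple[float, float]:
    """Map a (col, row) pixel coordinate to world coordinates."""
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


def tile_bounds(coeffs: Coeffs, x: int, y: int, square_dim: int):
    """Return (minx, miny, maxx, maxy) for the square tile whose
    upper-left pixel is (x, y). Hypothetical helper for illustration."""
    x1, y1 = pixel_to_world(coeffs, x, y)
    x2, y2 = pixel_to_world(coeffs, x + square_dim, y + square_dim)
    # Sort the corners so the result is valid even though row numbers
    # grow downwards while northings grow upwards.
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


# Assumed north-up raster: 10 m pixels, origin at (500000, 4100000).
example = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)
bounds = tile_bounds(example, 1000, 2000, 1000)
print(bounds)  # (510000.0, 4070000.0, 520000.0, 4080000.0)
```

Using only two opposite corners is fine for a north-up raster like the assumed one. For a rotated raster the box built from two corners would not match the tile footprint exactly.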

Related Solutions

Rasterio for Geotiff – How to Update Metadata

The previous answers are misleading or wrong. To modify the nodata value of a GeoTIFF with Rasterio, do this:

with rasterio.open(tiffname, 'r+') as dataset:
    dataset.nodata = -32767

The project's tests also cover this usage: https://github.com/mapbox/rasterio/blob/master/tests/test_update.py#L59-L64. Note that you may have to close and reopen the file (as the test does) for the new nodata value to take effect in your program.

The meta property of a dataset is a copy of some of its important metadata. Modifying that object has no effect on the dataset.

Rasterio – How to Split Multiband Image into Image Tiles Using Rasterio

Below is a simple example (it requires rasterio 1.0.0 or later and won't work in 0.3.6). There might be better or simpler ways, and there is an easier way if your raster is internally tiled and the tile block sizes match your desired output tile size.

The rasterio docs have some examples of concurrent processing if you want to go down that road.

import os
from itertools import product
import rasterio as rio
from rasterio import windows

in_path = '/path/to/indata/'
input_filename = 'dtm_5.tif'

out_path = '/path/to/output_folder/'
output_filename = 'tile_{}-{}.tif'

def get_tiles(ds, width=256, height=256):
    ncols, nrows = ds.meta['width'], ds.meta['height']
    offsets = product(range(0, ncols, width), range(0, nrows, height))
    big_window = windows.Window(col_off=0, row_off=0, width=ncols, height=nrows)
    for col_off, row_off in offsets:
        # Clip each tile to the raster extent so edge tiles stay in bounds
        window = windows.Window(col_off=col_off, row_off=row_off, width=width, height=height).intersection(big_window)
        transform = windows.transform(window, ds.transform)
        yield window, transform


with rio.open(os.path.join(in_path, input_filename)) as inds:
    tile_width, tile_height = 256, 256

    meta = inds.meta.copy()

    for window, transform in get_tiles(inds, tile_width, tile_height):
        print(window)
        meta['transform'] = transform
        meta['width'], meta['height'] = window.width, window.height
        outpath = os.path.join(out_path, output_filename.format(int(window.col_off), int(window.row_off)))
        with rio.open(outpath, 'w', **meta) as outds:
            outds.write(inds.read(window=window))
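
The offset and clipping logic in get_tiles can be checked without a raster at all. The following is a rough pure-Python sketch of the same idea: it replaces the Window.intersection call with plain min() arithmetic, and the function name and the 600 x 500 raster size are invented for the example.

```python
from itertools import product


def tile_windows(ncols, nrows, width=256, height=256):
    """Yield (col_off, row_off, w, h) tuples covering an ncols x nrows grid.

    Tiles on the right and bottom edges are clipped to the raster extent,
    which is the effect the intersection with the full-raster window has.
    """
    for col_off, row_off in product(range(0, ncols, width),
                                    range(0, nrows, height)):
        w = min(width, ncols - col_off)
        h = min(height, nrows - row_off)
        yield col_off, row_off, w, h


# Assumed raster size, chosen so that it is not a multiple of the tile size.
tiles = list(tile_windows(600, 500))
for tile in tiles:
    print(tile)
```

With these numbers there are 3 x 2 = 6 tiles, and the last one, at offset (512, 256), is only 88 x 244 pixels. This differs from splitImageIntoCells above, which uses floor division and therefore drops partial cells at the edges, so pick whichever behaviour suits your training data.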
