I have two large datasets, RP1
and RP2
. I would like to subtract one from the other, however the datasets are extremely large and so exit out with a MemoryError
when I try a direct read:
RP1 = rasterio.open(big_file_path)
RP1 = RP1.read(1, masked=True)
RP1 = RP1_dataset.read(1, masked=True)
File "rasterio_io.pyx", line 348, in
rasterio._io.DatasetReaderBase.read MemoryError
Therefore I have tried to resort to windowed reading and writing as explained by the rasterio
docs. I wrote the following script:
import numpy
import rasterio
image = numpy.ones((150, 250), dtype=rasterio.ubyte) * 127
RP1_filename = r'RP1.tif'
RP2_filename = r'RP2.tif'
RP2 = rasterio.open(RP2_filename)
profile = RP2.profile
with rasterio.open(RP1_filename) as RP1:
for block_index, window in RP1.block_windows(1):
RP1_block = RP1.read(window=window,masked = True)
RP2_block = RP2.read(window=window, masked = True)
result_block = RP2_block - RP1_block
print(result_block.shape)
RP2.close()
Since RP1
and RP2
have the same window schema I took the liberty of indexing into RP2
by the current RP1
block index. This seems to work fine and the result_block
variable contains a masked array as expected.
The problem is I don't know how to collect all these arrays properly such that I can write a new output raster with the result values.
Does anyone know how to do that?
Best Answer
You can directly write the result blocks to the big result tiff file via windowed writing.
If indeed you want to assemble one big results array in memory you would have to initialize an empty array with the dimensions of RP1 and iteratively assign the block arrays to the respective rows/columns (you can get these from the window).