Warning using zonal_stats: RuntimeWarning: overflow encountered in reduce return

pythonzonal statistics

I have a grid of cells (100km x 100km) around the African continent (shapefile) . I am doing a zonal_stats with the population of each cell. It is a raster data (geotiff) with resolution 1kmx1km.

This is the code I am using:

import geopandas as gpd
from pyproj import CRS
import rasterio
from rasterstats import zonal_stats
from rasterio.mask import mask

pop_array, pop_transform = mask(pop, shapes=grid.geometry, crop=True)

zs1 = zonal_stats(grid, pop_array[0], affine=pop_transform, stats=['mean'], nodata=np.nan)
grid['pop'] = [x['mean'] for x in zs1]

where pop is my population raster file and the grid is my shapefile of cells. Both of them have the same crs. However, the following warning appears:

RuntimeWarning: overflow encountered in reduce return umr_sum(a, axis, dtype, out, keepdims, initial, where)

Some of the values of the resulting column "pop" has a value "-inf", and others have a very high negative value, which is surprising for me given that the raster is population count. I have looked at a possible problem and it could be that I need to specify a greater dtype. For example, with astype.

Given that, I have tried to add the following in the last line of my code:

grid['pop'] = [x['mean'] for x in zs1].astype(float128)

But it's still not working with the following error: "AttributeError: 'list' object has no attribute 'astype'"

Any idea about how can I solve it?

"Pop" is the name I assigned to the new column with the results from the zonal stats. It has dtype "float64". When I do pop_array.dtype(), it says to me the following:

TypeError: 'numpy.dtype[float32]' object is not callable

Best Answer

The source nodata value (nan in this case) needs to be defined or supplied as an arg to rasterio.mask.mask(), so in addition to masking out the geometry it also masks out the nodata value and doesn't attempt to include nodata in the calculations.

However I wouldn't expect it to ever overflow, so that's strange. If nans were being included in the calculation the result should be nan as well.

It's possible that somewhere along the line your data was being cast to an integer type. nan is essentially a special-case floating point value which it doesn't apply to integers, so naively treating nan as an int will give you wacky values. It seems like np.mean() is trying to sum all those wacky values and overflowing.

I'm not sure if this is a rasterstats bug or something funky with your specific dataset, like the raster dtype being incorrect or something. To properly debug that I'd probably need your actual input data and the exact versions of all the libraries you're using.

Related Solutions

Computing vegetation indices using scikit-image: “Divide by zero” warning prevents Red Green Index calculation

This link contains the solution.

The problem occurs in Numpy and not in scikit-learn.

You can disable the warning with numpy.seterr. Put this before the possible division by zero:

np.seterr(divide='ignore')

That'll disable zero division warnings globally. If you just want to disable them for a little bit, you can use numpy.errstate in a with clause:

with np.errstate(divide='ignore'):
    # some code here

For a zero by zero division (undetermined, results in a NaN), the error behaviour has changed with numpy version 1.12.0: this is now considered "invalid", while previously it was "divide".

Thus, if there is a chance you your numerator could be zero as well, use

np.seterr(divide='ignore', invalid='ignore')

with np.errstate(divide='ignore', invalid='ignore'):
    # some code here

Reduce the size of the file while I write an array to GeoTIFF using Python

You want to use a sensible data type (UInt8 should work for your data) and compress the output. You can do this with the Create statement, and use LZW (or DEFLATE) compression:

file2 = driver.Create(outFile, xsize=file.RasterXSize, 
                     ysize=file.RasterYSize, bands=1,
                     eType=gdal.GDT_Byte,
                     options=['COMPRESS=LZW']
                     )

See more details about compression options here.

Best Answer

Related Solutions

Computing vegetation indices using scikit-image: “Divide by zero” warning prevents Red Green Index calculation

Reduce the size of the file while I write an array to GeoTIFF using Python

Related Question