I am using this answer to calculate some basic statistics of some points that fall within the bounds of a polygon (a vector grid), such that:
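```python
import geopandas as gpd
from geopandas import sjoin

gridfile = 'grid.shp'
pointfile = 'points.shp'
point = gpd.GeoDataFrame.from_file(pointfile)
poly = gpd.GeoDataFrame.from_file(gridfile)

# Attach to each point the index of the grid cell it falls in ('index_right')
pointInPolys = sjoin(point, poly, how='left')

# Per-cell mean of X, Y and Z (a list of columns; newer pandas rejects the tuple form ['X','Y','Z'])
grouped = pointInPolys.groupby('index_right')[['X', 'Y', 'Z']].agg(['mean'])

# Flatten the MultiIndex columns, e.g. ('X', 'mean') -> 'X_mean'
grouped.columns = ["_".join(x) for x in grouped.columns.ravel()]
```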
The input point data has X, Y and Z columns. However, the result only contains the mean for X and Y; no statistics are returned for the Z column:
```
                X_mean     Y_mean
index_right
1221        -64.781242  32.439396
1902        -64.781206  32.439096
2412        -64.781169  32.438777
```
The Z column is definitely present after the join, as checking the columns shows:
```
>>> pointInPolys.keys()
Index(['X', 'Y', 'Z', 'geometry', 'index_right', 'DN'], dtype='object')
```
Is there a reason why the Z column stats are not being calculated?
Best Answer
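There must be some non-numeric data in your Z column, most likely placeholder strings such as "NULL", "NaN" or "" (empty). A single such value makes pandas store the whole column with the object dtype, and the "mean" aggregator is then not applied to it.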
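A quick way to confirm this is to look at the dtypes and list the values that refuse to parse as numbers. The sketch below uses a small hypothetical DataFrame in place of your joined data (the values and placeholder strings are invented for illustration), and relies on pd.to_numeric with errors="coerce" to flag unparseable entries:

```python
import pandas as pd

# Hypothetical stand-in for the joined points: Z was read with a few
# placeholder strings mixed in, so pandas stores it as an object column.
points = pd.DataFrame({
    "X": [-64.781242, -64.781206, -64.781169],
    "Y": [32.439396, 32.439096, 32.438777],
    "Z": [1.5, "NULL", ""],
})

print(points.dtypes)  # X and Y are float64, Z is object

# Entries that cannot be parsed as numbers become NaN under errors="coerce";
# keeping only those that were not already missing isolates the offenders.
parsed = pd.to_numeric(points["Z"], errors="coerce")
bad = points.loc[parsed.isna() & points["Z"].notna(), "Z"]
print(bad.unique())  # the placeholder strings that block the mean
```

On real data, running the same two checks on pointInPolys["Z"] should show which placeholders your file uses.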
I created a gist with a minimal working example (using CSV data) showing that GeoPandas works just fine with real np.nan nulls but drops the column when it contains "NaN" strings. The mean aggregation is not applied to columns of non-numeric type (i.e. object columns); this is pandas groupby behaviour, which GeoPandas inherits. See it here: https://gist.github.com/jjclavijo/8b8b44fd944c9698a0c4f4a58637748b
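The gist itself is not reproduced here; the following is only a small in-memory analogue of the same comparison, with made-up values and a hypothetical grouping column g standing in for index_right. It shows that a real np.nan keeps the column float64 (and the mean simply skips it), while the text "NaN" turns the column into object. As far as I know, pandas 2.x no longer silently drops such object columns in groupby reductions and raises a TypeError instead, so the sketch avoids aggregating the object column.

```python
import numpy as np
import pandas as pd

# Real missing value: the column stays float64.
real_nan = pd.DataFrame({"g": [1, 1, 2], "Z": [2.0, np.nan, 4.0]})

# Same data with the *text* "NaN": the column becomes dtype object.
text_nan = pd.DataFrame({"g": [1, 1, 2], "Z": [2.0, "NaN", 4.0]})

print(real_nan.dtypes["Z"])  # float64
print(text_nan.dtypes["Z"])  # object

# The group mean works on the float column and ignores the NaN:
# group 1 -> 2.0, group 2 -> 4.0
means = real_nan.groupby("g")["Z"].mean()
print(means)
```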
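To solve this, after reviewing your data, convert the column to a numeric type. Note that a plain astype(float) raises a ValueError on strings such as "NULL" or ""; pd.to_numeric(..., errors='coerce') instead converts every value that cannot be parsed as a number to np.nan, which the mean then skips.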
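A minimal sketch of the fix, assuming the offending values are placeholder strings and using pd.to_numeric(..., errors="coerce") for the conversion. A plain pandas DataFrame with an index_right column stands in for the sjoin result, and the coordinates and Z values are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-in for pointInPolys: index_right would come from sjoin,
# and Z holds placeholder strings instead of real missing values.
pointInPolys = pd.DataFrame({
    "X": [-64.78, -64.79, -64.80, -64.81],
    "Y": [32.43, 32.44, 32.45, 32.46],
    "Z": [10.0, "NULL", 30.0, ""],
    "index_right": [1221, 1221, 1902, 1902],
})

# Coerce Z to numbers; anything unparseable ("NULL", "", ...) becomes NaN.
pointInPolys["Z"] = pd.to_numeric(pointInPolys["Z"], errors="coerce")

# Same aggregation as in the question, selecting columns with a list.
grouped = pointInPolys.groupby("index_right")[["X", "Y", "Z"]].agg(["mean"])
grouped.columns = ["_".join(col) for col in grouped.columns]
print(grouped)  # now includes a Z_mean column
```

Because the coerced NaN values are skipped by the mean, Z_mean for each cell is computed only from the points that had a valid Z.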