GeoPandas – Groupby Function Omitting Columns in GeoPandas

geopandas, pandas, python

I am using this answer to calculate some basic statistics of some points that fall within the bounds of a polygon (a vector grid), such that:

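import geopandas as gpd
from geopandas.tools import sjoin

gridfile = 'grid.shp'
pointfile = 'points.shp'

point = gpd.GeoDataFrame.from_file(pointfile)
poly = gpd.GeoDataFrame.from_file(gridfile)

# Tag each point with the index of the grid cell it falls in
pointInPolys = sjoin(point, poly, how='left')

# Per-cell mean, then flatten the column MultiIndex to names like X_mean
grouped = pointInPolys.groupby('index_right')[['X','Y','Z']].agg(['mean'])
grouped.columns = ["_".join(x) for x in grouped.columns]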

The input point data has X, Y and Z columns. However, it is only returning statistics (mean) for X and Y and the stats for the Z column are not being returned:

                X_mean     Y_mean
index_right
1221        -64.781242  32.439396
1902        -64.781206  32.439096
2412        -64.781169  32.438777

The data is definitely available in the prior step by checking:

pointInPolys.keys()

Index(['X', 'Y', 'Z', 'geometry', 'index_right', 'DN'], dtype='object')

Is there a reason why the Z column stats are not being calculated?

Best Answer

Your Z column most likely contains some non-numeric values, such as the strings "NULL", "NAN" or "". pandas then stores the whole column with a non-numeric (object) dtype, and the "mean" aggregator cannot be applied to it.

I created a gist with a minimal working example (using CSV data) showing that geopandas works fine with real np.nan nulls but drops the column if it contains "NaN" strings. Geopandas won't apply the mean aggregation to columns of non-numeric type (i.e. object columns). See it here: https://gist.github.com/jjclavijo/8b8b44fd944c9698a0c4f4a58637748b
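One way to confirm this diagnosis is to inspect the column's dtype and list the values that fail to parse as numbers. This is a minimal sketch, assuming pandas is available; the small DataFrame and its "NULL" entry are made up to stand in for pointInPolys.

```python
import pandas as pd

# Hypothetical stand-in for the joined table; the "NULL" string is made up.
df = pd.DataFrame({
    "index_right": [0, 0, 1, 1],
    "X": [1.0, 2.0, 3.0, 4.0],
    "Z": ["10.5", "NULL", "7.0", "8.0"],
})

# A column holding strings is stored with a non-numeric dtype
print(df.dtypes)

# Try to parse each value; anything unparseable becomes NaN
parsed = pd.to_numeric(df["Z"], errors="coerce")

# Rows that were not null originally but failed to parse as numbers
bad_rows = df[parsed.isna() & df["Z"].notna()]
print(bad_rows)
```

Running the same two checks on pointInPolys['Z'] should show which values, if any, are keeping the column from being treated as numeric.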

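To solve this, review your data and then cast the column to float. Note that astype(float) parses numeric strings and "NaN" strings, but raises a ValueError on values such as "NULL" or ""; for those, pd.to_numeric(pointInPolys['Z'], errors='coerce') converts every unparseable value to np.nan. Before the cast, Z is missing from the result: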

grouped = pointInPolys.groupby('index_right')[['X','Y','Z']].agg(['mean'])
grouped

>>>
                X             Y
index_right     mean            mean    
0               -5.923750   -4.268750
1               32.738333   2.204000
2               32.669667   -5.528667

# Assign the column directly; .loc[:,'Z'] = ... may keep the old object dtype in recent pandas
pointInPolys['Z'] = pointInPolys['Z'].astype(float)
grouped = pointInPolys.groupby('index_right')[['X','Y','Z']].agg(['mean'])

                X             Y             Z
index_right     mean          mean          mean
0               -5.923750     -4.268750     609.49575
1               32.738333     2.204000      645.05100
2               32.669667     -5.528667     483.71250
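If the column contains strings that float() cannot parse (for example "NULL" or an empty string), pd.to_numeric with errors='coerce' is an alternative to astype(float). The following is a sketch with made-up data standing in for pointInPolys, not the asker's actual table:

```python
import pandas as pd

# Hypothetical stand-in for the joined table; values are made up.
df = pd.DataFrame({
    "index_right": [0, 0, 1, 1],
    "X": [1.0, 3.0, 5.0, 7.0],
    "Z": ["10.0", "NULL", "6.0", "8.0"],
})

# Unparseable strings become NaN; direct column assignment replaces the
# column, so the new float dtype is kept
df["Z"] = pd.to_numeric(df["Z"], errors="coerce")

# mean skips NaN by default, so Z is now aggregated alongside X
grouped = df.groupby("index_right")[["X", "Z"]].agg(["mean"])

# Flatten the (column, statistic) pairs into names such as Z_mean
grouped.columns = ["_".join(col) for col in grouped.columns]
print(grouped)
```

Coercing silently discards whatever could not be parsed, so it is worth checking how many values turned into NaN before trusting the resulting means.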