GeoPandas – Merging a GeoDataFrame and Pandas DataFrame by Column

concatenationgeopandasmergepandaspython 3

I have a simple geopandas dataframe that includes geometry and a column called 'MUKEY':

merged_spatial_df.head()

enter image description here

I want to merge it to a tabular (.csv) pandas dataframe (which also has a column called 'MUKEY') based on 'MUKEY'.

merged_tab_df.head()

enter image description here
There are 31,000 rows in merged_spatial_df and about 391 in merged_tab_df, but each unique MUKEY value in merged_tab_df corresponds to one in merged_spatial_df. I tried the following but can't seem to merge them together and .sjoin requires 2 geodataframes. I tried:

merged_master = gpd.GeoDataFrame(merged_tab_df.merge(merged_spatial_df, how='right'))

but I got an error – ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat, so I tried:

merged_master = gpd.GeoDataFrame(pd.concat([merged_tab_df, merged_spatial_df], join ='inner'))

None of the concats I tried produced viable results. What am I doing wrong here?

Best Answer

You should specify a common key using on parameter. I removed merged_ prefix for legibility.

df  = spatial_df.merge(tab_df, on='mukey', how='left')
# df  = tab_df.merge(spatial_df, on='mukey', how='right')
gdf = gpd.GeoDataFrame(df)

Sample spatial_df:

    col1 col2  mukey geometry
0   A    1.76  1     ...
1   B    0.40  2     ...
2   C    0.97  3     ...
3   D    2.24  4     ...
4   X    1.86  5     ...
5   Y    0.97  6     ...

Sample tab_df:

    col3 col4  mukey
0   100  0.95  2
1   200  0.15  3
2   300  0.10  4

Result:

    col1  col2  mukey geometry  col3   col4
0   A     1.76  1     ...       NaN    NaN
1   B     0.40  2     ...       100.0  0.95
2   C     0.97  3     ...       200.0  0.15
3   D     2.24  4     ...       300.0  0.10
4   X     1.86  5     ...       NaN    NaN
5   Y     0.97  6     ...       NaN    NaN