GeoPandas Spatial Join of Points and Polygons returns NaN after projecting

geopandaspoint-in-polygonspatial-join

I have an Excel file with some points (https://www.bundesnetzagentur.de/SharedDocs/Downloads/DE/Sachgebiete/Energie/Unternehmen_Institutionen/E_Mobilitaet/Ladesaeulenregister.xlsx;jsessionid=BAC32A2221FDC9E40ED1C10B01B5C75C?__blob=publicationFile&v=17)and a .shp file of the area the points are in (https://daten.gdz.bkg.bund.de/produkte/sonstige/kfz250/aktuell/kfz250.gk3.shape.zip).
Ultimately I would like to have a df with all the points and a column telling me the area they are in.

I turn my regular pandas df into a gpd df using

points = gdp.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude))

I then load the .shp file and check the crs

polygons = pdg.read_file('*path*\kfz250.shp')
polygons.crs

which returns:

< Projected CRS: EPSG:31467>  
Name: DHDN / 3-degree Gauss-Kruger zone 3
Axis Info [cartesian]:
- X[north]: Northing (metre)
- Y[east]: Easting (metre)
Area of Use:
- name: Germany - West-Germany - 7.5°E to 10.5°E
- bounds: (7.5, 47.27, 10.51, 55.09)
Coordinate Operation:
- name: 3-degree Gauss-Kruger zone 3
- method: Transverse Mercator
Datum: Deutsches Hauptdreiecksnetz
- Ellipsoid: Bessel 1841
- Prime Meridian: Greenwich    

So I setpoints.set_crs(epsg=31467, inplace=True) run points.crs and get:

< Projected CRS: EPSG:31467>
Name: DHDN / 3-degree Gauss-Kruger zone 3
Axis Info [cartesian]:
- X[north]: Northing (metre)
- Y[east]: Easting (metre)
Area of Use:
- name: Germany - West-Germany - 7.5°E to 10.5°E
- bounds: (7.5, 47.27, 10.51, 55.09)
Coordinate Operation:
- name: 3-degree Gauss-Kruger zone 3
- method: Transverse Mercator
Datum: Deutsches Hauptdreiecksnetz
- Ellipsoid: Bessel 1841
- Prime Meridian: Greenwich  

However if I then try an gpd.sjoin() I either get an empty gdf (inner) or one with only NaN values for the "joined df" (outer). I've tried pretty much all possible combinations of the join, but none worked.

Could someone tell me what I did wrong?

Best Answer

By using set_crs(epsg=31467, etc...) you are telling geopandas that the points are in EPSG:31467, but they're not, they are a geographic CRS.

What you need to do is tell geopandas the points are in a geographic CRS then use to_crs to reproject them to your desired CRS.

As an analogy, it's like changing the file extension of a Word doc from ".docx" to ".pdf" and expecting Adobe Acrobat to open the file correctly but you haven't created a PDF, the file is still a Word doc, but with an incorrect .pdf file extension. The correct way to get an actual PDF is save/export the Word doc to PDF format.

points = gpd.GeoDataFrame(df, crs=4326, geometry=gpd.points_from_xy(df.Längengrad, df.Breitengrad))
polygons = gpd.read_file(shp)
points = points.to_crs(epsg=31467)
points_polys = gpd.sjoin(points, polygons)

print(points_polys.head())
1         Albwerk GmbH & Co. KG  ...  5355186.990382587
2  EnBW mobility+ AG und Co.KG   ...  5355186.990382587
3  EnBW mobility+ AG und Co.KG   ...  5355186.990382587
8  EnBW mobility+ AG und Co.KG   ...  5355186.990382587

[5 rows x 40 columns]