[GIS] How to correctly reproject a geodataframe with multiple geometry columns

geopandaspythonshapely

In the geopandas documentation it says that

A GeoDataFrame may also contain other columns with geometrical (shapely) objects, but only one column can be the active geometry at a time. To change which column is the active geometry column, use the set_geometry method.

I'm wondering how to use such a GeoDataFrame if the goal is to flexibly reproject the geometrical data in these various columns to one or more other coordinate reference systems. Here's what I tried.

First try

import geopandas as gpd
from shapely.geometry import Point

crs_lonlat = 'epsg:4326'        #geometries entered in this crs (lon, lat in degrees)
crs_new = 'epsg:3395'           #geometries needed in (among others) this crs
gdf = gpd.GeoDataFrame(crs=crs_lonlat)      
gdf['geom1'] = [Point(9,53), Point(9,54)]     
gdf['geom2'] = [Point(8,63), Point(8,64)]

#Working: setting geometry and reprojecting for first time.
gdf = gdf.set_geometry('geom1')
gdf = gdf.to_crs(crs_new)   #geom1 is reprojected to crs_new, geom2 still in crs_lonlat
gdf
Out: 
                             geom1          geom2
0  POINT (1001875.417 6948849.385)  POINT (8 63)
1  POINT (1001875.417 7135562.568)  POINT (8 64)

gdf.crs
Out: 'epsg:3395'

So far, so good. Things go off the rails if I want to set geom2 as the geometry column, and reproject that one as well:

#Not working: setting geometry and reprojecting for second time.
gdf = gdf.set_geometry('geom2')     #still in crs_lonlat...
gdf.crs                             #...but this still says crs_new...
Out: 'epsg:3395'

gdf = gdf.to_crs(crs_new)           #...so this doesn't do anything! (geom2 unchanged)
gdf
Out: 
                             geom1                      geom2
0  POINT (1001875.417 6948849.385)  POINT (8.00000 63.00000)
1  POINT (1001875.417 7135562.568)  POINT (8.00000 64.00000)

Ok, so, apparently, the .crs attribute of the gdf is not reset to its original value when changing the column that serves as the geometry – it seems, the crs is not stored for the individual columns. If that is the case, the only way I see to use reprojection with this dataframe, is to backtrack: start –> select column as geometry –> reproject gdf to crs_new –> use/visualize/… –> reproject gdf back to crs_lonlat –> goto start. This is not usable if I want to visualise both columns in one figure.

Second try

My second attempt was, to store the crs with each column separately, by changing the corresponding lines in the script above to:

gdf = gpd.GeoDataFrame()
gdf['geom1'] = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat)
gdf['geom2'] = gpd.GeoSeries([Point(8,63), Point(8,64)], crs=crs_lonlat)

However, it soon became clear that, though initialised as a GeoSeries, these columns are normal pandas Series, and don't have a .crs attribute the same way GeoSeries do:

gdf['geom1'].crs
AttributeError: 'Series' object has no attribute 'crs'

s = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat)
s.crs
Out: 'epsg:4326'

Is there something I'm missing here?

Is the only solution, to decide on the 'final' crs beforehand – and do all the reprojecting before adding the columns? Like so…

gdf = gpd.GeoDataFrame(crs=crs_new)
gdf['geom1'] = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat).to_crs(crs_new)
gdf['geom2'] = gpd.GeoSeries([Point(8,63), Point(8,64)], crs=crs_lonlat).to_crs(crs_new)
#no more reprojecting done/necessary/possible! :/

…and then, when another crs is needed, rebuild the entire gdf from scratch? That can't be the way this was intended to be used.

(Full disclosure: I posted this question on stack overflow, but it wasn't getting much love/attention over there.)

EDIT: Answer – or workaround

geopandas does not accommodate this use case at the moment. My workaround is to not use a GeoDataFrame, but rather combine a normal pandas DataFrame, for the non-shapely data, with several geopandas GeoSeries, for the shapely geometry data. The GeoSeries each have their own crs and can be effortlessly reprojected whenever necessary.

Best Answer

In case of GeoDataFrame, CRS in GeoPandas is stored on the level of GeoDataFrame, not individual GeoSeries (as of version 0.7.0, there is a discussion to change it). At this moment, I think that your solution of reprojecting GeoSeries and then assigning then to GeoDataFrame is the best solution, although admittedly not very elegant. Feel free to express your thoughts on it on GitHub: https://github.com/geopandas/geopandas/issues/1193

Related Question