Python GeoPandas – Adding Column with SRID of Each Geometry to GeoDataFrame

geopandas, pandas, python

I'm merging a list of shapefiles with this code:

from pathlib import Path
import pandas
import geopandas
from tqdm import tqdm

folder = Path(r"read path")
shapefiles = folder.glob("PARCELA(*).SHP")
gdf = pandas.concat([
    geopandas.read_file(shp)
    for shp in tqdm(shapefiles)
]).pipe(geopandas.GeoDataFrame)
gdf.to_file(folder / r'write path')

The shapefile is created correctly. The problem is that some of the shapefiles I'm merging have different projections, and I want to normalize them. My idea is to add a column with the SRID of each geometry so that I can later reproject every geometry to a single SRID.

I know how to get the EPSG code of a GeoDataFrame:

geom_srid_num = gdf.crs.to_epsg()

But I don't know how to add it as a new column for each row of the concatenation shown above.

Any ideas?

Best Answer

First, you're going to have to break up the one-liner approach you've got set up so that you can add some extra info to each GeoDataFrame you read in.

More importantly, I would strongly advise against concatenating GeoDataFrames that have different projections. A geometry column in GeoPandas carries a single CRS, so after concatenation every geometry is interpreted in that one CRS, and any spatial operation on the combined GeoDataFrame will give wrong results for the rows that came from a different projection.
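To illustrate why mixed projections cannot simply share one geometry column, here is a minimal sketch. The point and the choice of EPSG:25830 (ETRS89 / UTM zone 30N) are assumptions made for illustration only; they are not taken from the question's data.

```python
import geopandas
from shapely.geometry import Point

# One location (roughly Madrid), in geographic coordinates (degrees).
# The point and both EPSG codes are illustrative assumptions.
lonlat = geopandas.GeoSeries([Point(-3.7, 40.4)], crs="EPSG:4326")

# The same location in a projected CRS (ETRS89 / UTM zone 30N, metres).
utm = lonlat.to_crs("EPSG:25830")

# Each GeoSeries has exactly one CRS, but the raw coordinate values
# for the same place differ by several orders of magnitude.
print(lonlat.crs.to_epsg(), lonlat.iloc[0].x, lonlat.iloc[0].y)
print(utm.crs.to_epsg(), utm.iloc[0].x, utm.iloc[0].y)
```

If both sets of coordinates ended up in one column labelled with a single CRS, one of the two points would land far from its true location.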

Instead, you can transform them all to some master CRS (say EPSG:4326) and then concatenate them all as follows:

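from pathlib import Path
import pandas
import geopandas
from tqdm import tqdm

folder = Path(r"read path")
shapefiles = folder.glob("PARCELA(*).SHP")
gdf_list = []
for shp in tqdm(shapefiles):
    gdf = geopandas.read_file(shp)
    # Record where each row came from before the CRS is changed.
    gdf['Original_File'] = str(shp)
    # to_epsg() returns None if the CRS has no matching EPSG code.
    # gdf.crs itself is None if the file carries no CRS information
    # (for example, a missing .prj file), and this line would then fail.
    gdf['Original_EPSG'] = gdf.crs.to_epsg()
    # to_crs() returns a new, reprojected GeoDataFrame.
    gdf_list.append(gdf.to_crs('EPSG:4326'))

# All parts now share one CRS, so they can be concatenated safely.
gdf_final = pandas.concat(gdf_list, ignore_index=True)
gdf_final.to_file(folder / r'write path')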

In the code above, gdf_final holds all the rows of the original data plus two extra columns, "Original_File" and "Original_EPSG", which contain the path of the source shapefile and the EPSG code of its original CRS, respectively. Note that shapefile field names are limited to 10 characters, so these two column names will be shortened when the result is written to a shapefile.

Furthermore, the gdf_final variable has ALL of its geometric features in EPSG:4326 and can be properly used in geographic operations.
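The same pattern can be tried without any files on disk. The following is a minimal sketch, not a definitive implementation: the `normalize` helper, the two in-memory layers, the file names "a.shp" and "b.shp", and the EPSG:25830 source CRS are all hypothetical stand-ins for the real shapefiles.

```python
import geopandas
import pandas
from shapely.geometry import Point

TARGET_CRS = "EPSG:4326"  # common CRS used in the answer above


def normalize(gdf, source_name, target_crs=TARGET_CRS):
    """Record the original EPSG code of `gdf`, then reproject it."""
    if gdf.crs is None:
        # Without a CRS there is nothing to reproject from.
        raise ValueError(f"{source_name} has no CRS; assign one with set_crs() first")
    out = gdf.copy()
    out["Original_File"] = source_name
    out["Original_EPSG"] = gdf.crs.to_epsg()
    return out.to_crs(target_crs)


# Two small in-memory layers standing in for shapefiles (hypothetical data).
layer_a = geopandas.GeoDataFrame(
    {"id": [1]}, geometry=[Point(-3.7, 40.4)], crs="EPSG:4326"
)
layer_b = geopandas.GeoDataFrame(
    {"id": [2]}, geometry=[Point(440000, 4474000)], crs="EPSG:25830"
)

parts = [normalize(layer_a, "a.shp"), normalize(layer_b, "b.shp")]

# Wrapping the result makes the GeoDataFrame type and CRS explicit.
gdf_final = geopandas.GeoDataFrame(
    pandas.concat(parts, ignore_index=True), crs=TARGET_CRS
)
print(gdf_final[["id", "Original_File", "Original_EPSG"]])
```

Raising an error for a layer without a CRS is a design choice in this sketch: it surfaces the problem at read time instead of letting unlabelled coordinates slip into the merged result.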
