I'm merging a list of shapefiles with this code:
```python
from pathlib import Path

import pandas
import geopandas
from tqdm import tqdm

folder = Path(r"read path")
# Every shapefile in the folder whose name looks like PARCELA(...).SHP
shapefiles = folder.glob("PARCELA(*).SHP")

# Read each file and stack them all into one GeoDataFrame
gdf = pandas.concat([
    geopandas.read_file(shp)
    for shp in tqdm(shapefiles)
]).pipe(geopandas.GeoDataFrame)

gdf.to_file(folder / r'write path')
```
The merged shapefile is created correctly. The problem is that some of the shapefiles I'm merging have different projections, and I want to normalize them. My idea is to add a column with the SRID of each geometry so that I can later reproject every geometry to a single SRID.
I know how to extract the EPSG code of a GeoDataFrame's CRS:
```python
geom_srid_num = gdf.crs.to_epsg()
```
But I don't know how to add it as a new column to each row of the concatenation shown above.
Any ideas?
Best Answer
First, you're going to have to break up your one-liner so that you can add some extra information to each `GeoDataFrame` you read in.

More importantly, I would strongly advise against concatenating `GeoDataFrame`s that have different projections. GeoPandas stores a single CRS per geometry column, so a concatenated `GeoDataFrame` cannot keep track of the different CRSs of its rows, and any geographic manipulation you perform on it will very likely produce wrong results.

Instead, you can transform them all to a common CRS (say EPSG:4326) and then concatenate them, as follows:
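A minimal sketch of this approach, assuming every shapefile comes with a `.prj` file (so its `crs` is not `None`) and that EPSG:4326 is an acceptable common CRS. The helper names `normalize_and_concat` and `merge_shapefiles` are hypothetical; the two extra columns are called "Original_File" and "Original_EPSG".

```python
from pathlib import Path

import geopandas
import pandas

TARGET_CRS = "EPSG:4326"  # assumed common CRS for the merged output


def normalize_and_concat(named_frames, target_crs=TARGET_CRS):
    """Tag each (name, GeoDataFrame) pair with its source, reproject it, and concatenate."""
    parts = []
    for name, frame in named_frames:
        if frame.crs is None:
            # Without a CRS there is nothing to reproject from; fail loudly instead of guessing.
            raise ValueError(f"{name} has no CRS (missing .prj file?)")
        frame = frame.assign(
            Original_File=name,
            # The CRS belongs to the whole frame, so every row of this file gets the same code.
            # to_epsg() returns None when pyproj finds no confident EPSG match.
            Original_EPSG=frame.crs.to_epsg(),
        )
        parts.append(frame.to_crs(target_crs))
    # Every part now shares one CRS, so concatenating them is well defined;
    # pandas.concat on GeoDataFrames returns a GeoDataFrame.
    return pandas.concat(parts, ignore_index=True)


def merge_shapefiles(folder, pattern="PARCELA(*).SHP", target_crs=TARGET_CRS):
    """Read every shapefile in `folder` matching `pattern` and merge them in `target_crs`."""
    named = (
        (shp.name, geopandas.read_file(shp))
        for shp in sorted(Path(folder).glob(pattern))
    )
    return normalize_and_concat(named, target_crs)
```

With the paths from the question, `gdf_final = merge_shapefiles(folder)` followed by `gdf_final.to_file(folder / r'write path')` would replace the one-liner. Keeping the reprojection in a separate function makes it easy to check without any files on disk.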
In the code above, the `gdf_final` variable has all the combined rows of the original data plus two extra columns, "Original_File" and "Original_EPSG", which contain the name of the original shapefile and its EPSG code, respectively. Furthermore, ALL of the geometries in `gdf_final` are in EPSG:4326, so it can be used consistently in geographic operations.
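Because every row remembers its source CRS, the extra columns also let you pull out the rows that came from one CRS and project them back. A minimal sketch, with `rows_in_original_crs` as a hypothetical helper name and assuming `gdf_final` has the "Original_EPSG" column described above:

```python
import geopandas


def rows_in_original_crs(gdf_final, epsg):
    """Return the rows whose source files were in `epsg`, reprojected back to that CRS."""
    # Rows whose Original_EPSG is None (no EPSG match) never equal `epsg` and are skipped.
    subset = gdf_final[gdf_final["Original_EPSG"] == epsg]
    return subset.to_crs(epsg=epsg)
```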