Python GeoPandas – Adding Column with SRID of Each Geometry to GeoDataFrame

geopandas, pandas, python

I'm merging a list of shapefiles with this code:

from pathlib import Path
import pandas
import geopandas
from tqdm import tqdm

folder = Path(r"read path")
shapefiles = folder.glob("PARCELA(*).SHP")
gdf = pandas.concat([
    geopandas.read_file(shp)
    for shp in tqdm(shapefiles)
]).pipe(geopandas.GeoDataFrame)
gdf.to_file(folder / r'write path')

The shapefile is created correctly. The problem is that some of the shapefiles I'm merging have different projections, and I want to normalize them. My idea is to add a column with the SRID of each geometry, so that I can later reproject every geometry to a single SRID.

I know how to get the EPSG code of a GeoDataFrame's CRS:

geom_srid_num = gdf.crs.to_epsg()

But I don't know how to add it as a new column for each row of the concatenated GeoDataFrame shown above.

Any ideas?

Best Answer

First, you'll need to break up the one-liner so that you can add extra information to each GeoDataFrame as you read it in.

More importantly, I would strongly advise against concatenating GeoDataFrames that have different projections. A geometry column in GeoPandas carries a single CRS, so after concatenation the coordinates from the different sources would all be interpreted in one CRS, and any geographic operation on the combined GeoDataFrame would very likely give wrong results.
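To make the mismatch concrete, here is a minimal sketch that uses only the standard library. It expresses one location both in EPSG:4326 (degrees) and in EPSG:3857 (metres), using the standard spherical Web Mercator formula. The sample point and the function name lonlat_to_web_mercator are illustrative assumptions, not part of the original question.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 degrees to EPSG:3857 metres (spherical Mercator)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# The same (hypothetical) location expressed in two CRSs.
point_4326 = (2.0, 41.0)                         # degrees
point_3857 = lonlat_to_web_mercator(2.0, 41.0)   # metres

print(point_4326)
print(point_3857)
```

The two coordinate pairs describe the same place but differ by several orders of magnitude, so rows like these cannot be compared meaningfully if they sit in one geometry column under a single CRS label.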

Instead, you can transform each one to a common CRS (say EPSG:4326) and then concatenate them, as follows:

from pathlib import Path
import pandas
import geopandas
from tqdm import tqdm

folder = Path(r"read path")
shapefiles = folder.glob("PARCELA(*).SHP")
gdf_list = []
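for shp in tqdm(shapefiles):
    gdf = geopandas.read_file(shp)
    # Record where each row came from before the CRS is changed.
    gdf['Original_File'] = str(shp)
    # to_epsg() returns None when the CRS has no matching EPSG code.
    gdf['Original_EPSG'] = gdf.crs.to_epsg() if gdf.crs is not None else None
    # to_crs() returns a new GeoDataFrame, so no extra copy is needed.
    # It raises an error if the file has no CRS defined.
    gdf_list.append(gdf.to_crs('EPSG:4326'))

# All frames now share one CRS, so concatenating them is safe.
gdf_final = pandas.concat(gdf_list, ignore_index=True)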
gdf_final.to_file(folder / r'write path')

In the code above, gdf_final holds all the rows of the original data plus two extra columns, "Original_File" and "Original_EPSG", which contain the path of the source shapefile and the EPSG code of its CRS, respectively. Note that the shapefile format limits field names to 10 characters, so these two column names are shortened when the file is written.

Furthermore, every geometry in gdf_final is in EPSG:4326, so the GeoDataFrame can be used safely in geographic operations.
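The same approach can be tried without any files on disk. The sketch below wraps the loop in a helper and runs it on two in-memory GeoDataFrames that stand in for shapefiles with different projections. The function name normalize_and_concat, the labels "a.shp" and "b.shp", and the sample points are hypothetical and used for illustration only.

```python
import geopandas
import pandas
from shapely.geometry import Point

TARGET_CRS = "EPSG:4326"  # assumed common CRS, as in the answer


def normalize_and_concat(frames, target_crs=TARGET_CRS):
    """Tag each frame with its source label and EPSG code, reproject, concatenate.

    `frames` maps a label (a stand-in for a file path) to a GeoDataFrame.
    """
    parts = []
    for name, gdf in frames.items():
        if gdf.crs is None:
            raise ValueError(f"{name} has no CRS, so it cannot be reprojected")
        gdf = gdf.copy()  # avoid modifying the caller's frame
        gdf["Original_File"] = name
        gdf["Original_EPSG"] = gdf.crs.to_epsg()
        parts.append(gdf.to_crs(target_crs))
    return pandas.concat(parts, ignore_index=True)


# Two one-row frames standing in for shapefiles with different projections.
frames = {
    "a.shp": geopandas.GeoDataFrame(
        {"id": [1]}, geometry=[Point(2.0, 41.0)], crs="EPSG:4326"
    ),
    "b.shp": geopandas.GeoDataFrame(
        {"id": [2]}, geometry=[Point(0.0, 0.0)], crs="EPSG:3857"
    ),
}
combined = normalize_and_concat(frames)
print(combined[["id", "Original_File", "Original_EPSG"]])
print(combined.crs)
```

Because the original EPSG code is kept per row, a subset such as combined[combined["Original_EPSG"] == 3857] can later be reprojected back to its source CRS with to_crs if that is ever needed.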

Related Solutions

Python – Converting DataFrame with Geometry Column into GeoDataFrame in Pandas

Each of your geometries is inside a list, so you are effectively passing a list of lists as the geometry to GeoDataFrame. You have to pass a list-like of geometries, not of lists.

Using apply, you can pull the actual geometry out of each list.

b = gpd.GeoDataFrame(a[['a', 'b']], geometry=a['c'].apply(lambda x: x[0]))

Python – Converting Pandas DataFrame to GeoDataFrame with Polygon Geometry

Instead of using apply, this can be done using the agg method with named aggregations. The only catch is that agg cannot yet operate on multiple columns, so the points must be condensed to a single column beforehand.

Also note that when converting points to polygons, the aggregation function must call .values, since the x passed there is a pd.Series, which Polygon does not know how to handle.

import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon


df = pd.DataFrame({'name':['a1','a2','a3','a4','a5','a6'],
                   'loc_x':[0,1,2,3,4,5],
                   'loc_y':[1,2,3,4,5,6],
                   'grp_name':['set1','set1','set1','set2','set2','set2']})

df['points'] = gpd.points_from_xy(df.loc_x, df.loc_y)

df = df.groupby('grp_name').agg(
     name     = pd.NamedAgg(column='name',   aggfunc = lambda x: '|'.join(x)),
     geometry = pd.NamedAgg(column='points', aggfunc = lambda x: Polygon(x.values))
    ).reset_index()

geodf = gpd.GeoDataFrame(df, geometry='geometry')

print(geodf)
  grp_name      name                                           geometry
0     set1  a1|a2|a3  POLYGON ((0.00000 1.00000, 1.00000 2.00000, 2....
1     set2  a4|a5|a6  POLYGON ((3.00000 4.00000, 4.00000 5.00000, 5....
