GeoPandas – Calculate Percentage of Area Intersect in Python

Tags: geodataframe, geometry, geopandas, python, spatial-join

I have two GeoPandas dataframes, namely:

grid – the base map that contains a grid of 100×100 meter squares.

land – the land use of an area (e.g. farmland, meadow, etc.).

How do I, upon spatially joining both, create an extra column that would show the percentage of how much a specific land attribute intersects? For example:

  square  geometry     land    percentage
0      A    POLY..     farm           .40
1      A    POLY..   meadow           .60

Square A from grid intersects with both a farm and a meadow from land, and shows the percentage of how much those land types overlap with a 100×100 square.

What I have so far is simply the joining function, as I do not know how to approach getting the "percentage" column:

final = gpd.sjoin(grid, land, predicate='intersects', how='inner')

Best Answer

You can achieve this using overlay operations. Here's a quick example using some fake data.

import geopandas as gpd
from shapely.geometry import Polygon

# Creating the GeoDataFrame with the grid geometries
grid_gdf = gpd.GeoDataFrame(data={'grid_id':[101,102,103,104],
                                  'grid_cat':['W','X','Y','Z'],
                                  'geometry':[Polygon([(1,5),(3,5),(3,3),(1,3)]),
                                              Polygon([(3,5),(5,5),(5,3),(3,3)]),
                                              Polygon([(1,3),(3,3),(3,1),(1,1)]),
                                              Polygon([(3,3),(5,3),(5,1),(3,1)])]},
                            geometry='geometry')
grid_gdf['area_grid'] = grid_gdf.area
grid_gdf.plot(column='grid_id')

# Creating the GeoDataFrame with the land geometries
land_gdf = gpd.GeoDataFrame(data={'land_id':[1,2,3,4,5,6,7,8,9],
                                  'land_cat':['A','B','C','B','C','A','C','A','B'],
                                  'geometry':[Polygon([(0,6),(2,6),(2,4),(0,4)]),
                                              Polygon([(2,6),(4,6),(4,4),(2,4)]),
                                              Polygon([(4,6),(6,6),(6,4),(4,4)]),
                                              Polygon([(0,4),(2,4),(2,2),(0,2)]),
                                              Polygon([(2,4),(4,4),(4,2),(2,2)]),
                                              Polygon([(4,4),(6,4),(6,2),(4,2)]),
                                              Polygon([(0,2),(2,2),(2,0),(0,0)]),
                                              Polygon([(2,2),(4,2),(4,0),(2,0)]),
                                              Polygon([(4,2),(6,2),(6,0),(4,0)])]},
                            geometry='geometry')
land_gdf['area_land'] = land_gdf.area
land_gdf.plot(column='land_id')

# Performing the overlay function
gdf_joined = gpd.overlay(grid_gdf, land_gdf, how='union')

# Calculating the areas of the newly-created geometries
gdf_joined['area_joined'] = gdf_joined.area

# Calculating the areas of the newly-created geometries in relation 
# to the original grid cells
gdf_joined['land_area_as_a_share_of_grid_area'] = (gdf_joined['area_joined'] / 
                                                   gdf_joined['area_grid'])

# Aggregating the results
results = (gdf_joined
           .groupby(['grid_id','land_cat'])
           .agg({'land_area_as_a_share_of_grid_area':'sum'}))

# Printing results
print(results)

#                   land_area_as_a_share_of_grid_area
# grid_id land_cat                                   
# 101.0   A                                      0.25
#         B                                      0.50
#         C                                      0.25
# 102.0   A                                      0.25
#         B                                      0.25
#         C                                      0.50
# 103.0   A                                      0.25
#         B                                      0.25
#         C                                      0.50
# 104.0   A                                      0.50
#         B                                      0.25
#         C                                      0.25

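When adapting this to your case, you will likely need to change the column names used in each step, but the overall approach stays the same. Note that how='union' also keeps the parts of the land polygons that fall outside the grid. Those rows have a missing grid_id, which is why grid_id prints as a float, and groupby drops them by default.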
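As a rough sketch of that adaptation, the snippet below uses the column names from the question (square, land, percentage) and how='intersection', which keeps only the pieces where a square and a land polygon actually overlap. The geometries, the square_area helper column, and the EPSG:32633 CRS are made up for illustration and are not part of the original data.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical data: one 100 x 100 m grid square and two land parcels.
# EPSG:32633 is only an assumed projected CRS with units in meters.
grid = gpd.GeoDataFrame(
    {"square": ["A"]},
    geometry=[box(0, 0, 100, 100)],
    crs="EPSG:32633",
)
land = gpd.GeoDataFrame(
    {"land": ["farm", "meadow"]},
    geometry=[box(0, 0, 40, 100), box(40, 0, 100, 100)],
    crs="EPSG:32633",
)

# Record each square's full area before the overlay splits it up
grid["square_area"] = grid.area

# Keep only the pieces where a square and a parcel overlap
pieces = gpd.overlay(grid, land, how="intersection")

# Share of the original square covered by each piece
pieces["percentage"] = pieces.area / pieces["square_area"]

# Sum the pieces of the same land type within the same square
result = pieces.groupby(["square", "land"], as_index=False)["percentage"].sum()
print(result)
```

For this toy data the shares come out as 0.4 for farm and 0.6 for meadow, which matches the layout of the table in the question.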

Related Solutions

GeoPandas – Getting Polygon Areas Using GeoPandas

If the CRS of the GeoDataFrame is known (here EPSG:4326, with units in degrees), you don't need to import Shapely or pyproj in your script, because GeoPandas uses them internally.

import geopandas as gpd
test = gpd.read_file("test_wgs84.shp")
print(test.crs)
test.head(2)

Now copy your GeoDataFrame and reproject it to a Cartesian system (EPSG:3857, with units in meters, as in ResMar's answer):

tost = test.copy()
tost = tost.to_crs("EPSG:3857")
print(tost.crs)
tost.head(2)

Now compute the area in square kilometers:

tost["area"] = tost["geometry"].area / 10**6
tost.head(2)

But areas computed in the Mercator projection are not correct, so use another projection with units in meters (here EPSG:32633, UTM zone 33N):

tost = tost.to_crs("EPSG:32633")
tost["area"] = tost["geometry"].area / 10**6
tost.head(2)
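To get a feel for how large the Mercator distortion can be, here is a small sketch that compares the area of the same polygon in EPSG:3857 and in a UTM zone. The 1 x 1 degree test cell is hypothetical, and the UTM zone is picked with GeoPandas' estimate_utm_crs() rather than hard-coded as above.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical test polygon: a 1 x 1 degree cell at high latitude
cell = gpd.GeoDataFrame(geometry=[box(10, 60, 11, 61)], crs="EPSG:4326")

# Area after reprojecting to Web Mercator (EPSG:3857)
area_mercator = cell.to_crs("EPSG:3857").area.iloc[0]

# Area after reprojecting to the UTM zone GeoPandas estimates for the data
utm_crs = cell.estimate_utm_crs()
area_utm = cell.to_crs(utm_crs).area.iloc[0]

# How many times larger the Mercator area is than the UTM area
ratio = area_mercator / area_utm
print(utm_crs.name, round(ratio, 2))
```

At this latitude the Mercator area should come out roughly four times larger than the UTM area. The gap shrinks toward the equator, so the error is easy to miss when testing on low-latitude data.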

[GIS] Problem getting correct area for polygon and choosing a CRS

The problem is caused by setting to_crs() based on another non-lat/lon CRS instead of based on the naive geometries.

Here's something that works. First import the packages and your geopandas dataframe. In my case, I'm loading just one hexagon and making a dataframe out of it to test the code.

import geopandas as gp
from shapely import wkt
data = {'geometry':'POLYGON ((139.7671 35.68250088833567, 139.7684808422803 35.68185043597622, 139.7684808198672 35.68054954811675, 139.7671 35.67989911138261, 139.7657191801327 35.68054954811675, 139.7657191577197 35.68185043597622, 139.7671 35.68250088833567))'}    
df = gp.GeoDataFrame([data], columns=data.keys())
df['geometry'] = df['geometry'].apply(wkt.loads)

In your case, you can just read in the GeoDataFrame in whatever way is appropriate. Now set the initial CRS of the naive geometries, i.e., the standard coordinates in decimal degrees of longitude and latitude.

df.crs = 'epsg:4326'
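The distinction that matters here is between assigning a CRS, which only labels the existing coordinates, and reprojecting, which transforms them. A minimal sketch with a hypothetical single point (not the hexagon above):

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point given as longitude/latitude, with no CRS attached yet
gdf = gpd.GeoDataFrame(geometry=[Point(139.7671, 35.6825)])

# set_crs only declares what the existing numbers mean; nothing moves
labelled = gdf.set_crs("EPSG:4326")

# to_crs transforms the coordinates into the target CRS (meters here)
projected = labelled.to_crs("EPSG:3857")

x_labelled = labelled.geometry.iloc[0].x    # still in degrees
x_projected = projected.geometry.iloc[0].x  # now in meters
print(x_labelled, x_projected)
```

If the data is labelled with the wrong CRS in the first step, every later to_crs() call starts from the wrong place, and the resulting areas will be off.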

Now, since we're trying to calculate surface areas, it's a good idea to use cea (cylindrical equal area) projections. And to minimize distortion you can specify a lat/lon to use as a reference point (I used the first point of the polygon above, which is in the center of my region of interest).

df = df.to_crs("+proj=cea +lat_0=35.68250088833567 +lon_0=139.7671 +units=m")

Now you can confirm the transformed geometry in meters, and the area in square meters of the hexagon in this projection:

print(df.at[0,'geometry'])
print("Area of 54,127m2 hexagon is:",df.at[0,'geometry'].area)

It yields 54,126.5868 m2, which is very close to the known value of 54,127 m2.
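As a cross-check that avoids choosing a projection at all, the area can also be computed geodesically on the WGS84 ellipsoid with pyproj's Geod class. This is a different technique from the answer's equal-area projection, shown here only as a sketch on the same hexagon.

```python
from pyproj import Geod
from shapely import wkt

# Same hexagon as above, in longitude/latitude (EPSG:4326)
hexagon = wkt.loads(
    "POLYGON ((139.7671 35.68250088833567, "
    "139.7684808422803 35.68185043597622, "
    "139.7684808198672 35.68054954811675, "
    "139.7671 35.67989911138261, "
    "139.7657191801327 35.68054954811675, "
    "139.7657191577197 35.68185043597622, "
    "139.7671 35.68250088833567))"
)

# Geodesic calculations on the WGS84 ellipsoid
geod = Geod(ellps="WGS84")

# The returned area is signed (it depends on ring orientation), so take abs()
signed_area, perimeter = geod.geometry_area_perimeter(hexagon)
geodesic_area = abs(signed_area)
print("Geodesic area in m2:", geodesic_area)
```

The result should land close to the projected value above. The two methods treat polygon edges slightly differently, so small differences are expected.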

The question of choosing a CRS for calculating areas, and getting it to work with GeoPandas, comes up a lot on forums and SE, but none of the answers I found were clear, straightforward, or actually working. I hope my struggle with this, and my simple example here, can shorten others' complicated and torturous journey through GIS computing to just get the area of their polygons.
