[GIS] Cannot find polygons that are inside a big polygon using GeoPandas

geopandaspolygonpython-2.7

First of all, some info about my system characteristics:

  • Python 2.7
  • GeoPandas 0.3.0

I am working on a program, where I filter out the data of two shapefiles -link with data- (each shapefile is a vector layer with polygons) and organize it according some statements. The two shapefiles look like the picture below, the picture is just a small part of the whole map.

Link to the picture

After doing some filtering, I need to find in the remaining data those polygons inside the big polygons (in the picture the purple and the green polygon) and change some information if there is an intersection. I have tried using some of the GeoPandas functions for this task, but the results are not the ones that I expected.

Try 1:

# Read .shp files
b = gpd.read_file(fp + 'puchheim_buildings.shp')
la = gpd.read_file(fp + 'puchheim_landuse.shp')

#Small parts of the shape files
la_g = la.groupby('fclass').get_group('residential')
b_r = b[100:130]  # here I know there is an intersection with the big polygon

# Find the polygons inside big polygons 
inter = gpd.overlay(b_r, la_g, how='intersection')

Here the work is well done, but the function returns me another shapefile and it takes a lot computing time if a use the hole data in the shapefiles.

Try 2:

# Read .shp files
b = gpd.read_file(fp + 'puchheim_buildings.shp')
la = gpd.read_file(fp + 'puchheim_landuse.shp')

# Find the polygons inside big polygons 
inter = b.within(la.loc[0, 'geometry'])

In “try 2” I got a Boolean series with everything false, although I know that there are polygons inside the big ones. I tried to use instead the polygons their centroid and make the same calculations, but the result was the same, a Boolean series with all false

# Find the polygons inside big polygons 
b_c = b.centroid
inter = b_c.within(la.loc[0, 'geometry'])

I would prefer a boolean series, because I just need to change some information in the shapefile if there is an intersection. The idea is something like this (the code below is a matter illustration from my idea):

# Read .shp files
b = gpd.read_file(fp + 'puchheim_buildings.shp')
la = gpd.read_file(fp + 'puchheim_landuse.shp')

# Find intersections
inter = b.within(la.loc[0, 'geometry'])

# Comparation
for r in range(len(b))  
    if inter[r] == True:
        b[r]['type'] = 'residential'

How can I do that or have a better idea that achieves the result that I expect?

Best Answer

I think what you want is a "spatial join", where we add information of the second dataframe (the land use dataframe, information on the type of land use) to the first dataframe (the buildings) based on their spatial relationship:

gpd.sjoin(b, la[['fclass', 'geometry']], how='left', op='intersects')

This adds a new column to the b dataframe with the land use class in which each building is located.


Some additional notes about your attempts:

Try 1: a overlay can be suited here if you want to create a new geometry (e.g. in the case where a building is only partly inside a big land use polygon, then overlay will actually clip the original builing's geometry). But if you don't want to edit the geometries (take the intersection), the the spatial join (sjoin) shown above is more appropriate.
It is true that overlay is currently very show, as you noticed. There is work underway to improve that in the next version of geopandas: https://github.com/geopandas/geopandas/pull/429.

Try 2: this should also work, and actually for me it does not give a full series of False values. For the exact example you gave (inter = b.within(la.loc[0, 'geometry'])), I get 18 True values (from inter.sum()). However, this is actually equivalent to the spatial join, apart from that the spatial join does this automatically for all elements of the land use dataset.