GeoPandas – How to Filter Dataframe for Points Within a Specific Country

filter geopandas shapely

I have a dataframe of health survey data which is geotagged, called momdata. I have converted the dataframe to a GeoPandas dataframe, and used the geotags as the geometry column within the dataframe, for example:

momdata['geometry']

0    POINT (-0.0893 51.4735)
1    POINT (-0.0894 51.4732)
2    POINT (-0.0898 51.4717)
3    POINT (-0.0907 51.4727)
4    POINT (-0.0901 51.4723)
5    POINT (-0.0816 51.4742)

I want to filter the dataframe so that only points in the UK are returned. I have the UK geometry in a separate GeoSeries, called uk_geom, which I took from the GeoPandas built-in world map:

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
uk_index = world[world.name == "United Kingdom"].index
uk_geom = world.loc[uk_index, 'geometry']

This returns uk_geom as a GeoPandas GeoSeries:

type(uk_geom) # geopandas.geoseries.GeoSeries

I simply want to filter momdata for all points within momdata.geometry that fall within uk_geom, returning a dataframe of only UK-based survey observations. This should be simple.

I tried:

uk_momdata = momdata[momdata.geometry.within(uk_geom)]

but this returns an empty dataframe, even though I know that some of the survey observations are in the UK.

For example:

p1 = momdata.geometry.loc[0]
p2 = momdata.geometry.loc[1]

print(uk_geom.contains(p1)) # returns TRUE
print(uk_geom.contains(p2)) # returns TRUE

I tried this the other way round, checking which momdata points are contained within uk_geom:

uk_geom.contains(momdata.geometry).value_counts() #2040 false points

Also, when I test the 'within' function on a single point that I know is in momdata, I get an error:

print(point8.within(uk_geom)) # AttributeError: 'GeoSeries' object has no attribute '_geom' 

I have assigned the correct co-ordinate reference system:

assert uk_geom.crs == momdata.crs # no problem

I also tried a basic 'apply' function using a predicate, but this returns an error:

momdata[momdata.geometry.apply(lambda p: uk_geom.contains(p))] # Null geometry supports no operations

I also tried a spatial join, but then I get the error that one of the join columns is not a DataFrame, as of course it's trying to join on the geometry column:

from geopandas.tools import sjoin
join_left_df = sjoin(momdata, uk_geom, how="left")
join_left_df

How do I solve this?

I can't seem to make it work.

Best Answer

Did you see "More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc" and the other answers on GIS SE?

Simply

import geopandas as gpd
# Note: gpd.datasets was removed in GeoPandas 1.0, so this line needs an older release
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
uk = world[world.name == "United Kingdom"]
type(uk)
geopandas.geodataframe.GeoDataFrame

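So uk is a GeoDataFrame, which is what sjoin expects on both sides (the uk_geom in the question is a GeoSeries, hence the error there).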

uk.head()
       pop_est continent            name iso_a3  gdp_md_est                                           geometry
57  62262000.0    Europe  United Kingdom    GBR   1977704.0  (POLYGON ((-5.661948614921897 54.5546031764838...
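
As for why `momdata.geometry.within(uk_geom)` came back empty in the question: predicates between two GeoSeries are evaluated row by row after aligning on the index, so each point is only compared with the polygon that shares its index label. The UK row has label 57, so nothing lines up and every result is False. Passing a single shapely geometry instead tests that one shape against every point. Here is a minimal sketch of the difference, assuming a rough bounding box as a hypothetical stand-in for the UK polygon and a few made-up points (`uk_box`, `pts` and the third point are illustrative assumptions, not the asker's data):

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-in for the UK outline: a rough lon/lat bounding box.
uk_box = box(-8.0, 49.5, 2.0, 61.0)
# One-row GeoSeries whose index label (57) matches none of the point labels.
uk_geom = gpd.GeoSeries([uk_box], index=[57], crs="EPSG:4326")

pts = gpd.GeoDataFrame(
    {"FID": [0, 1, 2]},
    geometry=[
        Point(-0.0893, 51.4735),  # London
        Point(-0.0894, 51.4732),  # London
        Point(2.35, 48.85),       # made-up point outside the box
    ],
    crs="EPSG:4326",
)

# GeoSeries vs GeoSeries: aligned on index, so no point is paired with the polygon.
aligned = pts.geometry.within(uk_geom)

# GeoSeries vs single geometry: the polygon is tested against every point.
uk_shape = uk_geom.iloc[0]
mask = pts.geometry.within(uk_shape)
uk_pts = pts[mask]
```

With the real data, the same pattern would be `momdata[momdata.geometry.within(uk_geom.iloc[0])]`. GeoPandas may emit a warning about differing indices for the aligned call, which is a useful hint that this is what is happening.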

The points shapefile:

points = gpd.read_file('uk_points.shp') 
points.head()
   FID                              geometry
0  0.0               POINT (-0.0893 51.4735)
1  1.0               POINT (-0.0894 51.4732) 
2  2.0               POINT (-0.0898 51.4717)
3  3.0               POINT (-0.0907 51.4727)
4  4.0               POINT (-0.0901 51.4723)

And now

from geopandas.tools import sjoin
pointInPolys = sjoin(points, uk, how='left')
pointInPolys.head()

   FID                 geometry  index_right     pop_est continent            name iso_a3  gdp_md_est
0  0.0  POINT (-0.0893 51.4735)            0  62262000.0    Europe  United Kingdom    GBR   1977704.0
1  1.0  POINT (-0.0894 51.4732)            0  62262000.0    Europe  United Kingdom    GBR   1977704.0
2  2.0  POINT (-0.0898 51.4717)            0  62262000.0    Europe  United Kingdom    GBR   1977704.0
3  3.0  POINT (-0.0907 51.4727)            0  62262000.0    Europe  United Kingdom    GBR   1977704.0
4  4.0  POINT (-0.0901 51.4723)            0  62262000.0    Europe  United Kingdom    GBR   1977704.0
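
Note that `how='left'` keeps every point and only leaves the country columns empty for points that fall outside the polygon. To actually drop the non-UK rows, one option is an inner join, or filtering the left join on `index_right`. Here is a minimal sketch, again assuming a bounding box as a hypothetical stand-in for the UK polygon and made-up points (`uk_demo` and `pts` are illustrative names), and relying on the default intersects predicate of `sjoin`:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical one-row polygon layer standing in for the `uk` GeoDataFrame.
uk_demo = gpd.GeoDataFrame(
    {"name": ["United Kingdom"]},
    geometry=[box(-8.0, 49.5, 2.0, 61.0)],
    crs="EPSG:4326",
)
pts = gpd.GeoDataFrame(
    {"FID": [0, 1, 2]},
    geometry=[
        Point(-0.0893, 51.4735),  # London
        Point(-0.0894, 51.4732),  # London
        Point(2.35, 48.85),       # made-up point outside the box
    ],
    crs="EPSG:4326",
)

# Inner join: only points that intersect the polygon survive.
joined = gpd.sjoin(pts, uk_demo, how="inner")
# Keep just the original point columns.
uk_only = joined[pts.columns]

# Left join for comparison: all rows kept, unmatched ones get NaN in index_right.
left = gpd.sjoin(pts, uk_demo, how="left")
uk_only_from_left = left[left["index_right"].notna()]
```

Both routes give the same subset here. The inner join is shorter, while the left join is handy when the unmatched rows are also of interest.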