[GIS] Assign a point to polygon using pandas and shapely

pandaspythonshapely

I have a series of Easting and Northing coordinates in pandas dataframe along with other columns of information. I know the coordinates fall within and administrative area and want to match the coordinates and rest of columns to that administrative area, I have the shapefile of the administrative boundaries. I'm familiar with R and visually this looks easy, I can plot the coordinate and see it fall into the area. However i have n=8,335 records to match and need to use Python as all other records are loaded and the analysis i have is better suited to Python.

So far i have work flow as;

Convert Easting and Northing to lat and long

Somehow use Shapely to tell me if point is in polygon / I understand that matplotlib.path.Path.contains_points executes faster.

Assign point to polygon and add new column to dataframe with the polgon reference in it.

Use pandas merge to join dataframe to others. Project to map again and check visually.

Does that sound like a sensible approach or is there a more efficient route?

Best Answer

If you're working with spatial data and Pandas you should take a look at GeoPandas.

The example below demonstrates how to perform a spatial join in GeoPandas (which uses Shapely). A GeoDataFrame object is created from a list of cities and their coordinates and is joined to an ESRI Shapefile containing countries.

import pandas
import geopandas
import geopandas.tools
from shapely.geometry import Point

# Create a DataFrame with some cities, including their location
raw_data = [
    ("London", 51.5, -0.1),
    ("Paris", 48.9, 2.4),
    ("San Francisco", 37.8, -122.4),
]
places = pandas.DataFrame(raw_data, columns=["name", "latitude", "longitude"])

# Create the geometry column from the coordinates
# Remember that longitude is east-west (i.e. X) and latitude is north-south (i.e. Y)
places["geometry"] = places.apply(lambda row: Point(row["longitude"], row["latitude"]), axis=1)
del(places["latitude"], places["longitude"])

# Convert to a GeoDataFrame
places = geopandas.GeoDataFrame(places, geometry="geometry")

# Declare the coordinate system for the places GeoDataFrame
# GeoPandas doesn't do any transformations automatically when performing
# the spatial join. The layers are already in the same CRS (WGS84) so no
# transformation is needed.
places.crs = {"init": "epsg:4326"}

# Load the countries polygons
countries = geopandas.GeoDataFrame.from_file("ne_110m_admin_0_countries.shp")
# Drop all columns except the name and polygon geometry
countries = countries[["name", "geometry"]]

# Perform the spatial join
result = geopandas.tools.sjoin(places, countries, how="left")

# Print the results...
print(result.head())

Note that the spatial join feature is still fairly new and hasn't made it into the stable branch yet - you'll need to download and install the development version.

https://github.com/geopandas/geopandas

The result looks like this:

       name_left             geometry  index_right      name_right
0         London    POINT (-0.1 51.5)           57  United Kingdom
1          Paris     POINT (2.4 48.9)           55          France
2  San Francisco  POINT (-122.4 37.8)          168   United States

You can also use GeoPandas to plot the data with matplotlib:

import matplotlib.pyplot as plt
fig, ax = plt.subplots(1)
countries.plot(ax=ax, color="#cccccc")
places.plot(ax=ax, markersize=5, color="#cc0000")
plt.show()

places and countries