If you're working with spatial data and Pandas you should take a look at GeoPandas.
The example below demonstrates how to perform a spatial join in GeoPandas (which uses Shapely). A GeoDataFrame
object is created from a list of cities and their coordinates and is joined to an ESRI Shapefile containing countries.
import pandas
import geopandas
import geopandas.tools
from shapely.geometry import Point
# Create a DataFrame with some cities, including their location
raw_data = [
("London", 51.5, -0.1),
("Paris", 48.9, 2.4),
("San Francisco", 37.8, -122.4),
]
places = pandas.DataFrame(raw_data, columns=["name", "latitude", "longitude"])
# Create the geometry column from the coordinates
# Remember that longitude is east-west (i.e. X) and latitude is north-south (i.e. Y)
places["geometry"] = places.apply(lambda row: Point(row["longitude"], row["latitude"]), axis=1)
del(places["latitude"], places["longitude"])
# Convert to a GeoDataFrame
places = geopandas.GeoDataFrame(places, geometry="geometry")
# Declare the coordinate system for the places GeoDataFrame
# GeoPandas doesn't do any transformations automatically when performing
# the spatial join. The layers are already in the same CRS (WGS84) so no
# transformation is needed.
places.crs = {"init": "epsg:4326"}
# Load the countries polygons
countries = geopandas.GeoDataFrame.from_file("ne_110m_admin_0_countries.shp")
# Drop all columns except the name and polygon geometry
countries = countries[["name", "geometry"]]
# Perform the spatial join
result = geopandas.tools.sjoin(places, countries, how="left")
# Print the results...
print(result.head())
Note that the spatial join feature is still fairly new and hasn't made it into the stable branch yet - you'll need to download and install the development version.
https://github.com/geopandas/geopandas
The result looks like this:
name_left geometry index_right name_right
0 London POINT (-0.1 51.5) 57 United Kingdom
1 Paris POINT (2.4 48.9) 55 France
2 San Francisco POINT (-122.4 37.8) 168 United States
You can also use GeoPandas to plot the data with matplotlib:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1)
countries.plot(ax=ax, color="#cccccc")
places.plot(ax=ax, markersize=5, color="#cc0000")
plt.show()
![places and countries](https://i.stack.imgur.com/qAWmm.png)
Your lon, lat cols are object types. Cast them to floats:
geometry = geopandas.points_from_xy(df.longitude.astype('float32'), df.latitude.astype('float32'))
#OR
geometry = geopandas.points_from_xy(df['longitude'].astype('float32'), df['latitude'].astype('float32'))
Best Answer
Convert the DataFrame's content (e.g.
Lat
andLon
columns) into appropriate Shapely geometries first and then use them together with the original DataFrame to create a GeoDataFrame.Result:
Since the geometries often come in the WKT format, I thought I'd include an example for that case as well: