[GIS] Randomly sample from geopandas DataFrame in Python

Tags: gdal, geopandas, python, random, sampling

I am reading a shapefile as a geopandas DataFrame and then using the pandas subset method to select a region.

geodata = gpd.read_file(bayshp)

geodata.dtypes

geodata.head(10)

OBJECTID  FIPSSTCO  COUNTY         geometry
1         06001     Alameda        (POLYGON ((6065941.393835935 2104148.464510527...
2         06013     Contra Costa   (POLYGON ((6143913.640835938 2209458.230510532...
3         06041     Marin          (POLYGON ((5879149.417835938 2203020.920510533...
4         06055     Napa           POLYGON ((6075700.362835937 2441916.530510533,...
5         06075     San Francisco  (POLYGON ((5990480.312835939 2123810.13351053,..

Subset the df:

# Subset based on the index
geosub = geodata.iloc[0:2]

I've got a function that accepts a geopandas DataFrame and the number of points to sample as arguments.

def sample_random_geo(df, n):

    # Randomly sample geolocation data from defined polygon 
    points = np.random.sample(df, n)

    return points

However, np.random.sample, like the other NumPy random sampling functions, doesn't accept a geopandas object.

I am wondering if there is a way to randomly sample geocoordinates from the spatial region.

Best Answer

Here's another way to do it:

import geopandas as gpd
import numpy as np

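# load an example polygons geodataframe
# (gpd.datasets was removed in GeoPandas 1.0; with newer versions, install the
# geodatasets package and use geodatasets.get_path('nybb') instead)
gdf_polys = gpd.read_file(gpd.datasets.get_path('nybb'))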

It looks like the following:

[Image: plot of the example polygons]

# find the bounds of your geodataframe
x_min, y_min, x_max, y_max = gdf_polys.total_bounds

# set sample size
n = 100
# generate random data within the bounds
x = np.random.uniform(x_min, x_max, n)
y = np.random.uniform(y_min, y_max, n)

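# convert them to a points GeoSeries in the same CRS as the polygons
gdf_points = gpd.GeoSeries(gpd.points_from_xy(x, y), crs=gdf_polys.crs)
# only keep those points within polygons
# (unary_union is deprecated in GeoPandas 1.0+; use gdf_polys.union_all() there)
gdf_points = gdf_points[gdf_points.within(gdf_polys.unary_union)]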

Now you have:

[Image: plot of the sampled points that fall inside the polygons]
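Note that the filtering step discards every candidate that lands outside the polygons, so you usually end up with fewer than n points. If you need exactly n, one option is to keep drawing batches until enough points have been accepted (rejection sampling). Below is a minimal sketch of that idea, reusing the function name from the question. The `seed` and `max_rounds` parameters and the two demo squares are my own assumptions for illustration, not part of the original code.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import Polygon
from shapely.ops import unary_union


def sample_random_geo(df, n, seed=None, max_rounds=1000):
    """Return a GeoSeries of exactly n random points inside df's polygons.

    seed and max_rounds are hypothetical helper parameters for this sketch.
    """
    rng = np.random.default_rng(seed)
    # merge all polygons into a single shape to test against
    region = unary_union(list(df.geometry))
    x_min, y_min, x_max, y_max = df.total_bounds

    kept = []
    for _ in range(max_rounds):
        if len(kept) >= n:
            break
        # draw a batch of candidates uniformly over the bounding box
        x = rng.uniform(x_min, x_max, n)
        y = rng.uniform(y_min, y_max, n)
        candidates = gpd.GeoSeries(gpd.points_from_xy(x, y), crs=df.crs)
        # rejection step: keep only candidates inside the region
        kept.extend(candidates[candidates.within(region)])

    if len(kept) < n:
        raise RuntimeError("could not collect enough points inside the polygons")
    # trim the surplus from the last batch
    return gpd.GeoSeries(kept[:n], crs=df.crs)


# made-up demo data: two unit squares standing in for real polygons
demo = gpd.GeoDataFrame(
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 2), (3, 2), (3, 3), (2, 3)]),
    ]
)
points = sample_random_geo(demo, 50, seed=0)
```

With your own data you would call it as `sample_random_geo(geosub, n)`. The fraction of candidates kept is roughly the polygon area divided by the bounding-box area, so small or scattered polygons need more rounds.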
