I have a GeoDataFrame called "cities" with 12,431 observations of geographical units. I also have another layer file with points, called "points". Both use the CRS EPSG:4326.
I want to create a variable in the GeoDataFrame that indicates whether each unit contains a point from the other file. My code does this correctly.
However, when I call len(df3), the number of observations has increased to 12,599. How is that possible? I have checked every column for missing values and found none.
Where do the 168 extra observations come from?
I need to maintain the same number of observations as in "cities".
This is the code I am using:
import numpy as np
import pandas as pd
import geopandas as gpd
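# Left spatial join: keep every city and attach the index of each point it intersects
df3 = gpd.sjoin(cities, points[['geometry']], how="left")
# index_right is NaN for cities with no matching point
df3['points'] = np.where(df3['index_right'].isna(), 0, 1)
del df3['index_right']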
Best Answer
The join emits one output row for each point inside a polygon, so the polygon is repeated once per matching point. A polygon with two points in it therefore becomes two rows in the output:
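As a rough illustration, the sketch below reproduces the effect on made-up data and shows one possible way to get back to one row per city. The three square "cities" and the three points are invented for the example, and collapsing the join result with a groupby on the index is just one option, not necessarily the approach the answer goes on to use.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy data (hypothetical): three unit squares standing in for "cities"
cities = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
        Polygon([(4, 0), (5, 0), (5, 1), (4, 1)]),
    ],
    crs="EPSG:4326",
)
# Two points fall in city A, one in city B, none in city C
points = gpd.GeoDataFrame(
    geometry=[Point(0.2, 0.2), Point(0.8, 0.8), Point(2.5, 0.5)],
    crs="EPSG:4326",
)

# The left join repeats city A once per matching point
joined = gpd.sjoin(cities, points[["geometry"]], how="left")
print(len(cities), len(joined))

# Collapse back to one flag per city: does any joined row have a match?
has_point = joined["index_right"].notna().groupby(level=0).any()

result = cities.copy()
result["points"] = has_point.astype(int)  # aligned on the cities index
print(result[["name", "points"]])
```

Because the flag is computed per original index value and then assigned back to a copy of "cities", the result keeps exactly the same number of rows as "cities". This relies on the index of "cities" being unique.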