Python GeoPandas – How to Merge GeoSeries to GeoDataFrame Based on Geometry Attribute

Tags: geodataframe, geopandas, merge, python

I have 5868 points in a GeoDataFrame with several attribute columns. Wherever points lie less than 10 m apart, I want to keep only one point to represent that area. I have done this with the following code:

import geopandas as gpd

ships = gpd.read_file(r"D:\Suhendra\Riset BARATA\data ais\lego_python\kepri_201812_ship.shp")
# 'ships' is a GeoDataFrame with 5868 rows and several columns

# Drop every point that lies less than 10 m from an earlier point that was kept
point_nodes = list(ships['geometry'])
for i in range(len(point_nodes) - 1):
    if point_nodes[i] is None:
        continue
    for j in range(i + 1, len(point_nodes)):
        if point_nodes[j] is None:
            continue
        if point_nodes[i].distance(point_nodes[j]) < 10:  # distance is in CRS units, assumed to be metres
            point_nodes[j] = None

new_point_nodes = gpd.GeoSeries([node for node in point_nodes if node is not None])
# 'new_point_nodes' has 5321 rows, but it is only a GeoSeries holding the geometries

The result is 5321 points (fewer than in the original data), but it is only a GeoSeries, not a GeoDataFrame like the original. How can I apply the same filter and get a result that keeps all the columns of the original data?

Best Answer

If you modify your code to build a list of True/False values instead of geometries/None, you can pass that list to loc as a boolean mask. The pandas documentation describes loc as follows:

"Access a group of rows and columns by label(s) or a boolean array."

# True where the point survived the distance filter, False where it was set to None
mask = [node is not None for node in point_nodes]
newships = ships.loc[mask]

This gives you all the columns of ships, but only the rows where the mask is True.
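As a minimal sketch of the same idea, the mask can also be built directly rather than by overwriting geometries with None. The helper name thinning_mask, the toy coordinates and the name column are made up for illustration; the coordinates are assumed to be in a projected CRS with metre units.

```python
import geopandas as gpd
from shapely.geometry import Point


def thinning_mask(geoms, min_dist):
    """Return a list of booleans: True for each geometry that is kept."""
    geoms = list(geoms)
    # Rows with no geometry are dropped, as in the original loop
    keep = [g is not None for g in geoms]
    for i in range(len(geoms)):
        if not keep[i]:
            continue
        for j in range(i + 1, len(geoms)):
            # Drop any later point closer than min_dist to a kept point
            if keep[j] and geoms[i].distance(geoms[j]) < min_dist:
                keep[j] = False
    return keep


# Toy data standing in for 'ships' (hypothetical values, units assumed to be metres)
ships = gpd.GeoDataFrame(
    {"name": ["a", "b", "c", "d"]},
    geometry=[Point(0, 0), Point(3, 4), Point(100, 0), Point(104, 3)],
)

mask = thinning_mask(ships.geometry, min_dist=10)
newships = ships.loc[mask]  # still a GeoDataFrame, with all columns preserved
```

Here b is 5 m from a and d is 5 m from c, so only a and c are kept. The loop is still quadratic in the number of points, so it is only practical for datasets of a few thousand rows like the one in the question.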

(I also came across a very fast way of reducing point density; see Clustering to Reduce Spatial Data Set Size, which uses the DBSCAN algorithm. I tried it on a dataset of 484k points and reduced it to 103k points in 4 seconds, excluding the time taken to read and write the shapefiles. It may not thin the points in exactly the way you want, but you should be able to adapt the method to get the result you need.

import geopandas as gpd
import pandas as pd
from sklearn.cluster import DBSCAN

in_shapefile = '/home/bera/GIS/data/test/points_484k_geoms.shp'
out_shapefile = '/home/bera/GIS/data/test/points_484k_geoms_reduced.shp'

df = gpd.read_file(in_shapefile)
coords = df[['xcoord', 'ycoord']].to_numpy() #as_matrix() was removed in pandas 1.0; I added x and y coords to the shapefile

db = DBSCAN(eps=500, min_samples=1).fit(coords) #500 m is max distance to cluster points together
cluster_labels = pd.Series(db.labels_).rename('cluster')

df = pd.concat([df, cluster_labels], axis=1)
df2 = df.drop_duplicates(subset='cluster',keep='first') #Keep first point in each cluster

df2.to_file(out_shapefile)


)