Python GeoPandas – How to Merge GeoSeries to GeoDataFrame Based on Geometry Attribute


I have 5868 points in a GeoDataFrame with several attribute columns. Where points lie less than 10 m apart, I want to keep only one point to represent that area. I did this with the following code:

import geopandas as gpd

ships = gpd.read_file(r"D:\Suhendra\Riset BARATA\data ais\lego_python\kepri_201812_ship.shp")
# 'ships' is a GeoDataFrame with 5868 rows and several columns

# drop every geometry that lies less than 10 m from an earlier, kept point
point_nodes = list(ships['geometry'])
for i in range(len(point_nodes) - 1):
    if point_nodes[i] is None:
        continue
    for j in range(i + 1, len(point_nodes)):
        if point_nodes[j] is None:
            continue
        if point_nodes[i].distance(point_nodes[j]) < 10:  # meters, given a projected CRS in meter units
            point_nodes[j] = None

new_point_nodes = gpd.GeoSeries([node for node in point_nodes if node is not None])
# 'new_point_nodes' has 5321 points, but it is only a GeoSeries holding the geometry

The result is 5321 points (fewer than in the original data), but it is only a GeoSeries, not a GeoDataFrame like the original data. How can I apply the same condition and get a result that keeps all the original columns?

Best Answer

If you modify your code to create a list of True/False values instead of geometries/None, you can use loc with that list as a boolean mask. From the pandas documentation for loc:

Access a group of rows and columns by label(s) or a boolean array.

# build the mask from the full point_nodes list (same length as ships), before the None entries are filtered out
point_nodes = [x is not None for x in point_nodes]
newships = ships.loc[point_nodes]

This gives you all columns of ships, but only the rows where the mask is True.
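The mask idea above can also be folded into the loop itself, so that no list of geometries/None is needed. This is a minimal sketch, not the asker's exact code: the function name thin_points and the small demo frame are made up for illustration, and the coordinates are assumed to be in a projected CRS with meter units.

```python
import geopandas as gpd
from shapely.geometry import Point


def thin_points(gdf, min_dist):
    """Drop every point closer than min_dist to an earlier, kept point.

    Hypothetical helper: returns a GeoDataFrame with all original columns.
    """
    geoms = list(gdf.geometry)
    # Start by keeping every row that has a geometry at all
    keep = [g is not None for g in geoms]
    for i in range(len(geoms)):
        if not keep[i]:
            continue
        for j in range(i + 1, len(geoms)):
            if keep[j] and geoms[i].distance(geoms[j]) < min_dist:
                keep[j] = False
    # A boolean list of the same length as gdf works as a row mask
    return gdf.loc[keep]


# Made-up demo data; coordinates are assumed to be meters
demo = gpd.GeoDataFrame(
    {"name": ["a", "b", "c", "d"]},
    geometry=[Point(0, 0), Point(3, 4), Point(100, 0), Point(104, 3)],
)
thinned = thin_points(demo, 10)
print(thinned)
```

Here b lies 5 m from a and d lies 5 m from c, so only rows a and c remain, with the name column intact. If the data is stored in geographic coordinates (degrees), it would need to be reprojected to a metric CRS with to_crs before the 10 m threshold means anything.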

(I came across a very fast way of reducing point density, see Clustering to Reduce Spatial Data Set Size, which uses the DBSCAN algorithm. I tried it on a dataset of 484k points and reduced them to 103k points in 4 seconds, excluding the time it takes to read and write the shapefiles. It might not reduce them in exactly the way you want, but you should be able to change the method to get the results you want.

import geopandas as gpd
import pandas as pd
from sklearn.cluster import DBSCAN

in_shapefile = '/home/bera/GIS/data/test/points_484k_geoms.shp'
out_shapefile = '/home/bera/GIS/data/test/points_484k_geoms_reduced.shp'

df = gpd.read_file(in_shapefile)
coords = df[['xcoord', 'ycoord']].to_numpy() #I added x and y coords to the shapefile; as_matrix() was removed in pandas 1.0

db = DBSCAN(eps=500, min_samples=1).fit(coords) #500 m is max distance to cluster points together
cluster_labels = pd.Series(db.labels_).rename('cluster')

df = pd.concat([df, cluster_labels], axis=1)
df2 = df.drop_duplicates(subset='cluster',keep='first') #Keep first point in each cluster

df2.to_file(out_shapefile)

)
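To see how the DBSCAN approach can differ from the pairwise loop in the question, here is a small sketch on made-up points. It reads the coordinates from the geometry column instead of from added xcoord/ycoord columns. The data, the eps value of 10 and the column name cluster are assumptions for illustration only.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import Point
from sklearn.cluster import DBSCAN

# Made-up points along a line; coordinates are assumed to be meters
pts = gpd.GeoDataFrame(
    {"name": ["a", "b", "c", "d", "e"]},
    geometry=[Point(0, 0), Point(8, 0), Point(16, 0), Point(100, 0), Point(105, 0)],
)

# Take x/y straight from the point geometries
coords = np.column_stack([pts.geometry.x, pts.geometry.y])

# With min_samples=1 every point is a core point, so points are chained
# together whenever each is within eps of some other member of the cluster
labels = DBSCAN(eps=10, min_samples=1).fit(coords).labels_
pts["cluster"] = labels

# Keep the first point of each cluster, with all columns
reduced = pts.drop_duplicates(subset="cluster", keep="first")
print(reduced)
```

In this sketch a, b and c end up in one cluster because each is within 10 m of its neighbour, even though a and c are 16 m apart, so only a and d are kept. The pairwise loop in the question would keep a, c and d for the same points, which is one way the two methods can give different results.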

Related Solutions

[GIS] Change marker size in plot with GeoPandas

In geopandas >= 0.3 (released September 2017), the plotting of points is based on the scatter plot method of matplotlib under the hood, and this accepts a variable markersize.

So you can now pass a column to markersize, which is what the OP did in the original question:

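import geopandas
import numpy as np

# geopandas.datasets was removed in GeoPandas 1.0; on newer versions, read the cities from a local file instead
cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))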
# adding a column with random values for the size
cities['values'] = np.abs(np.random.randn(len(cities))) * 50

cities.plot(markersize=cities['values'])

This gives a plot in which each city's marker size follows its entry in the values column.

Of course, if your goal is simply to change the markersize to a different constant value, you can still pass a single float to the keyword:

cities.plot(markersize=10)

Python – Using Buffer with Dissolve in Geopandas and Unary_Union Multipolygon

Put the geometry into a new GeoDataFrame, since that is the object that has the .plot() method.

new = gpd.GeoDataFrame(crs=gdf2.crs, geometry=[gdfu])
new.plot()

Then you can write the new object out to a file:

new.to_file('path/to/file')
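For context, here is a hedged sketch of how gdfu and gdf2 might come about. The input points and the buffer distance are invented, and shapely.ops.unary_union is used here as one way to merge the buffers; the original answer does not show this step.

```python
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import unary_union

# Made-up input: two points close together and one far away
gdf2 = gpd.GeoDataFrame(geometry=[Point(0, 0), Point(1, 0), Point(10, 0)])

# Buffer each point, then merge overlapping buffers into one geometry
buffered = gdf2.geometry.buffer(1)
gdfu = unary_union(list(buffered))  # a bare shapely geometry, no .plot() method

# Wrap the single geometry in a GeoDataFrame to plot or save it
new = gpd.GeoDataFrame(crs=gdf2.crs, geometry=[gdfu])
print(gdfu.geom_type, len(new))
# new.plot() and new.to_file(...) are now available
```

The first two buffers overlap and merge, while the third stays separate, so the union is a MultiPolygon with two parts stored in a single row.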
