Geopandas – How to Fill NaN Geometries Records with Another Geometric Column

geodataframe, geometry, geopandas, missing data, python

I have a GeoDataFrame with two geometry columns, both containing polygons or multipolygons.
I want to fill the missing values in one column with the geometries from the other.
I have tried:

geo_df['geom_2'].fillna(geo_df['geom_1'], inplace=True)

But an error was raised:
NotImplementedError: fillna currently only supports filling with a scalar geometry

Later, I tried:

geo_df['geom_2'].replace('None', geo_df['geom_1'], inplace=True)

and got the same error.

Is there a solution for this task?
I'm using GeoPandas version 0.10.2.

Best Answer

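You can achieve this with a boolean mask and assignment: select the rows where geom_2 is missing and assign geom_1 to them. pandas aligns the right-hand side on the index, so each selected row receives its own geom_1 value: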

gdf.loc[gdf["geom_2"].isna(), "geom_2"] = gdf["geom_1"]

Full MWE:

import geopandas as gpd
import numpy as np
import random
import shapely

# create a MWE data set
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres")).loc[
    lambda d: (d["continent"] == "Europe")
    & (~d["iso_a3"].isin(["-99", "RUS"]))
    & (d.geom_type == "Polygon")
].head(8)

# create columns as per question with some nan geometries;
# each geometry is a random vertex of the polygon's exterior ring
gdf["geom_2"] = (
    gdf["geometry"]
    .exterior.apply(
        lambda g: shapely.geometry.Point(g.coords[random.randrange(len(g.coords))])
    )
    # keeping only 75% of the rows leaves the others as nan after index alignment
    .sample(int(len(gdf) * 0.75))
)
gdf["geom_1"] = (
    gdf["geometry"]
    .exterior.apply(
        lambda g: shapely.geometry.Point(g.coords[random.randrange(len(g.coords))])
    )
)

# keep a record of what started nan
gdf["started_nan"] = gdf["geom_2"].isna()
# now fillna, use a mask and assignment
gdf.loc[gdf["geom_2"].isna(), "geom_2"] = gdf["geom_1"]

gdf
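The MWE above relies on gpd.datasets, which was deprecated in GeoPandas 0.14 and removed in 1.0, so it may not run on a recent install. Below is a minimal sketch of the same mask-and-assign technique on a hand-built frame. The toy boxes are hypothetical illustration data, and the column names geom_1 and geom_2 are taken from the question.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy data: geom_1 is complete, geom_2 has two missing rows
geo_df = gpd.GeoDataFrame(
    {
        "geom_1": gpd.GeoSeries([box(0, 0, 1, 1), box(1, 1, 2, 2), box(2, 2, 3, 3)]),
        "geom_2": gpd.GeoSeries([box(10, 10, 11, 11), None, None]),
    },
    geometry="geom_1",
)

# Boolean mask of the rows where geom_2 is missing
missing = geo_df["geom_2"].isna()

# Assign geom_1 into those rows only; existing geom_2 values are untouched
geo_df.loc[missing, "geom_2"] = geo_df.loc[missing, "geom_1"]
```

Restricting the right-hand side to the masked rows is optional, since pandas aligns on the index either way, but it makes the intent explicit.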

Related Solutions

Python Geopandas Rasterio – How to Rasterize Polygon Grid

I will start with #2 and #3. The geocube package (https://corteva.github.io/geocube/stable/index.html) handles all of the rasterization for you in a simpler process and supports multiple columns of data. See this example:

https://corteva.github.io/geocube/stable/examples/grid_to_vector_map.html

import geopandas as gpd
import numpy as np
from shapely.geometry import Polygon
from geocube.api.core import make_geocube


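# bounds of the grid and size of each cell
xmin, ymin, xmax, ymax = 1, 1, 5, 5
length = 1
width = 1

# left edges (cols) and edge values (rows) of the cells, rows ordered top to bottom
cols = list(range(int(np.floor(xmin)), int(np.ceil(xmax)), width))
rows = list(range(int(np.floor(ymin)), int(np.ceil(ymax)), length))
rows.reverse()

# build one square polygon per grid cell
polygons = []
for x in cols:
    for y in rows:
        polygons.append(Polygon([(x, y), (x + width, y), (x + width, y - length), (x, y - length)]))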

g = gpd.GeoDataFrame(
    {"data1": list(range(len(polygons))), "data2": list(range(10, 10+len(polygons)))},
    geometry=polygons,
    crs="EPSG:4326"  # the {"init": "epsg:4326"} form is deprecated
)

cube = make_geocube(vector_data=g, resolution=(1, -1))

This will provide the data in a gridded xarray interface with the proper geospatial metadata set.

<xarray.Dataset>
Dimensions:      (x: 4, y: 4)
Coordinates:
  * y            (y) float64 0.5 1.5 2.5 3.5
  * x            (x) float64 4.5 3.5 2.5 1.5
    spatial_ref  int64 0
Data variables:
    data1        (y, x) float64 15.0 11.0 7.0 3.0 14.0 ... 1.0 12.0 8.0 4.0 0.0
    data2        (y, x) float64 25.0 21.0 17.0 13.0 24.0 ... 22.0 18.0 14.0 10.0

You can then export each column of the data to a tif with:

cube.data1.rio.to_raster("data1.tif")

1) Where in the code am I making the error(s)?

The main issue is that you are not passing the raster's transform, which defines the resolution of the grid cells and the raster's location.

4) Why does the g.geometry column contain 5 coordinate pairs, first and last being the same, and not 4?

This is because shapely closes the polygon's ring automatically for you: it repeats the first coordinate at the end.
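A quick way to see the ring closing is to build a square from four corners and inspect its exterior coordinates. This is a small sketch with a hypothetical unit square:

```python
from shapely.geometry import Polygon

# Four corners are passed in, without repeating the first one
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

# The exterior ring comes back with the first coordinate repeated at the end
coords = list(square.exterior.coords)
print(len(coords))              # number of coordinate pairs in the ring
print(coords[0] == coords[-1])  # whether the ring is closed
```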

Geopandas – Solving Buffering Issues that Delete Attribute Data

It looks like you are storing only the geometry field in the polygons_buffered GDF. You can instead apply the buffer function directly to the original GDF and either modify the geometry field in place or store the result in another GDF. The latter requires copying more than just the computed field, which is where your original code ran into trouble. Since you're exporting immediately as a SHP, I'd go with the former option:

polygons['geometry'] = polygons['geometry'].buffer(polygons['Distance'])
polygons.head()
polygons.plot()
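To check that the attributes survive, here is a minimal self-contained sketch of the in-place approach. The two points and the name column are hypothetical, and the Distance column name follows the snippet above.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical input: two features, each with its own buffer distance
polygons = gpd.GeoDataFrame(
    {"name": ["a", "b"], "Distance": [1.0, 2.0]},
    geometry=[Point(0, 0), Point(10, 0)],
)

# Buffer each geometry by its row's Distance and overwrite the geometry column;
# all other columns stay attached to their rows
polygons["geometry"] = polygons["geometry"].buffer(polygons["Distance"])

print(polygons.columns.tolist())
```

Because only the geometry column is overwritten, the frame can be exported straight away with its attribute columns intact.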
