[GIS] Convert Numpy array to shapefile

python raster shapefile

I read in a raster and turned it into a binary image (land/water mask) of 0's (water) and 255's (land). I've named this array src_landCoverArray.

I want to output this array as a shapefile with water as polygons and land as empty space.

I've tried using rasterio.features.shapes() described here (https://stackoverflow.com/questions/37898113/python-boolean-array-to-polygon) but it produced a shapefile with the wrong extent. My code:

import geopandas as gpd
import rasterio.features
import shapely.geometry
myShapes = rasterio.features.shapes(src_landCoverArray)
polygons = [shapely.geometry.Polygon(shape[0]["coordinates"][0]) for shape in myShapes if shape[1] == 0]
crs = {'init': 'epsg:4326'}
my_gdf = gpd.GeoDataFrame(crs=crs, geometry=polygons)
my_gdf.to_file("Water_Poly2.shp", driver='ESRI Shapefile')

I've tried using a similar method with rasterio described here (How to polygonize raster to shapely polygons) but failed again with the location/extent being incorrect. My code:

from shapely.geometry import shape
mypoly = []

for vec in rasterio.features.shapes(src_landCoverArray):
    mypoly.append(shape(vec[0]))
crs = {'init': 'epsg:4326'}
my_gdf = gpd.GeoDataFrame(crs=crs, geometry=mypoly)
my_gdf.to_file("Water_Poly_mypoly.shp", driver='ESRI Shapefile')

I feel that I'm close to success if I could figure out the issue of extent/location being wrong. Has anyone tackled a problem similar to this?

Best Answer

I think I have experienced this issue before. The key detail is that rasterio.features.shapes needs to know the "transform" of the raster dataset, which maps pixel positions in the array to real-world coordinates. Without it, the polygon coordinates are plain row/column pixel indices, which is why the extent comes out wrong.
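To make the role of the transform concrete, here is a minimal pure-Python sketch of the affine mapping. The helper name pixel_to_coords and the raster values (0.5-degree pixels, upper-left corner at 10E, 50N) are made up for illustration; the coefficient order (a, b, c, d, e, f) follows the convention rasterio uses for its transforms.

```python
def pixel_to_coords(col, row, transform):
    """Map a (col, row) pixel corner to (x, y) with affine coefficients (a, b, c, d, e, f)."""
    a, b, c, d, e, f = transform
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y

# Identity transform: what you effectively get when no transform is passed,
# so coordinates are just pixel indices.
identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)

# Hypothetical north-up raster: 0.5-degree pixels, upper-left corner at (10, 50).
georeferenced = (0.5, 0.0, 10.0, 0.0, -0.5, 50.0)

corner_identity = pixel_to_coords(2, 2, identity)
corner_geo = pixel_to_coords(2, 2, georeferenced)
print(corner_identity, corner_geo)
```

The same pixel corner lands at (2, 2) with the identity transform but at (11, 49) once the raster's georeferencing is applied, which is the kind of shift that shows up as a wrong extent in the output shapefile.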

The dataset has a .transform property that you can pass to rasterio.features.shapes, as in this example from the rasterio documentation (where src is the dataset):

shapes = features.shapes(blue, mask=mask, transform=src.transform)

I don't think I see the name of your dataset variable in your post, so I can't suggest the exact code you need.

Related Solutions

Python – How to Convert Polygons in Shapefile to a NumPy Array

There is an option in GDAL to rasterize polygons based on an attribute. As far as I know, that attribute cannot be a string, but you can add a numeric attribute to your features and give each feature a unique ID. Let's say we call this field ID.

Open your shapefile

from osgeo import gdal, ogr, osr

source_ds = ogr.Open("Longhurst_world_v4_2010.shp")
source_layer = source_ds.GetLayer()

Create the destination raster data source

pixelWidth = pixelHeight = 1  # cell size in layer units; smaller means a finer raster
x_min, x_max, y_min, y_max = source_layer.GetExtent()
cols = int((x_max - x_min) / pixelWidth)   # columns span the x extent
rows = int((y_max - y_min) / pixelHeight)  # rows span the y extent
target_ds = gdal.GetDriverByName('GTiff').Create('temp.tif', cols, rows, 1, gdal.GDT_Byte)
# North-up geotransform: origin at the upper-left corner, negative y pixel size
target_ds.SetGeoTransform((x_min, pixelWidth, 0, y_max, 0, -pixelHeight))
band = target_ds.GetRasterBand(1)
NoData_value = 255
band.SetNoDataValue(NoData_value)
band.FlushCache()

Here is the important part. Instead of setting a general burn_value, use options and set it to the attribute that contains the relevant unique value: ["ATTRIBUTE=ID"]

gdal.RasterizeLayer(target_ds, [1], source_layer, options = ["ATTRIBUTE=ID"])

Add a spatial reference

target_dsSRS = osr.SpatialReference()
target_dsSRS.ImportFromEPSG(4326)
target_ds.SetProjection(target_dsSRS.ExportToWkt())
target_ds = None

Now you have your shapefile as a raster and can read it with gdal.Open('temp.tif').ReadAsArray()
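The grid-size and geotransform arithmetic from the raster-creation step can be checked without GDAL. The following is a small sketch, assuming the conventional north-up layout (origin at the upper-left corner, negative y pixel size); the helper name north_up_grid and the whole-world extent are illustrative only, and it rounds up with math.ceil so a partial cell at the edge is not dropped, which differs from the int() truncation used above.

```python
import math

def north_up_grid(x_min, x_max, y_min, y_max, pixel_width, pixel_height):
    """Return (cols, rows, geotransform) for a north-up raster covering the extent."""
    cols = math.ceil((x_max - x_min) / pixel_width)
    rows = math.ceil((y_max - y_min) / pixel_height)
    # GDAL geotransform order:
    # (origin x, pixel width, row rotation, origin y, column rotation, pixel height)
    geotransform = (x_min, pixel_width, 0.0, y_max, 0.0, -pixel_height)
    return cols, rows, geotransform

# Hypothetical extent: the whole world in degrees, with 1-degree cells.
cols, rows, gt = north_up_grid(-180.0, 180.0, -90.0, 90.0, 1.0, 1.0)
print(cols, rows, gt)
```

Note that the extent tuple returned by GetExtent() is ordered (x_min, x_max, y_min, y_max), which is the order this helper expects.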


[GIS] Setting CRS to ESRI:102001 using GeoPandas

The +init= syntax is deprecated. So all you need is the ESRI:102001 part. See: https://pyproj4.github.io/pyproj/stable/gotchas.html#init-auth-auth-code-should-be-replaced-with-auth-auth-code

inputGDF.crs = 'esri:102001'
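For a frame that has no CRS yet, a roughly equivalent form is set_crs, shown here as a minimal sketch. The one-point GeoDataFrame is placeholder data, not the asker's inputGDF, and this only labels the data with the CRS; it does not reproject coordinates.

```python
import geopandas as gpd
from shapely.geometry import Point

# Placeholder frame with no CRS assigned yet.
gdf = gpd.GeoDataFrame(geometry=[Point(0, 0)])

# Assign the CRS with the plain "authority:code" string, no +init= prefix.
gdf = gdf.set_crs("ESRI:102001")
print(gdf.crs)
```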
