Python – How to Convert Polygons in Shapefile to a NumPy Array

gdalimportnumpypythonraster

I have a shapefile containing a number of polygons and all attributes being text. I'm trying to import the polygons into NumPy as an array where each polygon is represented as unique values.

I approach this by using gdal_rasterize to generate a GeoTIFF, which I then can convert to an array:

gdal_rasterize -a provcode -l longhurst PG:'host=localhost dbname=biomes' -tr 0.25 0.25 out.tif

and

tif = gdal.Open('out.tif')
tifArray = tif.ReadAsArray()

My problem is that all polygons get the same values since provcode is of type string. How can I make gdal_rasterize burn different values to represent different polygons?

Alternatively, is there a better way to do this conversion?

Best Answer

There is an option in GDAL to rasterize polygons based on their attribute. But as far as I know it can not be string. But you can just add an attribute to your features and then give each feature a unique id. Let's say we call this field ID.

Open your shapefile

source_ds = ogr.Open("Longhurst_world_v4_2010.shp")
source_layer = source_ds.GetLayer()

Create the destination raster data source

pixelWidth = pixelHeight = 1 # depending how fine you want your raster
x_min, x_max, y_min, y_max = source_layer.GetExtent()
cols = int((x_max - x_min) / pixelHeight)
rows = int((y_max - y_min) / pixelWidth)
target_ds = gdal.GetDriverByName('GTiff').Create('temp.tif', cols, rows, 1, gdal.GDT_Byte) 
target_ds.SetGeoTransform((x_min, pixelWidth, 0, y_min, 0, pixelHeight))
band = target_ds.GetRasterBand(1)
NoData_value = 255
band.SetNoDataValue(NoData_value)
band.FlushCache()

Here is the important part. Instead of setting a general burn_value, use optionsand set it to the attribute that contains the relevant unique value ["ATTRIBUTE=ID"]

gdal.RasterizeLayer(target_ds, [1], source_layer, options = ["ATTRIBUTE=ID"])

Add a spatial reference

target_dsSRS = osr.SpatialReference()
target_dsSRS.ImportFromEPSG(4326)
target_ds.SetProjection(target_dsSRS.ExportToWkt())
target_ds = None

Now you have your shapefile as a raster and can read it with gdal.Open('temp.tif').ReadAsArray()

enter image description here

Related Solutions

[GIS] Selecting raster values at random locations in a raster in python/arcgis

If I understand you correctly I think you can solve it like this (their are comments in the code that explain what is going on):

import numpy, arcpy, random

#Establish the extent which your random samples can be within
rangeX = (100, 2500000) # Enter the actual range in x values of your rasters * 100 in order to get coordinates with decimals
rangeY = (100, 2500000) # Enter the actual range in y values of your rasters * 100 in order to get coordinates with decimals
qty = 1000  # Enter in the number greater than random points you need


#Generate random x,y coordinates
randPoints = []
while len(randPoints) < qty:
    x = random.randrange(*rangeX)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
    y = random.randrange(*rangeY)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
    randPoints.append((x,y))

#Create dictionary of key and lists, list will house tuples of (x,y,z)
#Enter in actual classified values for dictionary keys
valueDict = {'Class1' : [],
             'Class2' : [],
             'Class3' : [],
             'Class4' : []}

######Get Rasters bands as well as cell height, width, origin info to be able to get
######index of x,y location in the numpy array
arcpy.env.workspace = inPath + '\\aster.img'
bands = arcpy.ListRasters()
Ras = arcpy.Raster(inPath + '\\aster.img')
originX = Ras.extent.upperLeft.X
originY = Ras.extent.upperLeft.Y
pixelWidth = Ras.meanCellWidth
pixelHeight = Ras.meanCellHeight

#Create a list that houses each raster array
bandsList = []
for i in bands:
    bandsList.append(arcpy.RasterToNumPyArray(i).astype(numpy.float32))

#loop over all of the random point locations and collect raster values at their
#locations if the dictionary entry for that value is not full populate it
#with a tuple of (x,y,z), keep going until each class is full
for i in randPoints:
    X = i[0]
    Y = i[1]
    xOffset = int((X-originX)/pixelWidth)
    yOffset = int(abs(Y-originY)/pixelHeight)
    for j in range(0,len(bands)):
       sampleValue = bandsList[j][yOffset, xOffset]
       for key in valueDict.keys():
           if sampleValue == key:
               if len(valueDict[key]) < 10:
                   valueDict[key].append((X, Y, sampleValue))
                   break
               else:
                   continue

This is a variation of a script that I have used to extract raster values at random x,y locations, so it may need some tweaking but I think the major elements are their to get the job done for you.

[GIS] How to identify which lines/polygons intersect when using raster data in R

I made an answer with your toy lines but not with gdal (easy enough to change if you like). I decided to make a blank raster, presumably of which you know the extent? and then populate the table using extract:

# make the blank raster, the extents and resolution of which are up to you.
# the cells are numbered 1 to ncell (starting in the top left in R)
r <- raster(xmn=0,xmx=4,ymn=0,ymx=4,res=c(2,2))
r[] <- 1:ncell(r)

# make your toy.lines as above, except i've thrown in a 3rd to be daring
# why do you have length?
df <- rbind(c(1,1), c(4,4),c(3,1), c(3,4))

coord_1 <- Line(rbind(c(1,1), c(4,4)))
coord_2 <- Line(rbind(c(3,1), c(3,4)))
coord_3 <- Line(rbind(c(0,1), c(4,1)))

l1 <- Lines(list(coord_1), ID = "road1")
l2 <- Lines(list(coord_2), ID = "road2")
l3 <- Lines(list(coord_3), ID = "road3")

Sl <- SpatialLines(list(l1, l2, l3))

df <- data.frame(len = sapply(1:length(Sl), function(i) gLength(Sl[i, ])))
rownames(df) <- sapply(1:length(Sl), function(i) Sl@lines[[i]]@ID)

toy.lines <- SpatialLinesDataFrame(Sl, data = df)
toy.lines@data$polygon_id <- 1:nrow(toy.lines@data)

# use the extract function which will return a list, one element for every spatial line, 
# saying which cells it passes through. Named elements by id.
l <- extract(r,toy.lines)
names(l) <- toy.lines@data$polygon_id

# make a data table of cell ID and what line it touches, unlisting the list elements 
# and turning them into a long vector (and inserting that as a column)
cells.df <- data.table(cell=unlist(l),lineid=rep(as.numeric(names(l)),lapply(l, function(x) length(x))))
setkey(cells.df,cell)

# make a simple data table of all cell numbers (will enable us to keep NA cells)
all.cells <- data.table(cell=1:ncell(r))
setkey(all.cells,cell)

# merge by key. Data table orders by key and autopopulates NAs
final.df <- cells.df[all.cells]

That produces table B anyway. Unsure as to why it believes line 1 to pass through cell 4, but there you go.

Best Answer

Related Solutions

[GIS] Selecting raster values at random locations in a raster in python/arcgis

[GIS] How to identify which lines/polygons intersect when using raster data in R

Related Question