[GIS] Create pandas DataFrame from raster image – one row per pixel with bands as columns

Tags: digital image processing, numpy, pandas, python, rasterio

I have a raster image with 3 bands. I would like to convert this image to a CSV file where each row is one pixel and each column is one band, so that I can easily see the three values of each pixel.

This is how I have tried to do it:

import rasterio
import rasterio.features
import rasterio.warp
from matplotlib import pyplot
from rasterio.plot import show
import pandas as pd
import numpy as np


img=rasterio.open("01032020.tif")
show(img,0)

#read image 
array=img.read()

#create np array
array=np.array(array)

#create pandas df

dataset = pd.DataFrame({'Column1': [array[0]], 'Column2': [array[1]],'Column3': [array[2]]})
dataset

and also like this:

dataset = pd.DataFrame({'Column1': [array[0,:,:]], 'Column2': [array[1,:,:]],'Column3': [array[2:,:]]})

but I'm getting a weird table (screenshot not included).

I have also tried:

index = [i for i in range(0, len(array[0]))]
dataset = pd.DataFrame({'Column1': array[0], 'Column2': array[1],'Column3': array[2]},index=index)
dataset

but then I only get as many rows as the image has, which is still not what I want (screenshot not included).

What am I doing wrong?

My goal

Get one pandas table, where each row is a pixel, and it should have 3 columns, one for each band.

Best Answer

Quick solution

pd.DataFrame(array.reshape([3,-1]).T)

Explanation

Take the array of shape (3, x, y), which for rasterio's read() means (bands, rows, columns), and flatten the 2nd and 3rd dimensions into one. From the numpy docs: "One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions."

reshaped_array = array.reshape([3,-1])

Transpose the array to get shape (x*y, 3), i.e. one row per pixel and one column per band

transposed_array = reshaped_array.T

Build DataFrame

pd.DataFrame(transposed_array)
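
To see the steps end to end, here is a minimal sketch that uses a small synthetic array in place of img.read(), so it runs without a raster file. The column names band_1 to band_3, the extra row/col columns, and the 2x4 image size are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Stand-in for `array = img.read()`: 3 bands, 2 rows, 4 columns.
bands, height, width = 3, 2, 4
array = np.arange(bands * height * width).reshape(bands, height, width)

# Flatten each band into one long vector, then put the bands in columns.
pixels = array.reshape(bands, -1).T  # shape: (height * width, bands)
df = pd.DataFrame(pixels, columns=["band_1", "band_2", "band_3"])

# Optional: record which raster row/column each pixel came from.
rows, cols = np.indices((height, width))
df.insert(0, "row", rows.ravel())
df.insert(1, "col", cols.ravel())

# Text of the CSV; pass a file path to to_csv() to write a file instead.
csv_text = df.to_csv(index=False)
```

Because reshape flattens each band row by row, the pixel at (row, col) ends up at DataFrame index row * width + col, which is why the raveled output of np.indices lines up with the band columns.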

Related Solutions

[GIS] Image processing using Python, GDAL and Scikit-Image

Firstly, welcome to the site!

NumPy arrays have no built-in concept of a coordinate system. A 2D raster read into an array is simply indexed by row and column.

Note I'm making the assumption that you're reading a raster format that is supported by GDAL.

In Python the best way to import spatial raster data is with the rasterio package. The raw data imported by rasterio is still a numpy array without access to coordinate systems, but the open dataset also exposes its affine transform (the affine attribute in older releases, transform since rasterio 1.0), which you can use to convert raster columns and rows to projected coordinates. For example:

import numpy
import rasterio

path_to_raster = "raster.tif"  # placeholder: path to your own raster

# The best way to open a raster with rasterio is through the context manager
# so that it closes automatically

with rasterio.open(path_to_raster) as source:

    data = source.read(1) # Read raster band 1 as a numpy array
    affine = source.transform # Affine transform (source.affine before rasterio 1.0)

# ... do some work with scikit-image and get an array of local maxima locations
# e.g. (stand-in values):
maxima = numpy.array([[0, 0], [1, 1], [2, 2]])
# Also note that the convention in a numpy array for a 2D array is rows (y), columns (x)

for point in maxima: # Loop over each pair of coordinates
    column = point[1]
    row = point[0]
    x, y = affine * (column, row)
    print(x, y)

# Or you can do it all at once:

columns = maxima[:, 1]
rows = maxima[:, 0]

xs, ys = affine * (columns, rows)

And from there you can write your results out to a text file however you like (I'd suggest taking a look at the built-in csv module, for example).
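
As a rough sketch of that last step, the snippet below writes pixel indices and their projected coordinates with the csv module. To stay runnable without a raster file it applies the affine formula by hand. The six coefficients (10 m pixels, upper-left corner at 500000, 4100000), the pixel_to_xy helper, and the column names are all hypothetical; with rasterio you would use the dataset's own transform instead.

```python
import csv
import io

# Hypothetical affine coefficients in the (a, b, c, d, e, f) order used by
# rasterio's transform: 10 m pixels, upper-left corner at (500000, 4100000).
transform = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)


def pixel_to_xy(transform, row, col):
    """Map a (row, col) array index to the x, y of that pixel's upper-left corner."""
    a, b, c, d, e, f = transform
    return a * col + b * row + c, d * col + e * row + f


maxima = [(0, 0), (1, 1), (2, 2)]  # (row, col) pairs, as in the example above

# Write to an in-memory buffer; swap it for
# open("maxima.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["row", "col", "x", "y"])
for row, col in maxima:
    x, y = pixel_to_xy(transform, row, col)
    writer.writerow([row, col, x, y])

csv_text = buffer.getvalue()
```

The coordinates refer to the upper-left corner of each pixel; if you want pixel centres, add 0.5 to both the row and the column before transforming.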

[GIS] Making shapefile from Pandas dataframe

Yes, that can be done with shapely and geopandas.

Suppose your pandas DataFrame looks something like this:

import pandas as pd
data = [
        {'some_attribute': 'abc', 'lat': '50.1234', 'lon': '10.4023'},
        {'some_attribute': 'def', 'lat': '40.5678', 'lon': '8.3365'},
        {'some_attribute': 'ghi', 'lat': '60.9012', 'lon': '6.2541'},
        {'some_attribute': 'jkl', 'lat': '45.3456', 'lon': '12.5478'},
        {'some_attribute': 'mno', 'lat': '35.7890', 'lon': '14.3957'},
        ]

df = pd.DataFrame(data)
print(df)

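=> (older pandas versions sort the columns alphabetically, as shown here; newer versions keep the key order of the dictionaries)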

       lat      lon some_attribute
0  50.1234  10.4023            abc
1  40.5678   8.3365            def
2  60.9012   6.2541            ghi
3  45.3456  12.5478            jkl
4  35.7890  14.3957            mno

First, make sure that geopandas and shapely are installed properly, which is sometimes not easy because they come with dependencies such as GEOS and GDAL. If pip install geopandas shapely does not work at first, search for the error message on Google or Stack Overflow/GIS Stack Exchange; most likely an answer that solves the problem is already available.

Then, it is just a matter of creating a new geometry column in your dataframe which combines the lat and lon values into a shapely Point() object. Note that the Point() constructor expects a tuple of float values, so conversion must be included if the dataframe's column dtypes are not already set to float.

from shapely.geometry import Point

# combine lat and lon column to a shapely Point() object
df['geometry'] = df.apply(lambda x: Point((float(x.lon), float(x.lat))), axis=1)
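
If you would rather not call float() for every row, the conversion mentioned above can also be done once per column before building the points. This is a small pandas-only sketch with a shortened, two-row version of the example data; the variable name coordinate_pairs is just for illustration.

```python
import pandas as pd

# Shortened version of the example data, with coordinates stored as strings.
df = pd.DataFrame(
    {
        "some_attribute": ["abc", "def"],
        "lat": ["50.1234", "40.5678"],
        "lon": ["10.4023", "8.3365"],
    }
)

# Convert both coordinate columns to float in one step.
df[["lat", "lon"]] = df[["lat", "lon"]].astype(float)

# (x, y) pairs in the order Point() expects: longitude first, then latitude.
coordinate_pairs = list(zip(df["lon"], df["lat"]))
```

With float columns in place, the lambda above no longer needs the float() calls.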

Now, convert the pandas DataFrame into a GeoDataFrame. The geopandas constructor expects a geometry column which can consist of shapely geometry objects, so the column we created is just fine:

import geopandas
df = geopandas.GeoDataFrame(df, geometry='geometry')

To dump this GeoDataFrame into a shapefile, use geopandas' to_file() method (other drivers supported by Fiona such as GeoJSON should also work):

df.to_file('MyGeometries.shp', driver='ESRI Shapefile')

The resulting shapefile can then be opened and visualized in QGIS (screenshot not included).