[GIS] Create pandas DataFrame from raster image – one row per pixel with bands as columns

Tags: digital image processing, numpy, pandas, python, rasterio

I have a raster image with 3 bands. I would like to convert this image to a CSV file where each row is one pixel and each column is one band, so that I can easily see the three band values of each pixel.

This is how I have tried to do it:

import rasterio
import rasterio.features
import rasterio.warp
from matplotlib import pyplot
from rasterio.plot import show
import pandas as pd
import numpy as np


img=rasterio.open("01032020.tif")
show(img,0)

#read image 
array=img.read()

#create np array
array=np.array(array)

#create pandas df

dataset = pd.DataFrame({'Column1': [array[0]], 'Column2': [array[1]],'Column3': [array[2]]})
dataset

and also like this:

dataset = pd.DataFrame({'Column1': [array[0,:,:]], 'Column2': [array[1,:,:]],'Column3': [array[2,:,:]]})

But I'm getting something strange, like this table:
[Image: screenshot of the resulting table]
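A minimal reproduction on a small synthetic array (a stand-in for `img.read()`, not the real image) suggests why the first two attempts look odd: each dict value is a one-element list, so pandas builds a single row, and each cell holds an entire 2-D band rather than one pixel value.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for img.read(): 3 bands, 2 rows, 4 columns (shape = bands, rows, cols)
array = np.arange(3 * 2 * 4).reshape(3, 2, 4)

# Same pattern as the attempts above: each column is a list with ONE element (a whole band)
dataset = pd.DataFrame({'Column1': [array[0]], 'Column2': [array[1]], 'Column3': [array[2]]})

print(dataset.shape)                    # a single row and three columns
print(type(dataset.loc[0, 'Column1']))  # the one cell holds the full 2-D band array
```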

I have also tried:

index = [i for i in range(0, len(array[0]))]
dataset = pd.DataFrame({'Column1': array[0], 'Column2': array[1],'Column3': array[2]},index=index)
dataset

But then I only get as many rows as the image has rows, which is still not right:
[Image: screenshot of the resulting table]

What am I doing wrong?

My goal

Get a single pandas DataFrame where each row is a pixel, with 3 columns, one for each band.

Best Answer

Quick solution

pd.DataFrame(array.reshape([3,-1]).T)
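To see what the one-liner does, here it is on a tiny synthetic array standing in for `img.read()` (which returns an array of shape (bands, rows, cols)). The values are chosen so the pixel order is easy to follow.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for img.read(): shape is (bands, rows, cols)
bands, rows, cols = 3, 2, 4
array = np.arange(bands * rows * cols).reshape(bands, rows, cols)

# The quick solution: (3, rows, cols) -> (3, rows*cols) -> (rows*cols, 3)
df = pd.DataFrame(array.reshape([3, -1]).T)

print(df.shape)             # (8, 3): one row per pixel, one column per band
print(df.iloc[0].tolist())  # band values of the top-left pixel
print(df.iloc[5].tolist())  # band values of the pixel at row 1, column 1
```

Row 0 is the top-left pixel, and rows then run left to right, top to bottom. Writing `array.reshape(array.shape[0], -1).T` would avoid hard-coding the band count.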

Explanation

  1. Take the array of shape (3, x, y) and flatten the 2nd and 3rd dimensions into one. From the NumPy docs: "One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions."
reshaped_array = array.reshape([3,-1])
  2. Transpose the array to get an array of shape (x*y, 3), i.e. one row per pixel.
transposed_array = reshaped_array.T
  3. Build the DataFrame.
pd.DataFrame(transposed_array)
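Since the original goal was a CSV file, here is a sketch that names the band columns, adds each pixel's row and column index, and writes the result with `DataFrame.to_csv`. The column names (`row`, `col`, `band1`, `band2`, `band3`) are hypothetical choices, and the input is a synthetic array standing in for `img.read()`. Because `reshape` flattens in row-major (C) order, `np.indices(...)` raveled the same way yields matching coordinates, so each CSV row can be traced back to its pixel.

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-in for img.read(); shape is (bands, rows, cols)
array = np.arange(3 * 2 * 4).reshape(3, 2, 4)
n_bands, n_rows, n_cols = array.shape

# One row per pixel, one column per band (row-major pixel order)
df = pd.DataFrame(
    array.reshape(n_bands, -1).T,
    columns=[f"band{i + 1}" for i in range(n_bands)],  # hypothetical column names
)

# Pixel coordinates in the same row-major order as the reshape above
rr, cc = np.indices((n_rows, n_cols))
df.insert(0, "row", rr.ravel())
df.insert(1, "col", cc.ravel())

# Write CSV to an in-memory buffer here; pass a file path instead in practice
buffer = io.StringIO()
df.to_csv(buffer, index=False)
lines = buffer.getvalue().splitlines()
print(lines[:3])
```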