[GIS] Identifying missing pixels/ pixels with no data value in .tifs

geotiff-tiff nodata pixel python raster

I am looking for a way to identify missing pixels in .tifs. For example, in the attached image there is a row and a column of missing pixels across the image.

After some investigation, the no-data/missing pixels appear to have a value of 0, and they appear across all three bands.

Can anyone recommend a way, via a piece of software/script, that could do this in bulk, say for 400-500 .tifs at a time, rather than having to do it manually/visually for each .tif?

Ideally, I'd like something that could give me a printed output in a .txt which lists all the .tifs that have missing pixels, which I could then investigate visually.

The software I have access to is FME, ArcMap, QGIS, Python, and Excel; however, I may be able to get hold of something not on that list.

I have attempted a solution using Python with numpy and GDAL (shown below); however, I am new to Python, so my script is probably trash. I've played around with a numpy array and GDAL band statistics, and both methods identify 0 values where expected. What I'm struggling with is looping through a folder of 400-500 images and getting one summary for all of them, rather than 400-500 individual results.

import numpy as np
from osgeo import gdal

ds = gdal.Open("single.tif")

print("[ RASTER BAND COUNT ] : {}".format(ds.RasterCount))
for band in range(1, ds.RasterCount + 1):  # GDAL band numbers start at 1
    print("[ BAND ] : {}".format(band))
    srcband = ds.GetRasterBand(band)
    if srcband is None:
        continue
    stats = srcband.GetStatistics(True, True)
    if stats is None:
        continue

    # stats holds [minimum, maximum, mean, standard deviation]
    print("[ STATS ] =  Minimum=%.3f, Maximum=%.3f" % (stats[0], stats[1]))

# Read band 3 into a 2-D numpy array
myArray = ds.GetRasterBand(3).ReadAsArray()
print(myArray)

np.savetxt("Output.csv", myArray, delimiter=",")

if 0 in myArray:
    print("yes")
else:
    print("no")

Best Answer

I don't have a lot of aerial imagery to test this on just now, so I'm not 100% sure it'll work in your case. But it should get you closer to what you want.

I've used the rasterio library, which provides a very nice wrapper around GDAL and makes the code much simpler to write (and read).

The script combines several things:

  • a generator function that yields the name of each TIFF file in a folder (and any sub-folders)
  • for each file, it reads every band and builds a list of booleans recording which bands contain the NODATA value
  • if every band contains the NODATA value, it writes the file name to the console

Code:

import os
import rasterio

def find_images(path):
    # Walk the folder tree and yield the path of every TIFF file found.
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            # Accept both .tif and .tiff, in any letter case.
            if os.path.splitext(name)[-1].lower() in (".tif", ".tiff"):
                yield os.path.join(root, name)

def main():
    for file_name in find_images("/path/to/your/images"): # <--- change this!
        with rasterio.open(file_name, "r", driver="GTiff") as source:
            no_data = source.nodata
            if no_data is None:
                # No NODATA value is set in the file; assume 0, as reported in the question.
                no_data = 0
            bands = source.read()  # 3-D array: (band, row, column)
            got_nulls = [no_data in band for band in bands]
            if all(got_nulls): # report only files where every band contains the NODATA value
                print("File {} NodataValue {}, has {} Bands, Got Nulls {}".format(file_name, no_data, len(bands), got_nulls))

if __name__ == "__main__":
    main()

I've tried this on Python 2.7.12 using rasterio 0.26 and numpy 1.8.2.
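
The question also asks for a .txt listing of the affected files rather than console output. Here is a minimal sketch of one way that could look. The names find_tiffs, has_missing_pixels and write_report are hypothetical, the fallback to 0 when a file has no NODATA value set is an assumption taken from the question, and I haven't run this against real imagery.

```python
import os


def find_tiffs(folder):
    """Yield paths of .tif/.tiff files under folder (any letter case)."""
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() in (".tif", ".tiff"):
                yield os.path.join(root, name)


def has_missing_pixels(path, fallback_nodata=0):
    """Return True if every band of the raster contains the NODATA value.

    Assumption: when the file declares no NODATA value, fall back to 0.
    """
    import rasterio  # imported lazily so write_report can be used with another check

    with rasterio.open(path) as source:
        nodata = source.nodata if source.nodata is not None else fallback_nodata
        return all((band == nodata).any() for band in source.read())


def write_report(folder, report_path, check=has_missing_pixels):
    """Write one flagged file path per line to report_path and return the list."""
    flagged = [path for path in find_tiffs(folder) if check(path)]
    with open(report_path, "w") as report:
        for path in flagged:
            report.write(path + "\n")
    return flagged
```

Calling write_report("/path/to/your/images", "missing_pixels.txt") would then leave a text file with one suspect .tif per line, ready for a visual check. The check argument is there so a different test can be swapped in without touching the reporting code.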

As with most Python experiments, I recommend using a Python virtual environment to avoid having to install anything system-wide :)
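
One caveat with testing for "any pixel equals 0": a single genuinely black pixel would flag the file too. Since the question describes a whole row and a whole column of missing pixels, a stricter test could look only for rows or columns that are entirely 0 in every band. The sketch below shows the idea on a synthetic array; missing_lines is a hypothetical helper name, and the (band, row, column) layout is what rasterio's read() returns.

```python
import numpy as np


def missing_lines(bands, missing_value=0):
    """Return (rows, cols): indices of rows/columns that equal missing_value in every band.

    bands is a 3-D array shaped (band, row, column).
    """
    # True where a pixel equals missing_value in all bands -> 2-D (row, column) mask
    missing = np.all(np.asarray(bands) == missing_value, axis=0)
    rows = np.flatnonzero(missing.all(axis=1))  # rows that are missing end to end
    cols = np.flatnonzero(missing.all(axis=0))  # columns that are missing top to bottom
    return rows.tolist(), cols.tolist()


# Synthetic 3-band, 4x5 image with row 1 and column 3 blanked out.
image = np.full((3, 4, 5), 200, dtype=np.uint8)
image[:, 1, :] = 0
image[:, :, 3] = 0
print(missing_lines(image))  # ([1], [3])
```

A file could then be flagged only when either list is non-empty, which should cut down on false positives from dark but valid pixels.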
