[GIS] How does QGIS open such large raster datasets (about 40 GB)?

gdal geotiff-tiff numpy qgis

I have a problem with the GDAL library when opening a big GeoTIFF file of about 32000×32000 pixels. I cannot use the ReadAsArray function because of the maximum size of a numpy array in Python. But I'm wondering why QGIS can open that file easily. What is the technique behind it?

Best Answer

If QGIS is running in a 1000x1000 pixel window on your screen, there is no need to read all 32000x32000 pixels to show the map. GDAL tries to read data from the source image so that nothing outside the bounding box is read, and if the image has overviews, the data come from the resolution level best suited to the map resolution. There is always some overhead, but even if GDAL needs to read 2000x2000 pixels, that is still nothing compared to 32000x32000 pixels' worth of data.
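As a rough sketch of the same idea from Python, the snippet below reads only one window of a band and asks GDAL to resample it to the screen size. The helper names `clamp_window` and `read_view` are made up for this illustration, and this is not necessarily how QGIS itself issues its reads. When the requested buffer is smaller than the window, GDAL may serve the request from an overview if the file has one.

```python
def clamp_window(xoff, yoff, xsize, ysize, raster_xsize, raster_ysize):
    """Clip a requested pixel window so that it lies inside the raster."""
    x0 = max(0, min(xoff, raster_xsize))
    y0 = max(0, min(yoff, raster_ysize))
    x1 = max(x0, min(xoff + xsize, raster_xsize))
    y1 = max(y0, min(yoff + ysize, raster_ysize))
    return x0, y0, x1 - x0, y1 - y0


def read_view(path, xoff, yoff, xsize, ysize, out_xsize, out_ysize, band_index=1):
    """Read one window of one band, resampled to out_xsize x out_ysize pixels."""
    from osgeo import gdal  # imported here so clamp_window works without GDAL

    ds = gdal.Open(path)
    if ds is None:
        raise IOError("could not open %s" % path)
    x0, y0, w, h = clamp_window(xoff, yoff, xsize, ysize,
                                ds.RasterXSize, ds.RasterYSize)
    if w == 0 or h == 0:
        raise ValueError("requested window lies outside the raster")
    band = ds.GetRasterBand(band_index)
    # Only the window (x0, y0, w, h) is read; buf_xsize/buf_ysize set the
    # size of the returned array, so the result never exceeds the view size.
    arr = band.ReadAsArray(x0, y0, w, h, buf_xsize=out_xsize, buf_ysize=out_ysize)
    ds = None  # close the dataset
    return arr
```

For example, `read_view("path/to/file", 10000, 10000, 2000, 2000, 1000, 1000)` would return a 1000x1000 array covering a 2000x2000 pixel region, instead of loading the whole raster.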

How well the "read-only-what-you-need" principle works depends on the image format and the corresponding driver. If you have a GeoTIFF that is internally tiled into 256x256 tiles and contains overviews (also called pyramid layers or reduced-resolution levels), GDAL can do it pretty well. On the other hand, large PNG and JPEG images are inefficient because the whole image must be decompressed before data can be taken from a small region of interest.
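To put numbers on this, here is a small back-of-the-envelope calculation in plain Python. The functions `tiles_touched` and `pick_overview_factor` are hypothetical helpers written for this illustration; the overview choice in particular is a simplified rule of thumb, not GDAL's exact selection logic.

```python
import math


def tiles_touched(xoff, yoff, xsize, ysize, tile=256):
    """Count the tile x tile blocks that a pixel window overlaps."""
    cols = (xoff + xsize - 1) // tile - xoff // tile + 1
    rows = (yoff + ysize - 1) // tile - yoff // tile + 1
    return cols * rows


def pick_overview_factor(window_size, screen_size, factors=(2, 4, 8, 16, 32)):
    """Largest decimation factor that still leaves at least screen_size
    pixels across the window; 1 means full resolution."""
    best = 1
    for factor in sorted(factors):
        if window_size / factor >= screen_size:
            best = factor
    return best


total_tiles = math.ceil(32000 / 256) ** 2        # tiles in the whole image
window_tiles = tiles_touched(0, 0, 2000, 2000)   # tiles for a 2000x2000 window
full_view = pick_overview_factor(32000, 1000)    # whole image on a 1000 px view

print(total_tiles, window_tiles, full_view)  # 15625 64 32
```

So a 2000x2000 window touches 64 of the 15625 tiles, and a view of the whole image can be drawn from the 1:32 overview, which is only 1000x1000 pixels.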

Note: You may know that even huge GeoTIFF files compressed with the JPEG method are not inefficient at all. That is because in this case the TIFF file is tiled and each tile is compressed with JPEG individually. GDAL does need to decompress each tile it reads completely, but because the tiles are small, only 256x256 pixels, the operation is cheap and memory usage is low.
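If your own file is not tiled or has no overviews, something along these lines could produce a copy that is. This is only a sketch: `tiled_jpeg_options` and `make_tiled_copy` are names invented here, the overview factors are an arbitrary choice, and JPEG compression is lossy and mainly suited to 8-bit imagery, so pick another `COMPRESS` value for float data.

```python
def tiled_jpeg_options(block_size=256):
    """GTiff creation options for an internally tiled, JPEG-compressed file."""
    return [
        "TILED=YES",
        "BLOCKXSIZE=%d" % block_size,
        "BLOCKYSIZE=%d" % block_size,
        "COMPRESS=JPEG",  # lossy; assumes 8-bit imagery
    ]


def make_tiled_copy(src_path, dst_path, overview_factors=(2, 4, 8, 16, 32)):
    """Write a tiled copy of src_path and add internal overviews to it."""
    from osgeo import gdal  # imported lazily; only this function needs GDAL

    gdal.UseExceptions()
    out = gdal.Translate(dst_path, src_path, format="GTiff",
                         creationOptions=tiled_jpeg_options())
    out = None  # flush the copy to disk
    ds = gdal.Open(dst_path, gdal.GA_Update)  # update mode: overviews go inside the file
    ds.BuildOverviews("AVERAGE", list(overview_factors))
    ds = None  # flush and close
```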

Read about blocks, windowing and overviews in the GDAL tutorial: http://www.gdal.org/gdal_tutorial.html

Related Solutions

[GIS] Converting GeoTIFF file to numpy array to QPixmap with PyQt4

I managed to do it in the end, so I'll post the answer here in case anyone needs it later.

The problem was that my image contained float32 and sometimes even complex32 data, which QImage does not support, so I had to convert it to uint8.

Another problem that came up was that my image was completely black: most values were like 0.x, so they were rendered as black. I tried to use ImageEnhance from the PIL library, but the result wasn't satisfying because a few unusual values skewed it. So I just decided to multiply every value of the array by a factor chosen by the user. Depending on the values in your array, the factor may need to be really high; in my case I usually set it to 300.

The code looks like this:

# Import the libraries
from osgeo import gdal
from PIL import Image, ImageQt
from PyQt4 import QtGui
import numpy as np

factor = 300  # user-determined brightness factor; 300 is what I usually use

# Read the dataset and set up the numpy arrays
dsr = gdal.Open("path/to/file")
np_array = dsr.ReadAsArray()
# Clip before casting so that out-of-range values saturate at 0 or 255
np_array_uint8 = np.clip(np_array * factor, 0, 255).astype(np.uint8)

# Convert to image
im8 = Image.fromarray(np_array_uint8)
imQt = QtGui.QImage(ImageQt.ImageQt(im8))
pxmap = QtGui.QPixmap.fromImage(imQt)

# Set the pixmap into a label (this part lives inside your window class)
self.label = QtGui.QLabel(self.centralwidget)
self.label.setPixmap(pxmap)  # You can scale it as well

There you go, in case you ever need it. You can also save memory by setting objects you no longer need (the dataset and the intermediate arrays) to None.
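As a possible alternative to a fixed multiplication factor, a percentile stretch maps the bulk of the values onto 0..255 and clips the unusual ones. This is not what I used above, just a sketch; the function name `stretch_to_uint8` and the 2/98 percentile defaults are assumptions, and it expects the array to contain only finite values.

```python
import numpy as np


def stretch_to_uint8(arr, low_pct=2, high_pct=98):
    """Linearly map the low_pct..high_pct percentile range of arr onto 0..255."""
    if np.iscomplexobj(arr):
        data = np.abs(arr)  # use the magnitude of complex data
    else:
        data = np.asarray(arr, dtype=np.float64)
    lo, hi = np.percentile(data, [low_pct, high_pct])
    if hi <= lo:
        # Constant image: nothing to stretch
        return np.zeros(data.shape, dtype=np.uint8)
    scaled = (data - lo) / (hi - lo)
    # Values outside the percentile range are clipped to pure black or white
    return (np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)
```

The result can be passed to `Image.fromarray` in place of `np_array_uint8`.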

Hope this helps!

GeoTIFF Header Size – Determining Size of GeoTIFF Header with Python

Check out tifffile, which is a Python package to read and write image data from and to TIFF files.

import tifffile
import numpy as np

fname = 'my.tif'
tif = tifffile.TiffFile(fname)
page = tif.pages[0]  # first page
arr = page.asarray()  # we're done

So far, this is no different than either rasterio or GDAL.

To answer your question, you can get the byte offset and size from the page.is_contiguous property, then read it with regular numpy or any other tool that can read contiguous data:

offset, byte_count = page.is_contiguous
with open(fname, 'rb') as fp:
    fp.seek(offset)
    # if this is a 4-byte float file ...
    arr2 = np.fromfile(fp, dtype=np.float32, count=byte_count // 4)  # count must be an int
arr2.shape = arr.shape
assert (arr2 == arr).all()
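To see where such an offset comes from at the byte level, here is a minimal sketch that parses a classic TIFF header by hand with the standard library. It follows the TIFF 6.0 layout (byte order mark, magic number 42, offset of the first IFD, then 12-byte IFD entries) and returns the first value of StripOffsets or TileOffsets. The function name `first_data_offset` is made up for this example, BigTIFF is not handled, and since TIFF allows the IFD to sit anywhere in the file, this value is only a "header size" when the pixel data directly follows the metadata.

```python
import struct

STRIP_OFFSETS, TILE_OFFSETS = 273, 324  # tag numbers from the TIFF 6.0 specification


def first_data_offset(data):
    """Return the byte offset of the first strip or tile in a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("only classic TIFF (magic number 42) is handled here")
    (n_entries,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(n_entries):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, typ, count = struct.unpack(order + "HHI", data[start:start + 8])
        if tag not in (STRIP_OFFSETS, TILE_OFFSETS):
            continue
        fmt, size = {3: ("H", 2), 4: ("I", 4)}[typ]  # SHORT or LONG
        if count * size <= 4:
            pos = start + 8  # the value fits inside the entry itself
        else:
            # otherwise the entry holds a pointer to the array of offsets
            (pos,) = struct.unpack(order + "I", data[start + 8:start + 12])
        (offset,) = struct.unpack(order + fmt, data[pos:pos + size])
        return offset
    raise ValueError("no StripOffsets or TileOffsets tag found")


# A hand-built 26-byte example: 8-byte file header, then a one-entry IFD.
tiny = (b"II" + struct.pack("<HI", 42, 8)                  # byte order, magic, IFD offset
        + struct.pack("<H", 1)                             # number of IFD entries
        + struct.pack("<HHII", STRIP_OFFSETS, 4, 1, 26)    # StripOffsets = 26
        + struct.pack("<I", 0))                            # no further IFD
print(first_data_offset(tiny))  # 26
```

For a real file you would pass bytes that cover the IFD, for example the file contents or a memory-mapped view of it.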
