[GIS] Cannot properly import MODIS MOD07_L2 .hdf files into R using rgdal

modisrrasterrgdal

I've been trying to accurately import and process some .hdf files from the MODIS Atmospheric Profile product (MOD07_L2) in R for several days now. Still, there's something going wrong during data import. The below code can be reproduced using one exemplary .hdf file (MOD07_L2.A2013001.0835.006.2013001192145.hdf) that can be downloaded from Dropbox.

library(rgdal)

# Extraction of metadata via `GDALinfo`    
filename <- "MOD07_L2.A2013001.0835.006.2013001192145.hdf"
gdalinfo <- GDALinfo(filename, returnScaleOffset = FALSE)
metadata <- attr(gdalinfo, "subdsmdata")

# Extraction of SDS string for parameter 'Skin_Temperature' (formerly 'Surface_Temperature')    
sds <- metadata[grep("Skin_Temperature", metadata)[1]]
sds <- sapply(strsplit(sds, "="), "[[", 2)

# Raster import via `readGDAL`   
sds.rg <- readGDAL(sds)

So far, so good, but here comes the confusing part:

> summary(sds.rg$band1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 -14870  -14850  -14850  -14840  -14840  -14820   53529 

Considering the fact that Skin_Temperature has an official valid range from 150 to 350 K (see MOD07_L2:Format & Content), the mean would inherit a value of

> (-14840 - (-15000)) * 0.01
[1] 1.6

after considering the corresponding add_offset (-15000) and scale_factor (0.01). Note that we're still talking about Kelvin, not °C. Extracting SDS No. 8, i.e. Skin_Temperature, using

library(gdalUtils)
gdal_translate(filename, dst_dataset = "tmp.tif", sd_index = 8)

and opening the resulting file called "tmp.tif" in QGIS resulted in seemingly reliable values centered around 15000, i.e. roughly 27 °C. However, importing "tmp.tif" back into R using raster again resulted in values comparable to the ones shown above:

> summary(raster("tmp.tif"))
              tmp
Min.    -14867.31
1st Qu. -14848.13
Median  -14845.89
3rd Qu. -14840.53
Max.    -14819.93
NA's         0.00

I've been searching the internet and stumbled across similar problems related to rgdal. However, when I tried to cast toUnSigned on band 1 of my previously generated 'SpatialGridDataFrame', I received the following error message:

> toUnSigned(sds.rg$band1, 16)
Error in toUnSigned(sds.rg$band1, 16) : band not integer

Apparently, the data imported into R is not even of type integer (what it is supposed to be), but numeric:

> sds.rg$band1[1:5]
[1]        NA        NA -14839.40 -14840.25 -14839.26

Is there an apparent mistake in my code, or is there any point I miss when importing the .hdf and .tif files using rgdal? I would be extremely grateful for any kind of help.

Best Answer

The answer is surprisingly simple.

sgr_lst <- readGDAL(sds, as.is = TRUE)

solves the issue. The only thing left to do then is transform the resulting SpatialGridDataFrame to a proper Raster* object using raster(). For this purpose, it is necessary to retrieve the west, east, south, and north bounding coordinates (see ?extent: xmin, xmax, ymin, ymax) from the metadata e.g. via

meta <- attr(gdalinfo, "mdata")

## string patterns
crd_str <- paste0(c("WEST", "EAST", "SOUTH", "NORTH"), "BOUNDINGCOORDINATE")
## search for patterns in metadata
crd_id <- sapply(crd_str, function(i) grep(i, meta))
## extract information
crd <- meta[crd_id]
## create 'extent' object
crd <- as.numeric(sapply(strsplit(crd, "="), "[[", 2))
ext <- extent(crd)

The thus determined extent (together with the EPSG code) can then be passed on to the finally created RasterLayer which, after applying the scale factor and offset, definitely looks fine now.

## rasterize, apply offset and scale factor
rst_lst <- (raster(sgr_lst) - -1.5e+04) * 1.0e-02
## set extent and coordinate reference system
extent(rst_lst) <- crd
projection(rst_lst) <- "+init=epsg:4326"

lst