[GIS] Recommendation for getting TIF raster from modis HDF, for processing in R

coordinate systemmodismodis-reprojection-toolrraster

I've downloaded a set of modis mod13q1 HDF files, e.g. MOD13Q1.A2016097.h11v05.006.2016114040325.hdf.

I'd like to load the 250m_16_days_EVI and 250m_16_days_pixel_reliability bands into R as rasters. My understanding is that I first need to use the modis reprojection tool (MRT) to create TIF files from the HDF. [Incorrect! The HDF file can be loaded directly into R; see answer and code below.] I'm using 64-bit linux and I have the MRT installed in /home/my_username/mrt/bin/.

MRT gives several options for the output projection type. I know very little about projections. What parameters should I choose if my objectives are

  1. I'd like the conversion to run quickly
  2. I want to lose as little information as possible, i.e. I want to keep the 250 meter resolution of the original HDF file
  3. I need to be able to load the rasters into R

The ultimate goal is to extract raster values at a few thousand (say, 10,000) spatial points.

Edit: additional information, in case it's helpful:

  • I'm getting my HDF data from https://search.earthdata.nasa.gov/search. As far as I can tell there is no option to download TIF files, or any other format that can be loaded directly into R
  • I have two sets of HDF files for which I'd like to run this procedure. The first are in the USA (lower 48), and the second set covers Brazil.

Here's a simple example that seems to work. This borrows from Cannot properly import MODIS MOD07_L2 .hdf files into R using rgdal and mdsumner's answer. It skips the MRT step entirely.

library(raster)
library(rgdal)

hdf_path <- "/home/my_username/my_modis_dir"
hdf_files <- list.files(hdf_path, pattern="^MOD13Q1.*\\.hdf$")
stopifnot(length(hdf_files) > 0)

infile <- hdf_files[1]  # MOD13Q1.A2000049.h11v04.006.2015136104609.hdf
## Modis h11v04 includes Wisconsin, parts of Michigan, parts of Illinois, ...

gdal_info <- GDALinfo(infile, returnScaleOffset=FALSE)  # Two warnings
## Two warnings: 1: In dim(x) : no bands in dataset; 2: In GDALinfo GeoTransform values not available

subdataset_metadata <- attr(gdal_info, "subdsmdata")
length(subdataset_metadata)  # 24 strings starting with SUBDATASET_1_NAME=, SUBDATASET_1_DESC=, ...

## Extract substring following SUBDATASET_2_NAME=
evi_subdataset <- subdataset_metadata[grep("250m 16 days EVI", subdataset_metadata)[1]]
evi_subdataset_name <- sapply(strsplit(evi_subdataset, "="), "[[", 2)  # "HDF4_EOS:EOS_GRID: ... "

spatial_grid_df <- readGDAL(evi_subdataset_name, as.is=TRUE)  # Careful, need as.is=TRUE
## readGDAL call prints "... has GDAL driver HDF4Image and has 4800 rows and 4800 columns"

str(spatial_grid_df)
## Formal class 'SpatialGridDataFrame' [package "sp"] with 4 slots
## @ data : 'data.frame':   23040000 obs. of  1 variable
## @ grid : Formal class 'GridTopology' [package "sp"] with 3 slots
## @ bbox       : num [1:2, 1:2] -7783654 4447802 -6671703 5559753
## @ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
##   projargs: chr "+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs"

summary(spatial_grid_df@data)
## band1 Min: -2000 Mean: 1249 Max: 9675 NAs:374180, plausible values for EVI
## See https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mod13q1_v006

evi_raster <- raster(spatial_grid_df)
identical(crs(evi_raster), crs(spatial_grid_df))  # True
plot(evi_raster)  # Extent [-7783654, -6671703] for x, [4447802, 5559753] for y

## Sanity check: extract EVI values for points in Lake Michigan
n_points <- 1000
lat <- seq(42.5, 44, length.out=n_points)
lon <- -runif(n_points, min=86.6, max=87.4)
wgs84 <- "+init=epsg:4326 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"
points_in_lake_michigan <- SpatialPoints(coords=cbind(lon, lat), proj4string=CRS(wgs84))
evi_in_lake_michigan <- extract(evi_raster, spTransform(points_in_lake_michigan, crs(evi_raster)))
summary(evi_in_lake_michigan)  # 21 NAs, otherwise median around -200, plausible for water

## Points in Wisconsin
lat <- seq(44, 45, length.out=n_points)
lon <- -runif(n_points, min=89, max=90)
points_in_wisconsin <- SpatialPoints(coords=cbind(lon, lat), proj4string=CRS(wgs84))
evi_in_wisconsin <- extract(evi_raster, points_in_wisconsin)
summary(evi_in_wisconsin)  # Zero NAs, median around 1750

plot(evi_raster)
points(spTransform(points_in_lake_michigan, crs(evi_raster)), col="red")
points(spTransform(points_in_wisconsin, crs(evi_raster)), col="blue", pch=2)
## Plot above looks correct (but gets messed up if I resize the window)

usa <- getData("GADM",country="USA",level=1)
plot(subset(usa, NAME_1 %in% c("Wisconsin", "Illinois", "Indiana", "Michigan")), border="black")
points(points_in_lake_michigan, col="red")
points(points_in_wisconsin, col="blue", pch=2)  # Looks correct

evi_raster_with_points
state_borders_with_points

Best Answer

I would just read directly from the source HDFs with raster, you'll need to make sure you have the right HDF libraries installed when GDAL is installed / built. What Linux are you on? On Ubuntu with apt-get I use (something like) this install script:

https://github.com/mdsumner/nectar/blob/master/r-spatial.sh

That has fuller notes about RStudio and other libraries, so you'll have to interpret it for your needs.

Note that it really matters whether this is HDF4 or HDF5 - I'm assuming HDF4 but HDF5 is similar - you can check what's available with gdalinfo --formats at the command line.

It's also not so hard to install from source, but then you'll need library paths set and so on. The best guidelines I've seen are here: http://scigeo.org/articles/howto-install-latest-geospatial-software-on-linux.html

Often when you read an HDF4 via GDAL it doesn't come in fully georeferenced, but that's easy to fix if your grids are regular and predicbable. If you are looking at non-regular grids raster is still really powerful, but you have to treat the georeferencing as "index space" and deal with the x/y coordinates as values in separate "bands" or subdatasets.

Related Question