[GIS] Open HDF4 Files using GDAL on Windows

gdal, hdf, modis, r, windows

I've run into what seems to be a common problem with HDF files, but after spending a couple of days going around in circles with R, GDAL, etc., I'm turning here for (hopefully) some insight.

My goal is to make a GeoTIFF raster that is a mosaic of many different HDF input files. I started by loosely following the workflow described in this tutorial, which is:

  1. Convert HDF files to GeoTIFFs and reproject them.
  2. Mosaic the reprojected GeoTIFFs.

However, I'm having significant difficulty with step 1: I can't figure out how to open or work with the HDF files. The raw input files are Leaf Area Index estimates from the GLASS LAI dataset, derived from MODIS satellite data; they are described here and available for download here.

Following the tutorial linked above, I installed the gdalUtils package for R and tried to get some information about the files, but this returned a somewhat lengthy error:

require(gdalUtils)

gdalinfo("C:/.../001/GLASS01A01.V03.A2010001.h17v03.2012253.hdf")

[1] "ERROR 4: `C:/.../001/GLASS01A01.V03.A2010001.h17v03.2012253.hdf' not recognized as a supported file format."
[2] ""
[3] "gdalinfo failed - unable to open 'C:/.../001/GLASS01A01.V03.A2010001.h17v03.2012253.hdf'."
attr(,"status")
[1] 1
Warning message:
running command '"C:\Program Files\QGIS 2.16\bin\gdalinfo.exe" "C:/.../001/GLASS01A01.V03.A2010001.h17v03.2012253.hdf"' had status 1 

After some research, I began to suspect that the "not recognized as a supported file format" error occurred because the file is HDF4 rather than HDF5. Based on this previous question, I tried to select a GDAL installation that supports HDF4 files, but this also returned an error:

gdal_chooseInstallation(hasDrivers=c("HDF4","HDF5"))

Error in gdal_chooseInstallation(hasDrivers = c("HDF4", "HDF5")) : 
  No installations match.
In addition: Warning message:
In grep(paste(hasDrivers, "-", sep = ""), installation_drivers$format_code) :
  argument 'pattern' has length > 1 and only the first element will be used

To remedy this, I installed a newer version of GDAL (gdal-201-1800-x64-ecw-33.msi) following the tutorial here. However, R still produced the same errors as above.

Working directly from the Windows command line, I can verify that GDAL is installed and knows about the HDF4 format:

C:\Users\Sam>gdalinfo --format hdf4
Format Details:
  Short Name: HDF4
  Long Name: Hierarchical Data Format Release 4
  Supports: Raster
  Extension: hdf
  Help Topic: frmt_hdf4.html
  Supports: Subdatasets
  Supports: Open() - Open existing dataset.

However, when I run gdalinfo on the file from the command line, I get the same error:

C:\Users\Sam>gdalinfo C:\...\001\GLASS01A01.V03.A2010001.h17v03.2012253.hdf
ERROR 4: `C:\...\001\GLASS01A01.V03.A2010001.h17v03.2012253.hdf' not recognized as a supported file format.

gdalinfo failed - unable to open 'C:\...\001\GLASS01A01.V03.A2010001.h17v03.2012253.hdf'.

At this point, I'm not sure whether the file is actually HDF4 and I'm just doing something wrong, whether the file itself is faulty, whether my GDAL installation has a problem, or whether there is some other option I'm not aware of.
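One way to answer the first part of that question is to look at the file's signature bytes. The R sketch below (the helper name `hdf_signature` is hypothetical, not part of gdalUtils) reads the first bytes of a file and compares them with the HDF4 signature (0x0E 0x03 0x13 0x01) and the 8-byte HDF5 signature. It assumes the signature sits at offset 0; an HDF5 file with a user block stores its signature further in and would be reported as "unknown" here.

```r
# Minimal sketch: report which HDF signature, if any, a file starts with.
# Assumption: only offset 0 is checked; HDF5 files with a user block keep
# their signature at 512, 1024, ... bytes and would report "unknown" here.
hdf_signature <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  head_bytes <- readBin(con, what = "raw", n = 8)

  hdf4_magic <- as.raw(c(0x0e, 0x03, 0x13, 0x01))
  hdf5_magic <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))

  if (length(head_bytes) >= 4 && identical(head_bytes[1:4], hdf4_magic)) {
    return("HDF4")
  }
  if (length(head_bytes) >= 8 && identical(head_bytes, hdf5_magic)) {
    return("HDF5")
  }
  "unknown"
}

# Usage (path elided as in the question):
# hdf_signature("C:/.../001/GLASS01A01.V03.A2010001.h17v03.2012253.hdf")
```

A match only tells you the file claims to be HDF4, not that it is intact. The HDF4 signature contains no line-feed byte, so a copy damaged later in the file would still pass this check.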

I am running 64-bit Windows 10 and have the 2013 (12.0.30501) version of C++ installed.

UPDATE

The problem seems to be related to how I downloaded the files. I used a loop in R with download.file to fetch several hundred of these images. However, if I manually go to the FTP site, right-click a file name, and choose Save as... to save it to my computer, gdalinfo works correctly on that copy.

Note that the two copies are also slightly different sizes: the one downloaded with R is 374 KB, and the one downloaded by hand is 373 KB. The permissions of the two files appear to be the same. Is there a way to use download.file that replicates the Save as... behaviour? I've also tried re-downloading all the .hdf files in R, with and without their accompanying .xml files, with the same result. I need to download several hundred files and would prefer not to do it by hand!
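The size difference is consistent with a text-mode transfer. On Windows, writing a file in text mode turns every line-feed byte (0x0A) into a carriage-return/line-feed pair (0x0D 0x0A), so a binary file grows by one byte for each 0x0A it contains. The R sketch below simulates that translation on raw bytes; the `to_text_mode` helper and the random input are hypothetical, for illustration only.

```r
# Sketch: simulate Windows text-mode output, where each LF byte (0x0a)
# is written as CR LF (0x0d 0x0a). Other bytes pass through unchanged.
to_text_mode <- function(bytes) {
  if (length(bytes) == 0) return(raw(0))
  pieces <- lapply(bytes, function(b) {
    if (b == as.raw(0x0a)) as.raw(c(0x0d, 0x0a)) else b
  })
  do.call(c, pieces)
}

# Random bytes stand in for a binary HDF file (illustrative data only).
set.seed(42)
original <- as.raw(sample(0:255, 4096, replace = TRUE))
translated <- to_text_mode(original)

# The translated copy is larger by exactly the number of LF bytes.
length(translated) - length(original) == sum(original == as.raw(0x0a))
```

For uniformly random bytes about 1 byte in 256 is 0x0A, so a 373 KB file would grow by roughly 1.5 KB. Real HDF data is not uniformly random, so treat this only as an order-of-magnitude check against the 373 KB vs 374 KB reported above.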

UPDATE 2

Selecting a different method argument in download.file fixed the problem. Previously I was using the default method, which on Windows turns out to be wininet. When I set method="curl" in my call to download.file, GDAL worked correctly on the newly downloaded files. I have added an answer below explaining this.

Best Answer

Setting the mode to binary in your download.file() call would also have worked. You should always specify download.file([...], mode = 'wb') when downloading binary files. Otherwise, on Windows, the file is written in text mode, which converts every line-feed byte into a carriage-return/line-feed pair; the result is no longer a valid HDF file even though the extension makes it look like one. That is what led to the error you received above. Using mode = 'wb' also helps people who don't have wget or curl installed on their machines.
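A minimal sketch of a download loop with binary mode. The `download_binary` helper is hypothetical, and the question does not give the FTP paths, so the `urls` vector in the usage comment is a placeholder for your own list of .hdf URLs.

```r
# Sketch: download each URL into dest_dir, forcing binary mode so that
# Windows does not translate LF bytes into CR LF.
download_binary <- function(urls, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest_dir, basename(urls))
  for (i in seq_along(urls)) {
    # download.file() returns 0 on success but may also signal an error,
    # so both cases are mapped to a non-fatal warning and the loop continues.
    ok <- tryCatch(
      download.file(urls[i], destfile = dest[i], mode = "wb", quiet = TRUE) == 0,
      error = function(e) FALSE
    )
    if (!ok) warning("download failed: ", urls[i])
  }
  invisible(dest)
}

# Usage (placeholder URLs; substitute the real FTP paths):
# urls <- c("ftp://<server>/<path>/file1.hdf", "ftp://<server>/<path>/file2.hdf")
# files <- download_binary(urls, "C:/.../001")
```

Keeping each failure non-fatal means one bad URL does not abort a run of several hundred files; check the warnings afterwards and re-fetch only those.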

curl works in your solution because, with that method, the downloaded bytes are written exactly as received (download.file() does not use its mode argument for method = 'curl'), whereas the default method applies download.file()'s default text mode, which becomes a problem when downloading binary files.
