[GIS] HDF5 and georeferencing

coordinate system, gdal, georeferencing, hdf5, python

I'm doing quite heavy processing of Earth Observation data, and it appears that much of my workflow can be sped up by using HDF5 for local storage. I mostly work with large stacks of raster files, and HDF5 together with numexpr lets me process large amounts of data very efficiently. I would like to use HDF5 as my external format as well, but my data can change geographic projection and geolocation. Ideally, I'd like GDAL to be able to use that information (I'm not too worried about other tools), but I can't find how to define it for these files.

Best Answer

I am currently in the same position as you. HDF5 is a great format, but you need to define and describe its structure yourself. Python has a very handy module called h5py, which is quite straightforward to use. It depends on numpy, and the documentation shows some examples of how to store and query your arrays. HDF5 also supports heterogeneous datasets, so it is all about how you define your container and dimension structure. At the SciPy 2015 conference there was a very convincing talk about the future of HDF5 and Python.
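As a rough illustration of "defining it yourself", here is a minimal sketch that stores a raster stack with h5py and attaches the georeferencing as plain attributes. The dataset name `stack`, the attribute names `geotransform` and `crs`, and the two helper functions are my own assumptions, not an established convention, so GDAL will most likely not pick them up as georeferencing without extra work on your side.

```python
import os
import tempfile

import h5py
import numpy as np


def write_raster_stack(path, stack, geotransform, crs):
    """Store a (band, row, col) array plus ad hoc georeferencing attributes."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("stack", data=stack, chunks=True,
                                compression="gzip")
        # Hypothetical attribute names: six affine coefficients in the
        # order GDAL uses for its geotransform, and a CRS identifier string.
        dset.attrs["geotransform"] = np.asarray(geotransform, dtype="float64")
        dset.attrs["crs"] = crs


def read_raster_stack(path):
    """Read back the array and the attributes written above."""
    with h5py.File(path, "r") as f:
        dset = f["stack"]
        geotransform = tuple(float(v) for v in dset.attrs["geotransform"])
        return dset[...], geotransform, dset.attrs["crs"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.h5")
    data = np.zeros((3, 4, 5), dtype="float32")
    # Example values only: 30 m pixels, north-up, UTM zone 30N.
    write_raster_stack(path, data,
                       (500000.0, 30.0, 0.0, 4600000.0, 0.0, -30.0),
                       "EPSG:32630")
    print(read_raster_stack(path)[1:])
```

Keeping the georeferencing on the dataset itself means it travels with the array when you reproject, but any tool that should understand it has to know your attribute names.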

You might also consider netCDF4, which also uses HDF5 as its backbone and has somewhat more support from the Earth-observation community. IMHO defining dimensions (e.g. the x-axis as longitude, the y-axis as latitude, and so on) is actually easier with the netCDF syntax.
