[GIS] gdal/python: extracting projection info from hdf file

gdalhdfpython

I'm trying to do something fairly simple. Take two subdatasets from an hdf file, read them as an array, do some math and write the new array back to a geotiff, using the same projection as the input files.

Because I'll be repeating this process for a list of .hdf files where the projection varies, I cannot hardcode the projection string for my outputs. While the projection varies from hdf file to file, it is the same for all subdatasets in a particular hdf file. So I only need to get the proj from one of the subdatasets. I thought it would be fairly easy to store the projection info into a variable, but the normal routines I use aren't working on the hdf file.

Here are few different things I've tried:

ds = open(hdfFile)
sds_b3 = ds.GetSubDatasets()[4][0]
sds_b4 = ds.GetSubDatasets()[5][0]

# this gives me an empty string:
proj = ds.GetProjection()

# both of these give me "AttributeError: 'str' object has no attribute 'GetProjection":
proj = ds.GetSubDatasets()[4][0].GetProjection()
proj = ds.GetSubDatasets()[4][1].GetProjection()

# this last one works in that it prints the correct projection info, in addition 
# to a bunch of other metadata, but I don't know how to extract just the projection
# and store it into a variable:
os.system('gdalinfo %s' % sds_b3)

How can I store the projection information in a variable, like 'proj = dataset.GetProjection()' normally would, so that I can use it when writing the new array to a geotiff? ie:

output = gdal.GetDriverByName("GTiff").Create(outputname, nlines, nrows)
output.SetProjection(proj)

I either need to know the correct way to use GetProjection() when using hdf's or how to isolate the projection part of gdalinfo as a variable.

Best Answer

This is fairly straightforward if you think of the HDF dataset as a container, where each subdataset is a raster image with its own projection.

Your error is in not opening the subdataset, as GetSubDatasets only returns the strings you need to access them.

# open the HDF container
hdf_ds = gdal.Open(hdfFile)

# this is just a string of the name of the subdataset
b3_string = hdf_ds.GetSubDatasets()[4][0]

# open the subdataset
sds_b3 = gdal.Open(hdf_ds.GetSubDatasets()[4][0])

# get the projection
proj = sds_b3.GetProjection()