[GIS] Reading NetCDF file of station data

Tags: netcdf, python

Does anyone have experience with NetCDF data on an irregular grid? My data file is taken from the NCEP ADP Global Upper Air and Surface Weather Observations (originally in PREPBUFR format).

I need to extract the data to an ASCII file like this:

STATION_ID   LAT  LON     OBS_VAL ...
  70585     45.5  60.5    1005.5    .....

the header information is :

[hachelaf@skynet1 test]$ ncdump -h gdas.44512.2010040100.nc
netcdf gdas.44512.2010040100 {
dimensions:
        mxstr = 16 ;
        hdr_arr_len = 3 ;
        obs_arr_len = 5 ;
        nobs = UNLIMITED ; // (98012 currently)
        nhdr = 30766 ;
variables:
        char obs_qty(nobs, mxstr) ;
                obs_qty:long_name = "quality flag" ;
        float obs_arr(nobs, obs_arr_len) ;
                obs_arr:long_name = "array of observation values" ;
                obs_arr:_fill_value = -9999.f ;
                obs_arr:columns = "hdr_id gc lvl hgt ob" ;
                obs_arr:hdr_id_long_name = "index of matching header data" ;
                obs_arr:gc_long_name = "grib code corresponding to the observation type" ;
                obs_arr:lvl_long_name = "pressure level (hPa) or accumulation interval (sec)" ;
                obs_arr:hgt_long_name = "height in meters above sea level (msl)" ;
                obs_arr:ob_long_name = "observation value" ;
        char hdr_typ(nhdr, mxstr) ;
                hdr_typ:long_name = "message type" ;
        char hdr_sid(nhdr, mxstr) ;
                hdr_sid:long_name = "station identification" ;
        char hdr_vld(nhdr, mxstr) ;
                hdr_vld:long_name = "valid time" ;
                hdr_vld:units = "YYYYMMDD_HHMMSS" ;
        float hdr_arr(nhdr, hdr_arr_len) ;
                hdr_arr:long_name = "array of observation station header values" ;
                hdr_arr:_fill_value = -9999.f ;
                hdr_arr:columns = "lat lon elv" ;
                hdr_arr:lat_long_name = "latitude" ;
                hdr_arr:lat_units = "degrees_north" ;
                hdr_arr:lon_long_name = "longitude" ;
                hdr_arr:lon_units = "degrees_east" ;
                hdr_arr:elv_long_name = "elevation" ;
                hdr_arr:elv_units = "meters above sea level (msl)" ;

// global attributes:
                :FileOrigins = "File /glade/p/rda/transfer/dsrqst/RABAH44512/gdas.44512.2010040100.nc generated 20130912_180331 UTC on host geyser01 by the MET pb2nc tool" ;
                :MET_version = "V4.1" ;
                :MET_tool = "pb2nc"

Best Answer

These kinds of files are a bit of a pain, because they use one set of variables to index another set of variables. But once you've figured out what is indexing what, you can do it pretty easily with Python. Here we use netCDF4 to read the NetCDF file and Pandas to write the CSV, but you could use other tools for the job.

import netCDF4
import pandas as pd

nc = netCDF4.Dataset('gdas.44512.2010040100.nc')

obs_arr = nc.variables['obs_arr'][:,:]
hdr_typ = netCDF4.chartostring(nc.variables['hdr_typ'][:,:])
hdr_sid = netCDF4.chartostring(nc.variables['hdr_sid'][:,:])
hdr_vld = netCDF4.chartostring(nc.variables['hdr_vld'][:,:])
obs_qty = netCDF4.chartostring(nc.variables['obs_qty'][:,:])
hdr_arr = nc.variables['hdr_arr'][:,:]

# the station index: plain int, not int8 -- there are 30766 headers,
# far more than an 8-bit integer can address
idx = obs_arr[:, 0].astype(int)

# make a dictionary with each column of information
# (hdr_arr columns are "lat lon elv", per the header metadata)
d = {}
d['station_id'] = hdr_sid[idx]
d['lat'] = hdr_arr[idx, 0]
d['lon'] = hdr_arr[idx, 1]
d['elev'] = hdr_arr[idx, 2]
d['time'] = hdr_vld[idx]
d['grib_code'] = obs_arr[:, 1]
d['press'] = obs_arr[:, 2]
d['height'] = obs_arr[:, 3]
d['obs_val'] = obs_arr[:, 4]
d['quality'] = obs_qty[:]

# convert dictionary to dataframe
df = pd.DataFrame(d)

# take a look at the first record
print(df.iloc[0])

which produces output like:

elev                       15
grib_code                  11
height               14.99732
lat                     41.66
lon                    -69.96
obs_val                284.75
press                    1008
quality                     2
station_id              74494
time          20100401_000000
Name: 0, dtype: object

So things are looking good. The last step is just writing the CSV file:

df.to_csv('gdas.csv')
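If you need the exact space-separated layout from the question rather than a comma-separated file, `to_csv` can do that too via its `sep`, `columns`, and `index` arguments. A small sketch on a made-up frame (the real file isn't at hand here, so the values are invented; the column names match the ones built above):

```python
import pandas as pd

# Stand-in for the real df -- these rows are invented for illustration
df = pd.DataFrame({
    'station_id': ['70585', '74494'],
    'lat': [45.5, 41.66],
    'lon': [60.5, -69.96],
    'obs_val': [1005.5, 284.75],
})

# Space-separated ASCII with just the requested columns, no row index
df.to_csv('gdas.txt', sep=' ',
          columns=['station_id', 'lat', 'lon', 'obs_val'],
          index=False)
```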

Pandas is kind of overkill if all you want to do is write out CSV, but if you wanted to explore the data a bit, it is really handy. Plus you should have Pandas anyway. ;-)
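For instance, since every row carries a GRIB code, you can pull out a single observation type with a boolean mask. A sketch on a synthetic frame (values invented; in the GRIB1 parameter tables, code 11 is temperature, which matches the record printed above):

```python
import pandas as pd

# Synthetic stand-in for the real df, invented for illustration
df = pd.DataFrame({
    'grib_code': [11, 52, 11],
    'press':     [1008.0, 850.0, 500.0],
    'obs_val':   [284.75, 0.8, 260.1],
})

# keep only one observation type, e.g. temperature (GRIB code 11)
temps = df[df['grib_code'] == 11]
print(len(temps))  # -> 2
```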

Update: one more thing. If you want to split the time column into year, month, day, and hour (for output to CSV), you could do it this way:

df['year'] = df['time'].str[0:4]
df['mon'] = df['time'].str[4:6]
df['day'] = df['time'].str[6:8]
df['hour'] = df['time'].str[9:11]
del df['time']
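Alternatively, since `hdr_vld` follows the `YYYYMMDD_HHMMSS` pattern declared in its `units` attribute, you can let Pandas parse it into real timestamps instead of slicing strings. A sketch on synthetic values:

```python
import pandas as pd

# Synthetic stand-in for the df['time'] column built earlier
df = pd.DataFrame({'time': ['20100401_000000', '20100401_060000']})

# parse the valid-time strings into datetimes, then pull the parts out
t = pd.to_datetime(df['time'], format='%Y%m%d_%H%M%S')
df['year'] = t.dt.year
df['mon'] = t.dt.month
df['day'] = t.dt.day
df['hour'] = t.dt.hour
```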