[GIS] Successfully extracting values from a GRIB2 file…is it possible

Tags: grib-2, java, netcdf, python

Let me say first that I'm just starting to work with GRIB2 files; I've read the docs and more or less know how they are structured. My goal is to be able to parse some GRIB2 files from NOAA, more precisely those from the WaveWatch III model. So please forgive my ignorance if some of what I say sounds too naive 😀

I'm not attached to any particular technology, although I feel most comfortable with Java and Python. For reference, you can find the GRIB2 file I'm using as a sample at this link.

I'm also using ToolsUI from Unidata for a more "user-friendly" inspection of the GRIB2 file. As you can see, there are several variables in the provided GRIB2 file, but I'm only interested in two of them:

Wind_direction_from_which_blowing_surface
Direction_of_swell_waves_ordered_sequence_of_data

So far I've tried the code below, which is also based on this question, and I'm able to open the GRIB2 file:

// GRIB-2 decoder classes from the ucar.grib library
import java.io.IOException;
import java.util.List;

import ucar.grib.grib2.Grib2Data;
import ucar.grib.grib2.Grib2Input;
import ucar.grib.grib2.Grib2Record;

String pathfile = "glw.grl.WDIR.grb2";

// Open the GRIB2 file (read-only access is enough, we never write to it)
ucar.unidata.io.RandomAccessFile gribfile = null;
try {
    gribfile = new ucar.unidata.io.RandomAccessFile(pathfile, "r");
} catch (IOException e) {
    e.printStackTrace();
}

// Scan the file and iterate over its records
Grib2Input grib2Input = new Grib2Input(gribfile);
Grib2Data gd = new Grib2Data(gribfile);
try {
    grib2Input.scan(false, false);
    List<Grib2Record> records = grib2Input.getRecords();

    for (int i = 0; i < records.size(); i++) {
        Grib2Record record = records.get(i);

        // First, print the parameter category/number that identifies the variable
        System.out.println("record " + i + " -> Param. Categ.: "
                + record.getPDS().getPdsVars().getParameterCategory());
        System.out.println("record " + i + " -> Param. Number: "
                + record.getPDS().getPdsVars().getParameterNumber());

        // Then decode the data values for this record
        float[] data = null;
        try {
            data = gd.getData(record.getGdsOffset(), record.getPdsOffset(),
                    record.getId().getRefTime());
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Finally, print a small sample of the grid values
        if (data != null) {
            for (int j = 0; j < data.length && j < 50000; j += 5000) {
                System.out.println("data[" + j + "] -> " + data[j]);
            }
        }
        System.out.println("--");
    }
} catch (IOException e) {
    e.printStackTrace();
}

Now the output, for the first 2 records, is this:

record 0-> Param. Categ.: 2
record 0-> Param. Number: 0
data[0] -> NaN
data[5000] -> 64.34
data[10000] -> NaN
data[15000] -> NaN
data[20000] -> 38.13
data[25000] -> NaN
data[30000] -> NaN
data[35000] -> NaN
data[40000] -> 66.55
data[45000] -> 107.06
--
record 1-> Param. Categ.: 0
record 1-> Param. Number: 7
data[0] -> NaN
data[5000] -> NaN
data[10000] -> NaN
data[15000] -> NaN
data[20000] -> NaN
data[25000] -> NaN
data[30000] -> NaN
data[35000] -> NaN
data[40000] -> NaN
data[45000] -> 328.5

Well, here are my questions:

1.- Is there a better way to do this with the NetCDF-Java library? It looks to me that for bigger GRIB files this method is going to take ages. Besides, I would have to remove all the NaN values from the data array for each record.

2.- Should I go for more GRIB2-specific libraries like pygrib or pynio? If so… are there any working examples of parsing GRIB2 files with those? (See the sketch after this list.)

3.- I also tried the wgrib2 command line tool, e.g. generating a CSV file with it and then parsing the CSV. The thing is that the generated CSV file is not very easy to interpret either.
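
Since question 2 asks for examples: here is a minimal, untested sketch of what reading one of those fields with pygrib could look like. The file name is the sample from above, while the parameter name passed to select() is only an assumption; list the messages first and use whatever name the file actually reports.

import pygrib

grbs = pygrib.open("glw.grl.WDIR.grb2")

# List every GRIB message so you can see the exact parameter names in this file
for grb in grbs:
    print(grb)

# Select messages by parameter name ("Wind direction" is an assumed name;
# replace it with the name printed by the listing above)
grbs.rewind()
for grb in grbs.select(name="Wind direction"):
    values = grb.values          # 2-D masked array; missing/land points are masked out
    lats, lons = grb.latlons()   # matching 2-D latitude and longitude arrays
    print(values.shape, float(values.min()), float(values.max()))

grbs.close()

Since the values come back as a masked array, the valid grid points are already separated from the missing ones, which also takes care of the NaN filtering mentioned in question 1.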

Thanks!!!
Alejandro

EDIT: You can find an answer to this (using Python, though) here. I must also say that I was able to do this too using Netcdf4Java, although with ECMWF's GRIB-API it is way simpler.
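
For completeness: GRIB-API has since been superseded by ECMWF's ecCodes, whose Python bindings follow the same pattern. A rough, untested sketch, assuming the eccodes package is installed and reusing the sample file name from the question:

from eccodes import (codes_get, codes_get_values,
                     codes_grib_new_from_file, codes_release)

with open("glw.grl.WDIR.grb2", "rb") as f:
    while True:
        gid = codes_grib_new_from_file(f)   # handle to the next GRIB message, None at EOF
        if gid is None:
            break
        name = codes_get(gid, "shortName")  # parameter short name of this message
        values = codes_get_values(gid)      # decoded values as a flat array
        print(name, len(values))
        codes_release(gid)                  # free the message handle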

Best Answer

I've worked with GRIB files a fair bit, and I can tell you that large GRIB files are inefficient to parse for single points.

The reason they're inefficient is that they are naive binary files: there is no global header or index (unlike NetCDF), so to find out what layers or timesteps are in a file, software needs to read through the whole file to find them all.

The command-line program cdo is widely used in the science community, and it offers quick inspection commands such as cdo sinfov <filename>, along with other summary information.
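
If you end up scripting this in Python anyway, one simple way to grab that summary programmatically is to shell out to cdo (a sketch that assumes cdo is installed and on the PATH, and uses the sample file name from the question):

import subprocess

# Run "cdo sinfov <filename>" and capture the textual summary it prints
result = subprocess.run(["cdo", "sinfov", "glw.grl.WDIR.grb2"],
                        capture_output=True, text=True, check=True)
print(result.stdout)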

For actual programmatic file handling and data extraction, I find the NetCDF4 library for Python to be much better. It's very easy to use, well documented, and fast (when reading NetCDF files).
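
Since netCDF4 reads NetCDF rather than GRIB, that implies converting the GRIB2 file first (for example with cdo -f nc copy glw.grl.WDIR.grb2 glw.grl.WDIR.nc). After that, a minimal sketch of pulling out one variable; the output file and variable names here are just placeholders:

from netCDF4 import Dataset

ds = Dataset("glw.grl.WDIR.nc")            # the converted file (placeholder name)
print(ds.variables.keys())                 # check what the variables are called after conversion
wdir = ds.variables["wdir"][:]             # "wdir" is an assumed name; pick one from the listing
print(wdir.shape, wdir.min(), wdir.max())  # values come back as a NumPy array, masked where fill values occur
ds.close()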