R – How to Read NetCDF4 Format Climate Data Observation in R

climatenetcdfqgisr

I downloaded yearly Temp/Precip observation by the globe from here: Free open source climate data archive, and I download those for my research case: Terrestrial Air Temperature: 1900-2014 Gridded Monthly Time Series (V 4.01) and Terrestrial Precipitation: 1900-2014 Gridded Monthly Time Series (V 4.01). I need to clip out all data observation record of all German weather stations. However, I intend to extract average annual temperature and annual precipitation for each of these station coordinates of Germany.

But, when I tried to load those data and read them in R, I ran into problems. Respective data archive's website can be found look here. I think data format could be NetCDF4 format based on the description from above web link 4. I installed raster and ncdf4 package to read the data in R, but such operations were failed.

Here is how data look like:

list.files("stella/data/air_temp_1980_2014/", recursive = TRUE)

 [1] "air_temp.1980" "air_temp.1981" "air_temp.1982" "air_temp.1983"
 [5] "air_temp.1984" "air_temp.1985" "air_temp.1986" "air_temp.1987"
 [9] "air_temp.1988" "air_temp.1989" "air_temp.1990" "air_temp.1991"
[13] "air_temp.1992" "air_temp.1993" "air_temp.1994" "air_temp.1995"
[17] "air_temp.1996" "air_temp.1997" "air_temp.1998" "air_temp.1999"
[21] "air_temp.2000" "air_temp.2001" "air_temp.2002" "air_temp.2003"
[25] "air_temp.2004" "air_temp.2005" "air_temp.2006" "air_temp.2007"
[29] "air_temp.2008" "air_temp.2009" "air_temp.2010" "air_temp.2011"
[33] "air_temp.2012" "air_temp.2013" "air_temp.2014"

Here is what I did in R:

library(ncdf4)
library(raster)
list.files("stella/data/air_temp_1980_2014/", recursive = TRUE)

nc_open("stella/data/air_temp_1980_2014/air_temp.1980")
raster("stella/data/air_temp_1980_2014/air_temp.1980")

Here is error that raised by Rstudio:

> nc_open("stella/data/air_temp_1980_2014/air_temp.1980")
Error in R_nc4_open: NetCDF: Unknown file format
Error in nc_open("stella/data/air_temp_1980_2014/air_temp.1980") : 
  Error in nc_open trying to open file stella/data/air_temp_1980_2014/air_temp.1980


> raster("stella/data/air_temp_1980_2014/air_temp.1980")
Error in .local(.Object, ...) : Couldn't determine X spacing

Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer",  : 
  Cannot create a RasterLayer object from this file.

FYI, here I put the data on the fly for the test on your machine, files can be found on the fly, Here and Here.

How can I read those data in R?

Plus, I only need to pull out all station coordinates of germany from yearly Temp observation for the globe?

How can I make this happen in R?

Any workaround to do this?

Any more thoughts?


How can I read those data correctly in R (as data.frame object in R) ?

How to deal with this data?

Any thoughts?

Best Answer

These data are not in a netcdf format but, rather a space delimited ASCII format. This data is a bit difficult to deal with because of the lack of headers and any type of unique station identifier.

Returns a vector of file names on disk and pulls associate years from file name.

f <- list.files(getwd())
y <- as.numeric(unlist(lapply(strsplit(f, "[.]"), function(x) {x[2]})))

Reads files, adds associated year and combines into single data.frame.

stations <- data.frame()
  for(i in 1:length(f)) {
    d <- read.table(f[1], sep = "" , header = FALSE, na.strings = "")
      d[,"year"] <- y[i]
        names(d) <- c("long","lat", format(ISOdate(2000, 1:12, 1), "%b"), "year")
    stations <- rbind(stations, d)  
}
stations$ID <- paste(stations[,1], stations[,2], sep="_")  
  str(stations)     

Since there is no unique identifier for a given station, it is difficult to aggregate the duplicate coordinates to each given unique station location. I had to get creative and use a unique concatenated vector of the coordinate pairs. Once you have the data subset to the desired region, you can use merge() or which(x %in% y) to create a subset index, with the ID column, containing the unique coordinate pairs to relate back to the data. However, since the data is by year, in a wide format, you cannot store it as a spatial class object without duplicating the feature geometry. Please give some though on how you are going to analyze this data before just jumping in.

library(sp) 
xy <- unique( paste(stations[,1], stations[,2], sep="_")  )
stations.xy <- data.frame(long = as.numeric(unlist(lapply(strsplit(xy, "_"), function(x) {x[1]}))), 
                          lat =  as.numeric(unlist(lapply(strsplit(xy, "_"), function(x) {x[2]}))))
    stations.xy$ID <- xy                      
    coordinates(stations.xy) <- ~long+lat

str(stations.xy@data)
  plot(stations.xy)

I am skipping any overlay/subset analysis because it is covered in your previous question. However, this can easily be accomplished using raster::intersect, spatialEco:point.in.poly, sp::over, rgeos::gIntersects, etc...

Related Question