I have extracted ERA5 temperature data for specific points that contain the day of the year when the temperature exceeds 10 degrees celsius. And now I want to see how the temperature develops hereafter.
Data:
head(df)
field_1 doy ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
69785317 93 1 0.1712229 1.335247 2.225898 0.588482 5.12949 0 -0.6848916 -2.910789 6.29925 0 5.5136687 3 3.1712229 6.19856 4 7.56689 5.34123
69785318 93 2 0.1712229 1.335247 2.225898 0.588482 5.12949 0 -0.6848916 -2.910789 6.29925 0 5.5136687 3 3.1712229 6.19856 4 7.56689 5.34123
The column names are field ID
, day of the year doy
, ID
and the rest are actually the day of the year doy
too, but for ERA5 data (up until doy 365).
I would like to set every ERA5 column that is before the doy
for every single row to NA
, and keep the columns with temperatures measured after the (row) day of the year.
So something like:
field_1 doy ID 1 2 3 4 5 6 ... 94 95 96
69785317 93 1 NA NA NA NA NA NA ... 10.6848916 12.910789 16.29925
69785318 93 2 NA NA NA NA NA NA ... 10.6848916 12.910789 16.29925
I tried to subset the data as followed:
df = subset(df, colnames(df)[6:9] > df$doy)
I also tried:
df[outer(rownames(df$doy), colnames(df), "<")] <- NA
Both did not work, the latter since it is not a matrix I think. How can I change this to end up like my example given as the second code?
Edit:
I tried to perform Elio's answer, but with the bind_rows
command I do not get the doy
columns prior to 93. field_1
ids of 69792725 and 69787246 are for example from day 80 of the year.
field_1 93 94 95 96 97 98 99 100
1 69785317 1.883452 0 0 0.6848916 21.06042 0 0 0
2 69785318 1.883452 0 0 0.6848916 21.06042 0 0 0
3 69787245 1.883452 0 0 0.6848916 21.06042 0 0 0
4 69787246 1.883452 0 0 0.6848916 21.06042 0 0 0
5 69790938 NA NA NA NA NA 0 0 0
6 69790939 NA NA NA NA NA NA 0 0
7 69792725 1.883452 0 0 0.6848916 21.06042 0 0 0
8 69792726 NA NA NA NA NA 0 0 0
9 69792727 NA NA NA NA NA NA 0 0
10 69792728 NA NA NA NA NA NA 0 0
11 69792729 NA NA NA NA NA NA NA NA
12 69794537 NA NA NA NA 21.06042 0 0 0
13 69794538 NA NA NA NA NA 0 0 0
14 69794539 NA NA NA NA NA NA NA NA
15 69794540 NA NA NA NA NA NA NA NA
16 69794541 NA NA NA NA NA NA NA NA
17 69796333 NA NA NA NA 21.06042 0 0 0
18 69796334 NA NA NA NA NA NA 0 0
19 69796335 NA NA NA NA NA NA 0 0
20 69796336 NA NA NA NA NA NA NA 0
Apparently, you would have to sort the doy
in an ascending order, so the lowest value is first.
Best Answer
The following code uses
lapply
to store slices of yourdata.frame
, which then are bound together withbind_rows
, which allows for different lengths; code should be easy adapt to your object. Although this question should be better posted at stackoverflow, since the geographic part of the proceeding happened before: