MATLAB: Daily average precip for many years

mean daily precip values

I have a large matrix of two columns containing dates (eg 1/1/1991) and precip values. I want to create the average daily precip for all years from 1991-2016 such that the day of the year is in column 1 and the ave precip value is in column 2. I'm getting bogged down I believe because of leap years (not sure how to eliminate that date, which is fine). Any help would be greatly appreciated.

Best Answer

Just taking a stab presuming the file is something recognizable--

precip=table2timetable(readtable('yourfile.ext'));  % read the file; convert to timetable
daily=retime(precip,'daily','mean');

Done!

"Here is the precip file..."

Oh...only daily data and already in table--OK, then create the grouping variable

>> precip.DOM=day(precip.DATE);           % day of month each entry
>>daily=varfun(@mean,precip,'groupingvariables','DOM','inputvariables','Precipitation');
...

ADDENDUM OK, answer the question asked... :)

>> precip.DOY=day(precip.DATE,'dayofyear');
>> daily=varfun(@mean,precip,'groupingvariables','DOY','inputvariables','Precipitation');
>> whos daily
Name         Size            Bytes  Class    Attributes
daily      366x3             10149  table              
>> daily(1:10,:)
ans =
10×3 table
  DOY    GroupCount    mean_Precipitation
  ___    __________    __________________
   1     26            3.27692307692308  
   2     26                         3.8  
   3     26                         2.5  
   4     26            2.84230769230769  
   5     26            3.94615384615385  
   6     26            5.21923076923077  
   7     26            2.13461538461539  
   8     26            2.53461538461538  
   9     26                        3.65  
  10     26            3.33461538461538  
>>

should work; I believe the 'dayofyear' option came about R2104 or so...there's just too much stuff to try to recall introduced too rapidly.

Related Solutions

MATLAB: Removing a Date (Leap Year) from being Averaged

Not 100% sure what you are trying to end up with, but my guess is the average number of each type of complaint for each day of the year: 365 of them. You are on the right track using 'dayofyear', but there are easier ways to get to that point. First ,read the data into a timetable.

>> tt1 = readtimetable('SSArcGIS.xls');
>> tt1.ComplaintLocation = categorical(tt1.ComplaintLocation);
>> head(tt1)
ans =
  8×1 timetable
       Date        ComplaintLocation
    ___________    _________________
    02-Jan-2010           S13       
    02-Jan-2010           S11       
    02-Jan-2010           S5        
    04-Jan-2010           S5        
    05-Jan-2010           S13       
    06-Jan-2010           S12       
    06-Jan-2010           S5        
    06-Jan-2010           S3

You have one row per complaint, across all types in one variable. You want one variable for each type. unstack is good at doing that. In this case, you want counts, so you'll need to add a dummy variable to "count up".

>> tt1.Dummy = ones(height(tt1),1);
>> tt2 = unstack(tt1,'Dummy','ComplaintLocation');
>> head(tt2)
ans =
  8×14 timetable
       Date        S1     S10    S11    S12    S13    S14    S2     S3     S4     S5     S6     S7     S8     S9 
    ___________    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___
    02-Jan-2010    NaN    NaN      1    NaN      1    NaN    NaN    NaN    NaN      1    NaN    NaN    NaN    NaN
    04-Jan-2010    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN      1    NaN    NaN    NaN    NaN
    05-Jan-2010    NaN    NaN    NaN    NaN      1    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
    06-Jan-2010    NaN    NaN    NaN      1    NaN    NaN    NaN      1    NaN      1      1    NaN    NaN    NaN
    07-Jan-2010      1    NaN      1      1    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
    08-Jan-2010    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN      1    NaN
    09-Jan-2010    NaN    NaN    NaN      1    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
    13-Jan-2010    NaN    NaN    NaN    NaN    NaN      1    NaN    NaN    NaN    NaN    NaN    NaN      1    NaN

Unstack uses @sum by default to "aggregate", in other words, each date might have three S1 complaints, so unstack would compute sum([1;1;1]), and fill the corresponding element of the output with 3. For empty cases, sum returns NaN. It would be easy to use the AggregationFunction parameter to fill those with 0, but it's even easier to replace the NaNs afterwards.

>> tt2 = fillmissing(tt2,'Constant',0);
>> head(tt2)
ans =
  8×14 timetable
       Date        S1    S10    S11    S12    S13    S14    S2    S3    S4    S5    S6    S7    S8    S9
    ___________    __    ___    ___    ___    ___    ___    __    __    __    __    __    __    __    __
    02-Jan-2010    0      0      1      0      1      0     0     0     0     1     0     0     0     0 
    04-Jan-2010    0      0      0      0      0      0     0     0     0     1     0     0     0     0 
    05-Jan-2010    0      0      0      0      1      0     0     0     0     0     0     0     0     0 
    06-Jan-2010    0      0      0      1      0      0     0     1     0     1     1     0     0     0 
    07-Jan-2010    1      0      1      1      0      0     0     0     0     0     0     0     0     0 
    08-Jan-2010    0      0      0      0      0      0     0     0     0     0     0     0     1     0 
    09-Jan-2010    0      0      0      1      0      0     0     0     0     0     0     0     0     0 
    13-Jan-2010    0      0      0      0      0      1     0     0     0     0     0     0     1     0

The data skips quite a few days:

>> caldiff(tt1.Date([1 end]),'days')
ans = 
  calendarDuration
   3651d
>> height(tt2)
ans =
        3122

One wonders if maybe the complaint center is closed on weekends?

>> unique(day(tt2.Date,'dow'))
ans =
     1
     2
     3
     4
     5
     6
     7

Nope, apparently not, so all those missing days must have had no complaints. To get the true mean per day, you'll need to add them in.

>> t = datetime(2010,1,1):caldays(1):datetime(2019,12,31);
>> tt3 = retime(tt2,t,'FillWithConstant','Constant',0);
>> head(tt3)
ans =
  8×14 timetable
       Date        S1    S10    S11    S12    S13    S14    S2    S3    S4    S5    S6    S7    S8    S9
    ___________    __    ___    ___    ___    ___    ___    __    __    __    __    __    __    __    __
    01-Jan-2010    0      0      0      0      0      0     0     0     0     0     0     0     0     0 
    02-Jan-2010    0      0      1      0      1      0     0     0     0     1     0     0     0     0 
    03-Jan-2010    0      0      0      0      0      0     0     0     0     0     0     0     0     0 
    04-Jan-2010    0      0      0      0      0      0     0     0     0     1     0     0     0     0 
    05-Jan-2010    0      0      0      0      1      0     0     0     0     0     0     0     0     0 
    06-Jan-2010    0      0      0      1      0      0     0     1     0     1     1     0     0     0 
    07-Jan-2010    1      0      1      1      0      0     0     0     0     0     0     0     0     0 
    08-Jan-2010    0      0      0      0      0      0     0     0     0     0     0     0     1     0 
>> height(tt3)
ans =
        3652

Now we're in business: add day of year as a grouping variable, and compute grouped means. Once you have day of year, the actual dates are irrelevant, so convert to a table.

>> tt3.DoY = day(tt3.Date,'dayofyear');
>> tt4 = timetable2table(tt3,'ConvertRowTimes',false);
>> tt4 = varfun(@mean,tt4,'GroupingVariable','DoY');
>> head(tt4)
ans =
  8×16 table
    DoY    GroupCount    mean_S1    mean_S10    mean_S11    mean_S12    mean_S13    mean_S14    mean_S2    mean_S3    mean_S4    mean_S5    mean_S6    mean_S7    mean_S8    mean_S9
    ___    __________    _______    ________    ________    ________    ________    ________    _______    _______    _______    _______    _______    _______    _______    _______
     1         10          0.1        0.1           0         0.4         0.1           0         0.1        0.1        0.1        0.1          0        0.1        0.1          0  
     2         10            0        0.2         0.4         0.5         0.6         0.2           0        0.4        0.2        0.5        0.1        0.1        0.1          0  
     3         10          0.6        0.7         0.1         0.7         0.4         0.2         0.3        0.4        0.2          1        0.1          0          0          0  
     4         10          0.1        0.2         0.2         0.6         0.4         0.2         0.2        0.3        0.2        0.9          0        0.4        0.1        0.1  
     5         10          0.1        0.2         0.1         0.3         0.3         0.3           0        0.3        0.2        0.3        0.4        0.1        0.4        0.1  
     6         10            0        0.1         0.4         0.9         0.5         0.2         0.2        0.6        0.1        0.6        0.6        0.1        0.1        0.3  
     7         10          0.1        0.1         0.4         0.8         0.3         0.3         0.1          0          0          0          0          0        0.2          0  
     8         10          0.1        0.2         0.2         0.6         0.3         0.1         0.3        0.3        0.1        0.6        0.2        0.1        0.4          0

You could also do that with groupsummary. Leap days? Just remove them.

>> tt3(tt3.DoY==366,:)
ans =
  2×15 timetable
       Date        S1    S10    S11    S12    S13    S14    S2    S3    S4    S5    S6    S7    S8    S9    DoY
    ___________    __    ___    ___    ___    ___    ___    __    __    __    __    __    __    __    __    ___
    31-Dec-2012    0      0      0      0      0      0     0     0     0     4     0     0     1     0     366
    31-Dec-2016    0      0      0      0      0      0     0     0     0     0     0     0     0     0     366
>> tt5 = tt4(tt4.DoY~=366,:);

There were a bunch of steps there. Several of them could be combined, but simplicity.

MATLAB: Is it possible to write a loop for doing this

Behzad - look for what is common. Your code has twelve months, so you should be able to replace that with a loop. Perhaps something like

nCELL = numel(MERRA2_selected);
newCELL = cell(2,nCELL );
for i=1:nCELL
    date_month = month(MERRA2_selected{i}.date);
    for k = 1:12
        month_mask = date_month == k
        month_precip = MERRA2_selected{i}.precip(month_mask);
        newCELL{k,i} = month_precip;
    end
end

Best Answer

Related Solutions

MATLAB: Removing a Date (Leap Year) from being Averaged

MATLAB: Is it possible to write a loop for doing this

Related Question