I have a large matrix of two columns containing dates (eg 1/1/1991) and precip values. I want to create the average daily precip for all years from 1991-2016 such that the day of the year is in column 1 and the ave precip value is in column 2. I'm getting bogged down I believe because of leap years (not sure how to eliminate that date, which is fine). Any help would be greatly appreciated.
MATLAB: Daily average precip for many years
mean daily precip values
Related Solutions
Not 100% sure what you are trying to end up with, but my guess is the average number of each type of complaint for each day of the year: 365 of them. You are on the right track using 'dayofyear', but there are easier ways to get to that point. First ,read the data into a timetable.
>> tt1 = readtimetable('SSArcGIS.xls');>> tt1.ComplaintLocation = categorical(tt1.ComplaintLocation);>> head(tt1)ans = 8×1 timetable Date ComplaintLocation ___________ _________________ 02-Jan-2010 S13 02-Jan-2010 S11 02-Jan-2010 S5 04-Jan-2010 S5 05-Jan-2010 S13 06-Jan-2010 S12 06-Jan-2010 S5 06-Jan-2010 S3
You have one row per complaint, across all types in one variable. You want one variable for each type. unstack is good at doing that. In this case, you want counts, so you'll need to add a dummy variable to "count up".
>> tt1.Dummy = ones(height(tt1),1);>> tt2 = unstack(tt1,'Dummy','ComplaintLocation');>> head(tt2)ans = 8×14 timetable Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9 ___________ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ 02-Jan-2010 NaN NaN 1 NaN 1 NaN NaN NaN NaN 1 NaN NaN NaN NaN 04-Jan-2010 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN 05-Jan-2010 NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN 06-Jan-2010 NaN NaN NaN 1 NaN NaN NaN 1 NaN 1 1 NaN NaN NaN 07-Jan-2010 1 NaN 1 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 08-Jan-2010 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN 09-Jan-2010 NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 13-Jan-2010 NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN 1 NaN
Unstack uses @sum by default to "aggregate", in other words, each date might have three S1 complaints, so unstack would compute sum([1;1;1]), and fill the corresponding element of the output with 3. For empty cases, sum returns NaN. It would be easy to use the AggregationFunction parameter to fill those with 0, but it's even easier to replace the NaNs afterwards.
>> tt2 = fillmissing(tt2,'Constant',0);>> head(tt2)ans = 8×14 timetable Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9 ___________ __ ___ ___ ___ ___ ___ __ __ __ __ __ __ __ __ 02-Jan-2010 0 0 1 0 1 0 0 0 0 1 0 0 0 0 04-Jan-2010 0 0 0 0 0 0 0 0 0 1 0 0 0 0 05-Jan-2010 0 0 0 0 1 0 0 0 0 0 0 0 0 0 06-Jan-2010 0 0 0 1 0 0 0 1 0 1 1 0 0 0 07-Jan-2010 1 0 1 1 0 0 0 0 0 0 0 0 0 0 08-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 1 0 09-Jan-2010 0 0 0 1 0 0 0 0 0 0 0 0 0 0 13-Jan-2010 0 0 0 0 0 1 0 0 0 0 0 0 1 0
The data skips quite a few days:
>> caldiff(tt1.Date([1 end]),'days')ans = calendarDuration 3651d>> height(tt2)ans = 3122
One wonders if maybe the complaint center is closed on weekends?
>> unique(day(tt2.Date,'dow'))ans = 1 2 3 4 5 6 7
Nope, apparently not, so all those missing days must have had no complaints. To get the true mean per day, you'll need to add them in.
>> t = datetime(2010,1,1):caldays(1):datetime(2019,12,31);>> tt3 = retime(tt2,t,'FillWithConstant','Constant',0);>> head(tt3)ans = 8×14 timetable Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9 ___________ __ ___ ___ ___ ___ ___ __ __ __ __ __ __ __ __ 01-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 02-Jan-2010 0 0 1 0 1 0 0 0 0 1 0 0 0 0 03-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 04-Jan-2010 0 0 0 0 0 0 0 0 0 1 0 0 0 0 05-Jan-2010 0 0 0 0 1 0 0 0 0 0 0 0 0 0 06-Jan-2010 0 0 0 1 0 0 0 1 0 1 1 0 0 0 07-Jan-2010 1 0 1 1 0 0 0 0 0 0 0 0 0 0 08-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 1 0 >> height(tt3)ans = 3652
Now we're in business: add day of year as a grouping variable, and compute grouped means. Once you have day of year, the actual dates are irrelevant, so convert to a table.
>> tt3.DoY = day(tt3.Date,'dayofyear');>> tt4 = timetable2table(tt3,'ConvertRowTimes',false);>> tt4 = varfun(@mean,tt4,'GroupingVariable','DoY');>> head(tt4)ans = 8×16 table DoY GroupCount mean_S1 mean_S10 mean_S11 mean_S12 mean_S13 mean_S14 mean_S2 mean_S3 mean_S4 mean_S5 mean_S6 mean_S7 mean_S8 mean_S9 ___ __________ _______ ________ ________ ________ ________ ________ _______ _______ _______ _______ _______ _______ _______ _______ 1 10 0.1 0.1 0 0.4 0.1 0 0.1 0.1 0.1 0.1 0 0.1 0.1 0 2 10 0 0.2 0.4 0.5 0.6 0.2 0 0.4 0.2 0.5 0.1 0.1 0.1 0 3 10 0.6 0.7 0.1 0.7 0.4 0.2 0.3 0.4 0.2 1 0.1 0 0 0 4 10 0.1 0.2 0.2 0.6 0.4 0.2 0.2 0.3 0.2 0.9 0 0.4 0.1 0.1 5 10 0.1 0.2 0.1 0.3 0.3 0.3 0 0.3 0.2 0.3 0.4 0.1 0.4 0.1 6 10 0 0.1 0.4 0.9 0.5 0.2 0.2 0.6 0.1 0.6 0.6 0.1 0.1 0.3 7 10 0.1 0.1 0.4 0.8 0.3 0.3 0.1 0 0 0 0 0 0.2 0 8 10 0.1 0.2 0.2 0.6 0.3 0.1 0.3 0.3 0.1 0.6 0.2 0.1 0.4 0
You could also do that with groupsummary. Leap days? Just remove them.
>> tt3(tt3.DoY==366,:)ans = 2×15 timetable Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9 DoY ___________ __ ___ ___ ___ ___ ___ __ __ __ __ __ __ __ __ ___ 31-Dec-2012 0 0 0 0 0 0 0 0 0 4 0 0 1 0 366 31-Dec-2016 0 0 0 0 0 0 0 0 0 0 0 0 0 0 366>> tt5 = tt4(tt4.DoY~=366,:);
There were a bunch of steps there. Several of them could be combined, but simplicity.
Behzad - look for what is common. Your code has twelve months, so you should be able to replace that with a loop. Perhaps something like
nCELL = numel(MERRA2_selected);newCELL = cell(2,nCELL );for i=1:nCELL date_month = month(MERRA2_selected{i}.date); for k = 1:12 month_mask = date_month == k month_precip = MERRA2_selected{i}.precip(month_mask); newCELL{k,i} = month_precip; endend
Best Answer