Solved – Building a time series that includes multiple observations for each date

Tags: r, time series

I'm trying to fit a time series model to quarterly sampled data (animal biomass) over a 10-year period with 3 reps per quarter, so 40 dates but 120 total observations.

I have read up to SARIMAs in Shumway and Stoffer's Time Series Analysis and Its Applications, as well as skimmed Woodward et al.'s Applied Time Series Analysis, and my understanding is that each model is based on a single observation at each point in the time series.

QUESTION: How can I include the variation in each observation in my model? I could build a series on the means, but I would lose the variation at each observation, and I think that is critical to my understanding of what is happening.

Best Answer

Depending on what exactly you mean by "3 reps per quarter", a panel data (Wikipedia) model may make sense. This would mean that you're taking three measurements every quarter, one from each of three distinct sources that stay the same over time. Your data would look something like:

obs quarter value
  A       1   2.2 
  A       2   2.3 
  A       3   2.4 
  B       1   1.8 
  B       2   1.7 
  B       3   1.9 
  C       1   3.3 
  C       2   3.4 
  C       3   3.5 

If this is what you're looking at, there are a number of models for working with panel data. Here's a decent presentation that covers some of the basic R that you would use to look at panel data. This document goes into a little more depth, albeit from an econometrics standpoint.
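If the panel framing fits your data, one common R option is the plm package; it isn't required by any of those links, so treat the sketch below as a pointer rather than the definitive approach (the names fixed.plm and trend are just mine). It uses the same toy data as the worked example further down:

> library(plm)   # assumes the plm package is installed
> Panel = data.frame(value=c(2.2,2.3,2.4,1.8,1.7,1.9,3.3,3.4,3.5), 
                     quarter=c(1,2,3,1,2,3,1,2,3), 
                     obs=c("A","A","A","B","B","B","C","C","C"))
> # keep a numeric copy of quarter to use as a regressor, since plm coerces
> # the time index to a factor
> Panel$trend = Panel$quarter
> # "within" = fixed effects estimator; obs is the individual index, quarter the time index
> fixed.plm <- plm(value ~ trend, data=Panel, index=c("obs","quarter"), model="within")
> summary(fixed.plm)

The trend coefficient from this within fit should agree with the quarter coefficient in the dummy-variable lm() fit below, since the two are equivalent parameterizations of the same fixed effects model.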

However, if your data doesn't quite fit with panel data methodologies, there are other tools available for "pooled data". A definition from this paper (pdf):

Pooling of data means statistical analysis using multiple data sources relating to multiple populations. It encompasses averaging, comparisons and common interpretations of the information. Different scenarios and issues also arise depending on whether the data sources and populations involved are same/similar or different.

As you can see from that definition, the techniques you use will depend on what exactly you expect to learn from your data.

If I were to suggest a place to start, assuming that your three draws each quarter come from the same sources over time, I would use a fixed effects estimator (also known as the within estimator) with a panel data model of your data.

For my example above, the code would look something like:

> Panel = data.frame(value=c(2.2,2.3,2.4,1.8,1.7,1.9,3.3,3.4,3.5), 
                     quarter=c(1,2,3,1,2,3,1,2,3), 
                     obs=c("A","A","A","B","B","B","C","C","C"))
> fixed.dum <-lm(value ~ quarter + factor(obs), data=Panel)
> summary(fixed.dum)

This gives us the following output:

Call:
lm(formula = value ~ quarter + factor(obs), data = Panel)

Residuals:
         1          2          3          4          5          6          7 
-1.667e-02 -8.940e-17  1.667e-02  8.333e-02 -1.000e-01  1.667e-02 -1.667e-02 
         8          9 
 1.162e-16  1.667e-02 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.13333    0.06055  35.231 3.47e-07 ***
quarter       0.08333    0.02472   3.371 0.019868 *  
factor(obs)B -0.50000    0.04944 -10.113 0.000162 ***
factor(obs)C  1.10000    0.04944  22.249 3.41e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.06055 on 5 degrees of freedom
Multiple R-squared: 0.9955, Adjusted R-squared: 0.9928 
F-statistic: 369.2 on 3 and 5 DF,  p-value: 2.753e-06 

Here we can clearly see the effect of time in the coefficient on the quarter variable, as well as the effect of being in group B or group C (as opposed to group A).
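A quick sanity check (not part of the original output) makes that interpretation concrete. Because every group is observed at the same three quarters, the group dummy coefficients reduce to simple differences in group means:

> with(Panel, tapply(value, obs, mean))

This returns group means of 2.3 (A), 1.8 (B) and 3.4 (C); 1.8 - 2.3 = -0.5 and 3.4 - 2.3 = 1.1, which reproduce the factor(obs)B and factor(obs)C estimates above.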

Hope this points you somewhere in the right direction.
