Solved – Time series with multiple subjects and multiple variables

I am a web developer and novice statistician.

My data looks something like this:

Subject  Week   x1  x2  x3  x4  x5  y1
A        1      .5  .6  .7  .8  .7  10
B        1      .3  .6  .2  .1  .3  8
C        1      .3  .1  .2  .3  .2  6  
A        2      .1  .9  1.5 .8  .7  5
B        2      .3  .6  .3  .1  .3  2
D        2      .3  .1  .4  .3  .5  10

I am trying to predict y1 as a function of the x variables. However, I have reason to believe that there may be a lag in the effect of the x variables on y1, i.e., the x variables from week 1 for subject A influence y1 for subject A in week 2.

Note that not all subjects will have data points for every week (in fact, most won't). Subjects will tend to have data points for, say, weeks 1, 2, 3, and 4, then drop off and not show up again until weeks 7, 8, and 9. Given my hypothesis about lag, I am willing to restrict my analysis to data points for which we also have data for the previous N weeks.

Like I said, I am a novice and am unsure of the best way to deal with a dataset of this form. I am hoping to carry out this analysis in R, Python, or some combination of the two.

To be clear, I do not expect the current week's x variables to have no effect. I think they will have some effect, perhaps a greater one than those of previous weeks. I just believe that previous weeks will also have some effect.

I am expecting there to be two to three weeks of lag. To give a little context, the analysis that I am attempting here relates to judging the quality of online traffic. Every week I get a score rating the quality of a certain stream of users I send to a given website. I am trying to find secondary metrics, such as browser distribution, percent duplicate clickouts, etc. that will allow me to predict what that score will be ahead of time.

Best Answer

As I mentioned in my note above, I would treat this as a regression problem. The linked post (R Head) shows how to construct the lag (and lead) variables from your data in R.
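Since the question also mentions Python, here is a minimal pandas sketch of the same lag-construction idea. It is not the code from the linked post. The helper name `add_lags`, the `_lag1` column suffix, and the extra row for subject A in week 4 are assumptions made for illustration, and it assumes at most one row per subject and week. Lags are matched on the calendar week rather than on row position, so a gap in a subject's history produces missing values instead of quietly borrowing an older week.

```python
import pandas as pd

# Toy data shaped like the question's table (only x1 and x2 kept for brevity).
# The last row (subject A, week 4) is made up to show how a gap is handled.
df = pd.DataFrame({
    "Subject": ["A", "B", "C", "A", "B", "D", "A"],
    "Week":    [1, 1, 1, 2, 2, 2, 4],
    "x1":      [0.5, 0.3, 0.3, 0.1, 0.3, 0.3, 0.2],
    "x2":      [0.6, 0.6, 0.1, 0.9, 0.6, 0.1, 0.4],
    "y1":      [10, 8, 6, 5, 2, 10, 7],
})

def add_lags(frame, x_cols, n_lags):
    """Attach the x values from each of the previous n_lags weeks to every row.

    Hypothetical helper: rows are matched on (Subject, Week - k), so a missing
    week yields NaN rather than the most recent earlier observation.
    """
    out = frame.copy()
    for k in range(1, n_lags + 1):
        lagged = frame[["Subject", "Week"] + x_cols].copy()
        # Shift the week forward so week w's values line up with week w + k.
        lagged["Week"] = lagged["Week"] + k
        lagged = lagged.rename(columns={c: f"{c}_lag{k}" for c in x_cols})
        out = out.merge(lagged, on=["Subject", "Week"], how="left")
    return out

lagged_df = add_lags(df, ["x1", "x2"], n_lags=1)

# Keep only rows where every requested lag is available.
complete = lagged_df.dropna().reset_index(drop=True)
print(complete)
```

With one lag, only the week 2 rows for subjects A and B survive the `dropna()` step: subject D has no week 1 row, and subject A's week 4 row has no week 3 row to draw on. Raising `n_lags` to 2 or 3 applies the "previous N weeks" restriction described in the question.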

That post also includes a brief introduction to using the resulting data in a regression model. You might also want to read up on the R package dynlm (dynamic linear regression).
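To make the regression step concrete, here is a rough Python sketch of an ordinary least squares fit that uses both the current week's predictor and its one-week lag. It is not dynlm and not the linked post's code. The data are synthetic, and the coefficients 1.0, 2.0, and 0.5 are made up so that the fit can be checked against known values. The names `x_now` and `x_lag1` are placeholders for columns you would build from your own data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for one predictor in the current week and the week before.
x_now = rng.normal(size=n)
x_lag1 = rng.normal(size=n)

# Made-up relationship: the current week matters more, the lag still matters.
y = 1.0 + 2.0 * x_now + 0.5 * x_lag1 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), x_now, x_lag1])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b_now, b_lag1 = coef

print(f"intercept={intercept:.3f}  current={b_now:.3f}  lag1={b_lag1:.3f}")
```

On real data, the size of the lag coefficients relative to the current-week coefficients is what speaks to the lag hypothesis. A statistics package that reports standard errors would be a more natural choice than bare `lstsq` for judging whether a lag term is distinguishable from zero.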
