Time variable in longitudinal data set mixed model question

mixed-model, r

It has been a few years since I fit a mixed model, so I have been on a massive review session through old notes, books (Pinheiro and Bates, Faraway, etc.) and the posts on SO and CV about mixed models. It has been great, but I am left with a few questions that are likely so basic that most of these materials skip them.

The data set I am working with has a time column that starts at 1 and goes to 38 by period. I also have 3 variables, one each for year, quarter and period. In my fixed effects, should I be modeling sales~Time+quarter+year+period, or should I ignore the quarter, year and period variables and just use my aggregated time variable? Should I put ordered(time) in there? On the one hand, I imagine the additional quarter/period/year variables might be redundant, since that information is already in the Time variable; on the other hand, it might be good to use them to control for seasonality.

I am wondering if my fixed effect should be

Sales~Time+Quarter+year+period+Region+Policy

or

Sales~Time+Region+Policy

Also, if anyone has advice on fitting models with more than one or two random effects, it would be much appreciated. In all of my notes, it's mostly just specific designs with 1 random effect (sometimes intercept and sometimes slope + intercept), but rarely multiple. I currently have 5 variables I would like to treat as random effects (or at least look at). Is it as simple as (1|A) + (1|B) + (1|C) + etc in lmer? Any trouble I am walking myself into?

Thank you as always CV.

Edit: I am interested in a change in policy that has potentially harmed sales. I have a 3 year data set, and in the middle of that data set we changed our policy. I am fitting an LMM to see first whether there is actually a negative effect and second to get a vague idea of how big it is. If it's a decrease of .001%, we don't care, but if it's a larger number it should be examined further (which naturally will be my next task).

Edit2: For random effects (which I am still in the process of tinkering with) I have the product, who is paying and purchase quantity. What makes the cut here is still a work in progress; I am currently using LRT and AIC (mostly AIC, because there seem to be reservations about LRT with lmer sometimes). There are 38 months in the set and each month has sales values for each type of payment received (there are 5), each product sold (3), and the region it is sold in (~30 here). The other variables I have are sales, number of things sold, the team that did the selling (included just because the change in policy relates to it) and a dummy variable for change in policy. The model I am working around right now is looking something like:

m1 = lmer(Sales~ Time+Policy+Team+(Product|Territory)+(salesqty|Territory)+ (payer|Territory), data=data ) 

Though like I said, this is by no means final, just what I have running in R at this second.

Best Answer

You have quite a few potential variables to include in your model, and I think there is a real possibility of underpowering the analysis. In English: I think you want to fit the simplest model you can. If you wish to include Quarter, year, or period, you'll either need to have these specified as factors (you may have done this already) or alternatively fit a set of dummy variables - using as.factor is much easier. :)
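As a minimal sketch of the factor conversion mentioned above, the calendar variables can be derived from the 1 to 38 time index and then turned into factors. The data frame `d`, the column names, and the assumption that Time = 1 falls in January are all hypothetical and are not taken from the question.

```r
# Hypothetical data: one row per month, Time running 1..38 as in the question.
# Assumption: Time = 1 is January of the first year; shift the offset if not.
d <- data.frame(Time = 1:38)

d$Month   <- (d$Time - 1) %% 12 + 1    # calendar month, 1..12
d$Year    <- (d$Time - 1) %/% 12 + 1   # year index, 1..4
d$Quarter <- (d$Month - 1) %/% 3 + 1   # quarter, 1..4

# Convert to factors so each level gets its own coefficient
# instead of being treated as a linear numeric trend.
d$Quarter <- as.factor(d$Quarter)
d$Year    <- as.factor(d$Year)

str(d)
table(d$Quarter)  # number of months observed in each quarter
```

Once a column is a factor, the formula interface builds the dummy variables itself, so they do not need to be constructed by hand.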

I would try this simple model first:

lmer(Sales ~ Policy + Quarter + (1|Time), data=data)

I think that the Quarter factor is best for trying to capture trend - it's a smaller subset of year, and I wouldn't include other factors like Region or Team yet as that will complicate the model. You're looking for a main effect for Policy. I have included Time as a random effect as I think that is the best way of capturing the idea that the Sales vary randomly over time, and we wish to generalise the policy effect over time, so Time should not be fitted as a fixed effect.
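A rough sketch of fitting this simple model on simulated data may help show what the output looks like. Everything about the data here is an assumption made for illustration: the 30 regions per month, the policy switch after month 19, the effect size of -4, and the noise levels are invented, and only the model formula comes from the suggestion above.

```r
library(lme4)

set.seed(42)
# Simulated stand-in for the real data: 38 months x 30 regions (hypothetical).
d <- expand.grid(Time = 1:38, Region = factor(1:30))
d$Quarter <- factor(((d$Time - 1) %% 12) %/% 3 + 1)   # assumes Time 1 = January
d$Policy  <- factor(ifelse(d$Time > 19, "after", "before"),
                    levels = c("before", "after"))     # assumed switch after month 19

month_eff <- rnorm(38, sd = 3)                         # shock shared by all rows in a month
d$Sales <- 100 - 4 * (d$Policy == "after") +
  month_eff[d$Time] + rnorm(nrow(d), sd = 5)

# Time is numeric here, but lmer treats a grouping variable as a factor,
# so (1 | Time) gives one random intercept per month.
m0 <- lmer(Sales ~ Policy + Quarter + (1 | Time), data = d)

fixef(m0)["Policyafter"]   # estimated change in Sales after the policy switch
VarCorr(m0)                # month-to-month and residual standard deviations
```

Because Policy only changes between months, its standard error is driven mainly by the month-level variance, so the 38 time points, not the number of rows, limit how precisely the policy effect can be estimated.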

If you wish, you could start adding in more of your extra variables one-by-one and then compare the fitted models using anova(), assuming you're storing the lmer results. But I wouldn't start with a complicated model.
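One way to do that one-at-a-time comparison is sketched below. For stored lmer fits the comparison function in lme4 is anova(), which by default refits REML models with maximum likelihood before running the likelihood-ratio test. The simulated data, the region effect, and the choice of Region as the added variable are assumptions for illustration only.

```r
library(lme4)

set.seed(7)
# Simulated stand-in data (hypothetical): 38 months x 30 regions.
d <- expand.grid(Time = 1:38, Region = factor(1:30))
d$Quarter <- factor(((d$Time - 1) %% 12) %/% 3 + 1)   # assumes Time 1 = January
d$Policy  <- factor(ifelse(d$Time > 19, "after", "before"),
                    levels = c("before", "after"))     # assumed switch after month 19

region_eff <- rnorm(30, sd = 4)                        # invented region differences
month_eff  <- rnorm(38, sd = 3)                        # invented month-level shocks
d$Sales <- 100 - 4 * (d$Policy == "after") +
  region_eff[d$Region] + month_eff[d$Time] + rnorm(nrow(d), sd = 5)

# Simple model first, then the same model with one extra fixed effect.
m0 <- lmer(Sales ~ Policy + Quarter + (1 | Time), data = d)
m1 <- update(m0, . ~ . + Region)

# Likelihood-ratio test plus AIC/BIC for the two nested models.
cmp <- anova(m0, m1)
cmp
```

The resulting table has one row per model, with AIC, BIC, log-likelihood and the chi-square test for the added term, so each extra variable can be judged before the next one is added.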

Update: the reason I suggest Time as a random effect is that you have time points before and after the policy implementation that aren't in your data. Also, with the simple model I have suggested, there are repeated measures at each time point, from region, sales team, etc, that aren't in the model, so I think that using Time in this model as a random effect is the best way of representing all that underlying complexity.