Solved – lmer predict with random effects & log transformation

lme4-nlmemixed modelr

I am currently trying to build a mixed effect model using the lme4 package with the in-built lmer() function.

My goal is to predict the number of units sold in a specific site on a specific date based on independent variables such as the retail price or promotions.

Note: I transform the number of units sold in each day into log(units).

At the moment I am reviewing the forecasts of my model looking at the plots. Looking at the generated plots I observed 2 key things:

(1) I had the impression that the sigma of the model is not the correct one.

(2) I had the impression that the model only takes fixed effects into consideration and neglects the random effects which I specify.

Here are the key lines in the code which I assume to be relevant for my questions:

d_data$units_as_log <- log(d_data$units+0.001) # Log Transformation of Units
#Specifying the model formula (very basic model)
model_vec <- c("units_as_log ~ retail_price + (1|article) + (1|site)")   
model <- lmer(model_vec[ii], 
    data=d_data[d_data$date <= endofhistory & d_data$AVAIL == 1,], REML=FALSE)
    # <= endofhistory describes the model interval
# > endofisthistory describes the holdout period, with re.form=NULL i try to include random effects
fcst <- exp(predict(model, newdata=subdata[subdata$date > endofhistory,], 
  re.form = NULL)+summary(model)$sigma^2/2) 
runMSE[ii] <- mean((fcst-actuals)^2, na.rm=TRUE)

Question #1: Is the sigma of summary(model) the correct sigma or do I need to calculate it myself somehow?

I previously read that the sigma function of lme4 was moved to the stats package.

Question #2: Is the transformation from log_units to units with + summary(model)$sigma^2/2 correct?

I discovered several posts on StackExchange which gave different answers, hence my confusion here.

Question #3: Do I already include my specified random effects? Do I need to calculate the results myself by building custom matrices?

I previously discovered posts saying lme4 has no predict() function (state of 2014). Nevertheless, I identified that there is a predict() function in the stats package which can deal with lmer objects, including random effects with re.form = NULL.

Best Answer

It would help a lot to have a reproducible example, but I'll try.

Question #1: Is the sigma of summary(model) the correct sigma or do I need to calculate it myself somehow?

It's correct: sigma(model) and summary(model)$sigma are identical, the former is a little more compact but not different. (Why do you think it's incorrect?)

I previously read that the sigma function of lme4 was moved to the stats package.

This just means that sigma() was established as a generic function in base R, so the generic function is no longer defined in the lme4 package. But sigma(fitted_model) still works the same.

Question #2: Is the transformation from log_units to units with + summary(model)$sigma^2/2 correct? (I discovered several posts on StackExchange which gave different answers, hence my confusion here.)

What are you trying to do? To reverse your transformation y=log(x+0.001) you need x=exp(y)-0.001. The reason that people sometimes add in sigma^2/2 is because analyzing on the log-transformed scale will give a biased estimate of the (arithmetic) mean on the original scale: see e.g. here. My opinion is that this isn't necessarily worth doing unless you're specifically interested in predicting the mean (rather than, say, the median) on the original scale, but it's worth keeping in mind.

Question #3: Do I already include my specified random effects? Do I need to calculate the results myself by building custom matrices?

You shouldn't need to. When in doubt, the documentation of the current version of the package you have installed (?predict.merMod) gives the authoritative answer. Do you have a reason to believe that re.form=NULL is not doing what you want? (Hint: if you have an overly complex model, or too few levels in your grouping variables, you may have estimated random effects variances of zero, in which case it will look like your random effects aren't doing anything ... check VarCorr(fitted_model), and note that variances are sometimes estimated as being very small (e.g. 1e-06) rather than exactly zero ...)

Note that this question might be more appropriate for r-sig-mixed-models@r-project.org than CrossValidated (go to the info page to subscribe)