Scaling predictors in mixed models

Tags: lme4-nlme, mixed-model, r, repeated-measures, standardization

I collected crop-yield data over multiple locations and years and want to regress yield on rainfall and heat stress (temperature), which are measured in different units. Suppose my data frame has 5 columns: year, location, yield, rainfall and temperature. These are my steps:

 library(lme4)
 dat[, 4:5] <- scale(dat[, 4:5], center = TRUE, scale = TRUE)
 model <- lmer(yield ~ rainfall + temperature + (1|location) + (1|year), data = dat)

Once I have fitted this model, I want to use it for prediction. Suppose I collect new data from different years or locations, called dat1, which has 4 columns: year, location, rainfall and temperature.

My confusion is this: since the fitted model takes standardised rainfall and temperature as inputs, how do I standardise these two variables in the new data dat1? Do I simply do:

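dat1[, 3:4] <- scale(dat1[, 3:4], center = TRUE, scale = TRUE)
# allow.new.levels = TRUE is needed because dat1 contains years/locations not seen in dat
predict(model, newdata = dat1, allow.new.levels = TRUE)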

Or do I have to standardise the new data using the mean and standard deviation of the original data dat?

Best Answer

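Just because lme4 gives a warning that predictors are on very different scales does not mean that you need to standardize them. Standardization can make the coefficients harder to interpret, because each slope is then expressed per standard deviation of its predictor rather than per original unit (e.g. per mm of rainfall).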
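To illustrate the interpretation point, here is a minimal sketch on simulated data (the rainfall range and true slope are made up, not taken from the question). It uses base R's lm() so that it runs without lme4; the same algebra should apply to the fixed-effect slopes of an lmer() fit. The slope on a standardized predictor equals the raw slope multiplied by the predictor's standard deviation, so it reads as "change in yield per SD of rainfall".

```r
# Simulated data (hypothetical values, not the asker's data)
set.seed(1)
n <- 200
rainfall <- runif(n, 200, 800)                  # hypothetical rainfall in mm
yield <- 2 + 0.01 * rainfall + rnorm(n, sd = 0.5)

fit_raw <- lm(yield ~ rainfall)                 # slope in yield per mm
rain_z <- as.numeric(scale(rainfall))           # standardized predictor
fit_std <- lm(yield ~ rain_z)                   # slope in yield per SD of rainfall

b_raw <- unname(coef(fit_raw)["rainfall"])
b_std <- unname(coef(fit_std)["rain_z"])

# The standardized slope is the raw slope times sd(rainfall),
# so dividing by sd(rainfall) recovers the per-mm effect.
all.equal(b_std, b_raw * sd(rainfall))
b_std / sd(rainfall)
```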

Generally, you only need to rescale the offending variable(s) by multiplying or dividing by an appropriate constant (for example, expressing a variable in different units). Since it is only a warning, you don't strictly need to do anything, although in some situations variables on vastly different scales can cause numerical instability during model fitting.
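A sketch of rescaling by a constant instead of standardizing. Again this uses simulated data and lm() as a stand-in for lmer(), and the mm-to-cm conversion is a hypothetical example. Dividing a predictor by 10 multiplies its slope by 10 and leaves the fitted values unchanged, so the coefficient stays in interpretable units and no stored means or SDs are needed at prediction time.

```r
# Simulated data (hypothetical values)
set.seed(2)
n <- 200
rainfall_mm <- runif(n, 200, 800)
yield <- 2 + 0.01 * rainfall_mm + rnorm(n, sd = 0.5)

# Rescale by a constant: mm -> cm keeps a physical unit
rainfall_cm <- rainfall_mm / 10

fit_mm <- lm(yield ~ rainfall_mm)
fit_cm <- lm(yield ~ rainfall_cm)

slope_mm <- unname(coef(fit_mm)["rainfall_mm"])  # yield per mm
slope_cm <- unname(coef(fit_cm)["rainfall_cm"])  # yield per cm

# Dividing the predictor by 10 multiplies its slope by 10;
# the fitted values are identical.
all.equal(slope_cm, 10 * slope_mm)
all.equal(fitted(fit_mm), fitted(fit_cm))
```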

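You are correct: if you standardize the variables used as inputs to your model, you will have to standardize the same variables in your test/prediction dataset too, using the mean and standard deviation of the original data dat rather than recomputing them from dat1.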
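In R, scale() records the values it used as the attributes "scaled:center" and "scaled:scale", so they can be reused on new data. The sketch below uses tiny hypothetical versions of dat and dat1 holding only the two predictors (made-up numbers). It applies the training statistics to dat1 and also shows that re-standardizing dat1 on its own mean and SD, as in the question's first option, gives different z-scores for the same raw values.

```r
# Hypothetical training and new data (predictor columns only)
dat  <- data.frame(rainfall = c(300, 450, 600, 750),
                   temperature = c(18, 22, 26, 30))
dat1 <- data.frame(rainfall = c(500, 700),
                   temperature = c(20, 28))

vars <- c("rainfall", "temperature")

# scale() stores the centering and scaling values as attributes
train_scaled <- scale(dat[, vars])
centers <- attr(train_scaled, "scaled:center")
scales  <- attr(train_scaled, "scaled:scale")

# Apply the *training* mean and SD to the new data
new_scaled <- scale(dat1[, vars], center = centers, scale = scales)

# For comparison: standardizing dat1 on its own statistics (not what the model expects)
wrong_scaled <- scale(dat1[, vars])

# Replace the raw predictor columns with their scaled versions
dat[, vars]  <- train_scaled
dat1[, vars] <- new_scaled
```

With a model fitted on the scaled dat, predictions for the new years and locations would then come from predict(model, newdata = dat1, allow.new.levels = TRUE); the unseen levels get random effects of zero, so those predictions rely on the fixed effects.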
