Solved – Scaling after Principal Component Analysis

I am attempting to model the yield of various crops as a function of weather data, namely one temperature variable and 7 moisture-related variables (each measuring a different aspect of moisture content). The moisture readings were strongly collinear and measured in different units, so, as recommended by other answers on this site, I scaled the moisture variables and applied Principal Component Analysis, keeping the leading PCs that together accounted for more than 95% of the variance.

However, I now have a question about when to scale the data before applying machine learning techniques. I'm trying to build a mixed effects model with lmer from the lme4 package. Since the PCs were obtained by scaling only the moisture data, if I wanted to fit a model of the form
yield ~ temperature + PC1 + ... + PCN + (1|categorical vars), would I need to re-scale the dataset consisting of temperature, PC1, ..., PCN?

Also, is it recommended to scale the response variable as well? Any clarification and help would be much appreciated; I'm only just getting started on this path.

Best Answer

Re-scaling is not necessary and won't affect your model's predictions, unless the predictors are on such wildly different scales that the model struggles to converge (in which case lmer would produce warning messages).
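To illustrate why the predictions do not change, here is a minimal sketch in Python rather than lmer. It assumes a plain least-squares fit with fixed effects only (no random effects), and the data values, variable names and helper functions are all made up for illustration. It fits the same model on raw and on standardized predictors and compares the fitted values.

```python
# Sketch: ordinary least squares (fixed effects only, no random effects),
# fitted on raw and on standardized predictors, to compare fitted values.
from statistics import mean, pstdev


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_predict(columns, y):
    """Fit y ~ 1 + columns via the normal equations and return the fitted values."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def standardize(col):
    """Center a column to mean 0 and scale it to standard deviation 1."""
    mu, sd = mean(col), pstdev(col)
    return [(v - mu) / sd for v in col]


# Hypothetical data: temperature in degrees, PC1 on a much smaller scale.
temperature = [12.0, 15.5, 19.0, 22.5, 25.0, 28.5, 31.0, 18.0]
pc1 = [-1.2, 0.4, 0.9, -0.3, 1.5, -0.8, 0.1, -0.6]
crop_yield = [3.1, 3.9, 4.6, 4.8, 5.9, 5.2, 6.0, 4.0]

raw_fit = ols_predict([temperature, pc1], crop_yield)
scaled_fit = ols_predict([standardize(temperature), standardize(pc1)], crop_yield)

# The coefficients differ between the two fits, but the fitted values should
# agree up to floating-point error.
max_diff = max(abs(a - b) for a, b in zip(raw_fit, scaled_fit))
print(f"largest difference in fitted values: {max_diff:.2e}")
```

Standardizing a predictor is a linear change of units, so the coefficients absorb it and the fitted values stay the same. What rescaling can change is how easily the optimizer converges.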

To assess goodness of fit, you can look at the distribution of the residuals (are they approximately normally distributed?), and you could use cross-validation.
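As a rough illustration of the cross-validation idea, here is a small Python sketch. It assumes a one-predictor least-squares line instead of a mixed model, and the data and the function names `fit_line` and `kfold_rmse` are hypothetical. Each fold is held out once and the prediction error on the held-out points is reported.

```python
# Sketch: k-fold cross-validation for a one-predictor least-squares fit.
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def kfold_rmse(x, y, k):
    """Hold out each of k interleaved folds once; return the RMSE for each fold."""
    n = len(x)
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        a, b = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        errors = [y[i] - (a + b * x[i]) for i in test_idx]
        scores.append(sqrt(mean([e * e for e in errors])))
    return scores


# Hypothetical data, for illustration only.
temperature = [12.0, 15.5, 19.0, 22.5, 25.0, 28.5, 31.0, 18.0]
crop_yield = [3.1, 3.9, 4.6, 4.8, 5.9, 5.2, 6.0, 4.0]

fold_rmse = kfold_rmse(temperature, crop_yield, k=4)
print("RMSE per fold:", [round(s, 3) for s in fold_rmse])
print("mean RMSE:", round(mean(fold_rmse), 3))
```

A held-out error much larger than the in-sample residual spread would suggest overfitting.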
