Validation metrics (R2 and Q2) for Partial Least Squares (PLS) Regression

cross-validation, partial-least-squares, python, regression, scikit-learn

I'm attempting to validate my Partial Least Squares (PLS) regression model. From the documentation and other readings on PLS regression, I've come to understand that two metrics are generally used to evaluate the performance of the algorithm. $R^2$ is calculated as 1 minus the ratio of the residual sum of squares (RSS) to the total sum of squares (TSS):

$$
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}
$$
$$
\mathrm{RSS} = \sum_i (y_i - \hat{y}_i)^2
$$
$$
\mathrm{TSS} = \sum_i (y_i - \bar{y})^2
$$
while $Q^2$ is calculated as 1 minus the ratio of the predictive residual error sum of squares (PRESS) to TSS:
$$
Q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}}
$$
$$
\mathrm{PRESS} = \sum_i (y_i - \hat{y}_i)^2
$$

The calculations for $R^2$ and $Q^2$ are almost identical; the only difference is that RSS is computed on the data the model was trained on, while PRESS is computed on held-out data.
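To make the distinction concrete, here is a minimal sketch of how both quantities can be computed with scikit-learn's PLSRegression. The toy data, the choice of 2 components, and the 5-fold cross-validation scheme are my own illustrative assumptions, not part of the question:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

# Toy data (shapes and coefficients are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=50)

pls = PLSRegression(n_components=2).fit(X, y)

# R^2: residuals of the training data itself
rss = np.sum((y - pls.predict(X).ravel()) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss

# Q^2: residuals of held-out data, accumulated over CV folds (PRESS)
press = 0.0
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    fold = PLSRegression(n_components=2).fit(X[train], y[train])
    press += np.sum((y[test] - fold.predict(X[test]).ravel()) ** 2)
q2 = 1 - press / tss

print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```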

My question:

In the context of training/test splits of the data, is it appropriate to call $R^2$ a metric of how well the algorithm fits the training data and $Q^2$ a metric of algorithm performance on test data?

Side question: Is it good practice to scale Y in the same manner as X in PLS regression?

Best Answer

I was also looking for information on these parameters and found a good explanation in the book by Eriksson et al., Multi- and Megavariate Data Analysis: Principles and Applications.

In general, I think you have the right idea. According to Eriksson et al., the fit tells us how well we are able to mathematically reproduce the data of the training set. The $R^2$ parameter is known as the "goodness of fit", or explained variation. The $Q^2$ parameter is termed "goodness of prediction", or predicted variation.

The following points are emphasised:

  • In PLS, the terms $R^2$ and $Q^2$ generally refer to the model performance of the Y-data, the responses, rather than that of the X-data, the predictors.
  • The two parameters behave differently as model complexity increases. $R^2$ is inflationary and rapidly approaches unity as model complexity (the number of model parameters) grows, so a high $R^2$ alone is not sufficient. $Q^2$, on the other hand, is not inflationary: beyond a certain degree of complexity it stops improving and eventually degrades (see the sketch after this list).
  • There is a trade-off between fit and predictive ability, so the zone we want to identify is the one where good fit and good predictive power are balanced.
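As a rough illustration of the second point, the sketch below sweeps the number of PLS components and prints $R^2$ next to a cross-validated $Q^2$; the toy data and 5-fold CV scheme are again my own assumptions. The typical pattern is that $R^2$ keeps creeping upward while $Q^2$ levels off and eventually drops:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=50)
tss = np.sum((y - y.mean()) ** 2)

for n in range(1, 9):
    # Training fit (R^2) with n components
    rss = np.sum((y - PLSRegression(n_components=n).fit(X, y).predict(X).ravel()) ** 2)
    # Cross-validated PRESS with the same number of components
    press = 0.0
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        fold = PLSRegression(n_components=n).fit(X[train], y[train])
        press += np.sum((y[test] - fold.predict(X[test]).ravel()) ** 2)
    print(f"components={n}: R2={1 - rss / tss:.3f}, Q2={1 - press / tss:.3f}")
```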

For your side question, I find no specific recommendation and no particular reason to scale the Y variable (I'm assuming there is only one). The X-variables are scaled to give them the same variance and thus equal weight in the model. The model should be mathematically equivalent whether the response is scaled or not. If there is more than one Y-variable, a more important issue would be to test whether they are correlated and whether to fit one model or a separate model for each response.
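For what it's worth, scikit-learn's PLSRegression standardizes both X and Y internally when scale=True (the default). The sketch below, on made-up data with a single response, checks the equivalence claim by fitting on the raw y and on a manually standardized y and comparing the back-transformed predictions:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
y = 3.0 + 10.0 * (X[:, 0] - X[:, 2]) + rng.normal(size=40)

# Fit on the raw response (scale=True standardizes X and y internally)
pls_raw = PLSRegression(n_components=2, scale=True).fit(X, y)

# Fit on a manually standardized response, then back-transform the predictions
y_std = (y - y.mean()) / y.std(ddof=1)
pls_scaled = PLSRegression(n_components=2, scale=True).fit(X, y_std)
pred_back = pls_scaled.predict(X).ravel() * y.std(ddof=1) + y.mean()

# Expected to print True: with a single response, scaling y is inconsequential
print(np.allclose(pls_raw.predict(X).ravel(), pred_back))
```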