# Calculating R Squared on Subsets – Reference Guide for Regression Analysis

r-squared, references, regression

I've been looking for a method to calculate $$R^2$$ on a subset of the samples (a subset of the instances, not a subset of the features), and found this answer from Dave. It suggests using the mean of the original samples ($$\bar{y}$$), rather than the mean of the subset, when calculating the total sum of squares (TSS), i.e.:

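$$R^2=\dfrac{\sum_j (y_j - \bar{y})^2 - \sum_j (y_j - \hat{y}_j)^2}{\sum_j (y_j - \bar{y})^2}$$

where the sums run over the instances $$j$$ in the subset and $$\hat{y}_j$$ is the prediction for instance $$j$$.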
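To make the formula concrete, here is a minimal NumPy sketch. The function name `r2_with_baseline`, the synthetic data, and the choice of subset (targets between 80 and 90) are all assumptions made for illustration; they are not taken from Dave's answer or from my paper.

```python
import numpy as np


def r2_with_baseline(y_true, y_pred, baseline_mean):
    """R^2 where the TSS is taken around a supplied mean (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    tss = np.sum((y_true - baseline_mean) ** 2)  # total sum of squares
    rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    return (tss - rss) / tss


# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
y = rng.uniform(0.0, 100.0, size=500)        # "measured" targets, full sample set
y_hat = y + rng.normal(0.0, 10.0, size=500)  # predictions with noise

# A subset of instances whose targets lie in a narrow range.
mask = (y >= 80.0) & (y <= 90.0)
y_sub, y_hat_sub = y[mask], y_hat[mask]

r2_full_mean = r2_with_baseline(y_sub, y_hat_sub, y.mean())        # full-sample mean
r2_subset_mean = r2_with_baseline(y_sub, y_hat_sub, y_sub.mean())  # subset mean

print(f"R^2 with full-sample mean: {r2_full_mean:.3f}")
print(f"R^2 with subset mean:      {r2_subset_mean:.3f}")
```

Only the `baseline_mean` argument differs between the two calls, so any gap between the two values comes entirely from the choice of mean in the TSS.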

Using this method resolves a problem I have when the subset mean is used to calculate $$R^2$$: I get a low or negative value if my subset has very low variance (e.g. if the subset consists of instances with target values in a small range). I'd like to use Dave's method in a paper I'm writing.
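One way to see why the two baselines differ is the standard sum-of-squares decomposition. The notation here is my own assumption: $$S$$ is the subset, $$n_S$$ its size, and $$\bar{y}_S$$ its mean.

$$\sum_{j \in S} (y_j - \bar{y})^2 = \sum_{j \in S} (y_j - \bar{y}_S)^2 + n_S(\bar{y}_S - \bar{y})^2$$

The second term on the right is non-negative, so the TSS around the full-sample mean is never smaller than the TSS around the subset mean. For the same residuals, $$R^2$$ computed with $$\bar{y}$$ is therefore never lower than $$R^2$$ computed with $$\bar{y}_S$$. When the subset covers a narrow range of targets, the first term is small, and the residual sum of squares can exceed it, which is what produces the negative values.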

I've searched for academic references for this method of calculating $$R^2$$ but I have not found one so far.

Does anyone know of a suitable reference to use for this?

In my paper, I justified this approach with the following statement: "When $$R^2$$ is calculated for a subset of samples, $$\bar{y}$$ is the mean of the measured LFMC for the full sample set, not the mean of the subset. By using this value for the mean, all $$R^2$$ calculations are compared to the same baseline, thus allowing comparisons between $$R^2$$ for different subsets of samples."