Coefficient of determination ($R^2$) and sample size

Is there any relationship between $R^2$ and sample size? Does $R^2$ increase with sample size, and does the adjusted $R^2$?

Best Answer

It depends on whether you are interested in $r$, the sample correlation coefficient, or in $R^2$, the squared multiple correlation coefficient used to assess the fit of a regression.

Both $r$ and adjusted $r$ are negatively biased as estimators of the population correlation (that is, on average the sample values are slightly smaller in magnitude than the population value), but the adjusted formula is somewhat less biased. Besides sample size, the amount of bias depends on the population value itself: correlations near zero and one show the least bias, and those near 0.6-0.8 show the most.

Table 1 of a paper by Zimmerman, Zumbo, and Williams (2003) illustrates the bias as a function of sample size and correlation value. Elsewhere in the paper, they show simulation data indicating that the Fisher and the Olkin and Pratt adjustments to $r$ reduce this bias considerably.
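To get a feel for the size of this bias, here is a minimal simulation sketch. It assumes the quantity of interest is the unsquared Pearson correlation $r$ from bivariate normal data, and it uses the commonly quoted approximate forms of the two adjustments, $r\,[1 + (1 - r^2)/(2n)]$ (Fisher) and $r\,[1 + (1 - r^2)/(2(n-3))]$ (Olkin and Pratt). Other versions of these formulas exist, so treat the function names and the exact expressions as illustrative assumptions rather than the definitions used in the paper.

```python
import math
import random


def sample_r(n, rho, rng):
    """Pearson r from n draws of a bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        # Mixing z1 and z2 this way gives corr(x, y) = rho.
        ys.append(rho * z1 + math.sqrt(1 - rho**2) * z2)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_adjust(r, n):
    """Approximate Fisher adjustment (assumed form, see lead-in)."""
    return r * (1 + (1 - r**2) / (2 * n))


def olkin_pratt_adjust(r, n):
    """Approximate Olkin-Pratt adjustment (assumed form, see lead-in)."""
    return r * (1 + (1 - r**2) / (2 * (n - 3)))


def mean_estimates(n, rho, reps=20000, seed=0):
    """Average raw and adjusted estimates over many simulated samples."""
    rng = random.Random(seed)
    rs = [sample_r(n, rho, rng) for _ in range(reps)]
    return {
        "r": sum(rs) / reps,
        "fisher": sum(fisher_adjust(r, n) for r in rs) / reps,
        "olkin_pratt": sum(olkin_pratt_adjust(r, n) for r in rs) / reps,
    }


if __name__ == "__main__":
    for n in (10, 30, 100):
        print(n, mean_estimates(n, rho=0.6))
```

With a population correlation of 0.6, the average raw $r$ should fall a little below 0.6 at small $n$ and approach it as $n$ grows, while the adjusted averages should sit closer to 0.6.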

There is also a decent amount of work on "$R^2$ shrinkage", a related phenomenon that comes up often in regression contexts but has the opposite sign: the sample $R^2$ is positively biased, and adjustments bring it back down. Yin and Fan (2001) give a fairly comprehensive comparison of methods for estimating it, and Page 3/205 has some citations to descriptions of the problem.
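The positive bias of $R^2$ is easiest to see when there is no relationship at all. The sketch below assumes a single predictor that is independent of the response, so the population $R^2$ is zero. In that case the expected sample $R^2$ is $1/(n-1)$, which shrinks as $n$ grows. The adjustment shown is the common $1 - (1 - R^2)(n-1)/(n-p-1)$ form, which is only one of several in use, and the function names are made up for illustration.

```python
import random


def r_squared_null(n, rng):
    """R^2 from regressing y on one predictor x when x and y are independent."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # With one predictor, R^2 equals the squared Pearson correlation.
    return sxy**2 / (sxx * syy)


def adjusted_r_squared(r2, n, p=1):
    """Common adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def average_fit(n, reps=4000, seed=0):
    """Mean raw and adjusted R^2 over many null samples of size n."""
    rng = random.Random(seed)
    r2s = [r_squared_null(n, rng) for _ in range(reps)]
    raw = sum(r2s) / reps
    adj = sum(adjusted_r_squared(r2, n) for r2 in r2s) / reps
    return raw, adj


if __name__ == "__main__":
    for n in (10, 30, 100):
        raw, adj = average_fit(n)
        print(f"n={n}: mean R^2={raw:.3f}, mean adjusted R^2={adj:.3f}")
```

So for a fixed model, the raw $R^2$ tends to decrease toward its population value as the sample grows, while the adjusted $R^2$ stays near zero on average in this null case at every sample size.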

Finally, you should be aware that there are many methods for adjusting $r^2$/$R^2$ (there are at least three versions of the Olkin and Pratt adjustment formula in circulation, some of which correct for the number of parameters), so it would help to be more specific about what you have in mind.