Sample Pearson’s R Covariance and Standard Deviation Missing $\frac{1}{n-1}$

correlation, covariance, standard deviation, statistics

When calculating Pearson's r for a sample, the formula is given as:

$r_{xy} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}$

(Source: Wikipedia)

But for a population we have the following formula:

$\rho_{x,y} = \frac{\text{cov}(X,Y)}{\sigma_x \sigma_y}$

Since we are working with a sample (not a population), the sample definitions of $\text{cov}(X,Y)$, $\sigma_x^2$ and $\sigma_y^2$ each include a factor of $\frac{1}{n-1}$.

For example, sample covariance is defined:

$\text{cov}(X,Y) = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{n-1}$

And a similar definition exists for $\sigma_x$ and $\sigma_y$.

I'm sure my algebra is missing a step somewhere. My question is:

In the definition of Pearson's r for a sample, where did the $\frac{1}{n-1}$ go?

Best Answer

For the sample, if you define
$$\text{cov}(X,Y)=\frac{\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)}{n-1}$$
then the sample variance of each variable is
$$\begin{aligned}\sigma_x^2&=\text{cov}(X,X)=\frac{\sum_{i=1}^n(x_i-\bar x)^2}{n-1}\\\sigma_y^2&=\text{cov}(Y,Y)=\frac{\sum_{i=1}^n(y_i-\bar y)^2}{n-1}\end{aligned}$$
and thus
$$\rho_{xy}=\frac{\frac{\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)}{n-1}}{\sqrt{\frac{\sum_{i=1}^n(x_i-\bar x)^2}{n-1}}\sqrt{\frac{\sum_{i=1}^n(y_i-\bar y)^2}{n-1}}}$$
The numerator carries a factor of $\frac{1}{n-1}$, and the denominator carries $\sqrt{\frac{1}{n-1}}\sqrt{\frac{1}{n-1}}=\frac{1}{n-1}$, so the two factors cancel, giving the required expression for $r_{xy}$.
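If it helps to see the cancellation numerically, here is a minimal Python sketch. The function names, the `ddof` parameter, and the data are made up for illustration and are not part of the question. It computes r once from the raw sums and once as covariance over the product of standard deviations, using either $n-1$ or $n$ as the divisor.

```python
import math

def deviation_sums(xs, ys):
    """Return the sums of cross-products and squared deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy

def pearson_r_direct(xs, ys):
    """Pearson's r from the raw sums, with no 1/(n-1) factor anywhere."""
    sxy, sxx, syy = deviation_sums(xs, ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

def pearson_r_via_cov(xs, ys, ddof=1):
    """Pearson's r as cov / (sd_x * sd_y), dividing each sum by (n - ddof).

    ddof=1 gives the sample definitions, ddof=0 the population ones.
    """
    n = len(xs)
    sxy, sxx, syy = deviation_sums(xs, ys)
    cov = sxy / (n - ddof)
    sd_x = math.sqrt(sxx / (n - ddof))
    sd_y = math.sqrt(syy / (n - ddof))
    return cov / (sd_x * sd_y)

# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r_direct = pearson_r_direct(xs, ys)
r_sample = pearson_r_via_cov(xs, ys, ddof=1)
r_population = pearson_r_via_cov(xs, ys, ddof=0)

# All three agree up to floating-point rounding.
print(r_direct, r_sample, r_population)
```

Because the same divisor appears once in the numerator and as two square roots in the denominator, any positive divisor would give the same result; that is why the formula for $r_{xy}$ has no $\frac{1}{n-1}$ in it.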
