Sample Size for Correlation Analysis – Determine Optimal Sizes for Overall and Sub-Group Analyses

correlation, sample-size

I am running Pearson's correlation on an overall sample of 400 respondents.

When I isolate male and female responses, my sample becomes 220 male responses and 180 female responses.

If I further isolate male and female responses by (say) age groups, some sample sizes become as low as 35 responses (for example, for females over the age of 65).

My question: How good are these sample sizes for correlation analysis? (I am looking at the relationship between income levels and overseas travel.)

(I think this has something to do with margin of error, but how does that apply to inferential analysis, which is based on probability? I can understand its role in descriptive statistics, such as the results of a political poll.)

Best Answer

When it comes to sample size, bigger is better, but we often have to take what we get. With the smaller sample sizes, your estimates of the correlation will become extremely noisy, and comparisons between different estimates (which I expect is your primary goal in the subset analyses) will be particularly noisy.
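To make the point about comparisons concrete, here is a minimal sketch of one standard way to compare correlations from two independent subgroups (e.g. males vs. females): transform each with Fisher's $z = \operatorname{atanh}(r)$ and run a two-sided z-test on the difference. It assumes the subgroups are independent samples and the data are roughly bivariate normal; the function name and the example correlations (0.5 vs. 0.3) are hypothetical, not taken from the question.

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z-test of H0: rho1 == rho2 via Fisher's z-transformation.

    Assumes two independent samples and approximately bivariate normal data.
    Returns (z statistic, two-sided p-value).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Each Fisher z has SE 1/sqrt(n - 3); variances add for independent samples.
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z_stat = (z1 - z2) / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value


# Hypothetical correlations, same gap (0.5 vs 0.3) at two subgroup sizes.
z_small, p_small = compare_independent_correlations(0.5, 35, 0.3, 35)
z_big, p_big = compare_independent_correlations(0.5, 220, 0.3, 180)
print(f"n = 35 vs 35:   z = {z_small:.2f}, p = {p_small:.3f}")
print(f"n = 220 vs 180: z = {z_big:.2f}, p = {p_big:.3f}")
```

With these made-up values, the same 0.2 gap in $r$ is nowhere near significant with 35 per group (p of roughly 0.34), but is significant at the 5% level with 220 and 180 per group, which is exactly why the small subgroup comparisons are so noisy.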

This online tutorial on standard errors (as a pdf) contains formulas for the SE of the correlation coefficient, as well as for the SE of the Fisher transformation of the correlation, which is a better scale on which to measure the SE. You'll see that the SE scales approximately as $1/\sqrt{n}$.
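This also answers the margin-of-error part of the question: the inferential analogue of a poll's margin of error is a confidence interval for the population correlation $\rho$. The sketch below uses the widely used large-sample result that Fisher's $z = \operatorname{atanh}(r)$ has SE $1/\sqrt{n-3}$; this is an assumed textbook formula, not necessarily the exact one given in the linked tutorial. The function names are illustrative.

```python
import math
from statistics import NormalDist


def se_fisher_z(n):
    """Approximate SE of Fisher's z = atanh(r); note it does not depend on r."""
    return 1 / math.sqrt(n - 3)


def correlation_ci(r, n, level=0.95):
    """Confidence interval for rho: build it on the z scale, back-transform with tanh."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    z = math.atanh(r)
    half_width = z_crit * se_fisher_z(n)
    return math.tanh(z - half_width), math.tanh(z + half_width)


lo, hi = correlation_ci(0.5, 200)
print(f"r = 0.5, n = 200: 95% CI ({lo:.3f}, {hi:.3f})")
```

Because the interval is symmetric on the $z$ scale but not on the $r$ scale, it comes out lopsided around $r = 0.5$ (wider toward zero). That asymmetry is one reason the Fisher scale is the better place to measure the SE.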

For a correlation of about 0.5, the SE with a sample size of 200 will be about 0.06; with a sample size of 50 it will be about double that.
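The arithmetic above can be checked, and extended to the sample sizes in the question, with a short sketch. It assumes the common approximation $SE(r) \approx \sqrt{(1-r^2)/(n-2)}$ and, for the last column, the Fisher-z normal approximation for the smallest correlation a two-sided test at $\alpha = 0.05$ would detect with 80% power. Both formulas, the function names, and the 80%/5% settings are assumptions for illustration.

```python
import math
from statistics import NormalDist


def se_r(r, n):
    """Large-sample approximation to the SE of Pearson's r (assumed formula)."""
    return math.sqrt((1 - r**2) / (n - 2))


def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest |rho| detected with the given power by a two-sided test at alpha,
    using the Fisher-z normal approximation: tanh((z_{1-a/2} + z_power) / sqrt(n-3))."""
    nd = NormalDist()
    z_total = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.tanh(z_total / math.sqrt(n - 3))


# The figures quoted above for r = 0.5.
print(f"SE at n=200: {se_r(0.5, 200):.3f}")
print(f"SE at n=50:  {se_r(0.5, 50):.3f}")

# Sample sizes from the question.
for label, n in [("overall", 400), ("male", 220), ("female", 180), ("female 65+", 35)]:
    print(f"{label:>11}: n={n:3d}  SE(r=0.5)={se_r(0.5, n):.3f}  "
          f"min detectable r={min_detectable_r(n):.2f}")
```

Under these assumptions the SE is about 0.06 at $n = 200$ and 0.125 at $n = 50$, and the smallest reliably detectable correlation rises from roughly 0.14 for the full sample of 400 to roughly 0.46 for a subgroup of 35.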