Solved – Statistical analysis options on small datasets

correlation, small-sample

First of all, please excuse my poor knowledge of statistics; I'm currently teaching myself and have no formal training in it.

To test correlation, I have created a data set that records how much 5 cities had physically grown in the years 1990, 1995, 2000, 2005, 2010 and 2015, along with the population estimates for each city in those years.
My aim is to see how strongly physical growth and population are correlated, using these 6 observations for each city.

But with $n=6$, I'm not sure if I can even do anything meaningful. Does the rule of $n>30$ always apply? If I can't do a meaningful correlation test, what can I do with data like this?

Again, sorry for my ignorance and lack of knowledge.

Best Answer

There are two potential issues with the size of your data: achieving statistical significance and finding a practically significant result.

In this particular case, statistical significance means showing that the correlation you see between size and population is larger than might be expected by chance alone. If these two variables are strongly correlated, an appropriate test might be able to reject the null hypothesis ($\rho = 0$). If the relationship is weaker, however, you may need a larger sample size to detect it. Regardless, I'd ignore the $n>30$ bit; this is a rough heuristic for when the sampling distribution of a sample mean can be treated as approximately normal, but it's neither a particularly good rule in general nor applicable to this particular situation. Instead, I'd consider performing a power calculation to determine how many samples you need, based on how big a correlation you need to be able to detect (presumably a correlation of 0.00001 is essentially uninteresting, but something like 0.25 might be).
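To give a feel for what such a power calculation looks like, here is a minimal sketch. It assumes a two-sided test of $\rho = 0$ and uses the Fisher $z$ approximation, $n \approx \left(\frac{z_{1-\alpha/2} + z_{\text{power}}}{\operatorname{atanh}(r)}\right)^2 + 3$, which is one common textbook shortcut rather than the only way to do it. The function name `required_n` and the defaults of $\alpha = 0.05$ and 80% power are my own choices for illustration, and the approximation also assumes independent observations, which repeated measurements of the same city may not satisfy.

```python
import math
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a correlation of size r.

    Assumption: two-sided test of rho = 0 using the Fisher z
    approximation, where atanh(r) has standard error 1 / sqrt(n - 3).
    This is an illustrative shortcut, not an exact power calculation.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    effect = math.atanh(abs(r))                  # Fisher z-transformed correlation

    # Solve (z_alpha + z_power) = effect * sqrt(n - 3) for n and round up.
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


if __name__ == "__main__":
    for r in (0.25, 0.5, 0.9):
        print(f"r = {r}: roughly n = {required_n(r)} observations")
```

Under these assumptions, a correlation of 0.25 calls for well over a hundred observations, while only a very strong correlation (around 0.9 or more) is in reach with a handful of points. That is the sense in which a strong relationship may still be detectable with a sample as small as yours.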

As for practical significance: your eventual goal is to use these data to convince someone of something. This might be difficult using data from five cities measured at six time points. If I were interested in demographics generally, I might be worried that there is something unrepresentative about the data from these five cities and that the correlation coefficient from your sample might not provide much information about other situations. Similar concerns would exist about having only six time points. On the other hand, if there is something special about these five towns--perhaps you're advising the mayor of a sixth nearby town--then this might be all the data you need, assuming it's enough to demonstrate the effect.
