[Math] Optimal sample size

statistics

What's the simplest formula to calculate the optimal sample size from a population size? I've been reading a little about it, and the formulas ask for things like a confidence level or confidence interval, which I have no idea about. I need to test some HTML elements in web pages to get a better idea of the general properties of HTML elements, and I have no idea about the results I'll get. So I just need the most neutral possible version of the sample size formula when all I know is the population size.

Best Answer

It depends on your population and the confidence that you need in your results. Typically, you should know something about what you are trying to achieve. If you know your confidence level, confidence interval, and population size, you can choose a sample size that will give you the confidence you need.

Your confidence level, commonly 90%, 95%, or 99%, tells you how sure you can be that the true value for the entire population lies within the range your sample gives you. The confidence interval is that range: the values around your sample result that you expect to contain the population value at your chosen confidence level.

Wikipedia provides an overview of sample size methodologies. That might get you started. But unless you know how sure you want to be of your results, you can't determine a sample size. Wikipedia also provides decent definitions of confidence level and confidence interval.

From a statistical standpoint, if you don't have clearly defined questions, you really can't analyze your data, at least using statistical methods. Perhaps you should review data and try to formulate questions, or take an analytical instead of statistical approach to solving the problem/answering your question.


Assuming a normal distribution, you can use the equation $n \geq \left(\dfrac{z_{\alpha/2} \sigma}{\delta}\right)^2$ where $z_{\alpha/2}$ is found in a standard normal table, $\sigma$ is your standard deviation (either a sample standard deviation or a population standard deviation, depending on what you know), and $\delta$ is your margin of error, that is, half the width of the confidence interval you are willing to accept. The value for $n$ is your sample size; be sure to round up if you get a fractional $n$.

Note that the z-value is based on your confidence level. The value for $\alpha$ is 1 minus your confidence level written as a decimal (for a confidence level of 95%, $\alpha$ would be 1 - 0.95, or 0.05, which gives $z_{\alpha/2} \approx 1.96$).
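As a rough illustration of the formula above, here is a minimal Python sketch. The function name `sample_size` and its parameter names are made up for this example. The choice of $\sigma = 0.5$ is an assumption that is not part of the answer: it is the largest possible standard deviation for a yes/no measurement (for instance, "does the page contain this element?"), so it is a cautious choice when you know nothing about the results in advance. The z-value comes from the standard library's `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist


def sample_size(confidence: float, sigma: float, delta: float) -> int:
    """Smallest n with n >= (z_{alpha/2} * sigma / delta)^2.

    confidence: confidence level as a decimal, e.g. 0.95
    sigma:      standard deviation (assumed known or estimated)
    delta:      margin of error, in the same units as sigma
    """
    alpha = 1.0 - confidence
    # Two-sided critical value z_{alpha/2} of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Round up, since a fractional sample is not possible.
    return math.ceil((z * sigma / delta) ** 2)


# Assumed inputs: 95% confidence, worst-case sigma of 0.5 for a yes/no
# property, and a margin of error of 5 percentage points.
print(sample_size(confidence=0.95, sigma=0.5, delta=0.05))  # prints 385
```

Note that this sketch, like the formula it follows, does not use the population size at all; it is only reasonable when the population is much larger than the resulting sample.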

You might also be interested in the answers to a few questions on the Statistical Analysis Stack Exchange:
