[Math] the best way to interpolate over the 25th and 75th percentile of SAT scores

Tags: interpolation, normal distribution, probability, probability distributions, statistics

The problem:

I know the 25th and 75th percentiles of SAT scores for students admitted to a given university, and I want to interpolate over those two points in order to estimate all the percentiles (i.e. 1st-100th) of scores for students admitted to the university.

What I know about SAT score distributions:

  1. SAT scores must be in the interval [600, 2400] and are approximately normally distributed on a nationwide basis:

     [Figure: histogram of all 1,547,990 SAT scores taken in 2010]

  2. Some extremely competitive universities, such as MIT and Harvard, may have the highest possible SAT score (i.e. a 2400) at their 75th percentile, so I'm guessing their distribution might be truncated on the right side (not sure if this would still be a normal distribution?).

  3. I have a histogram of all 1,547,990 SAT scores taken in 2010 including the mean and standard deviation: http://professionals.collegeboard.com/profdownload/sat-percentile-ranks-composite-cr-m-w-2010.pdf.

Best Answer

Your post is a bit confusing--to me, at least. The original question alone is interesting. Along with just the assumption that scores are normal, it contains enough information to give a good answer. (It is not clear whether items 1-3 are part of the question, or your own research towards an answer. They are also worthwhile information for many purposes, but I will ignore them here--except for normality.)

(a) The mean of a normal distribution is halfway between the 25th and 75th percentiles (also called the lower and upper quartiles). Average these two values to approximate the population mean.

(b) In a normal distribution, the difference between these two percentiles is about 1.35 times its standard deviation. So take the difference between these two values and divide by 1.35 to approximate the population standard deviation.
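For reference, both rules follow from the symmetry of the normal distribution. The symbols here are introduced for illustration and are not in the original post: $\mu$ and $\sigma$ are the mean and SD, $Q_{25}$ and $Q_{75}$ are the two known percentiles, and $z_{0.75} \approx 0.6745$ is the 75th percentile of the standard normal distribution.

$$
\begin{aligned}
Q_{25} &= \mu - z_{0.75}\,\sigma, \qquad Q_{75} = \mu + z_{0.75}\,\sigma \\
\frac{Q_{25} + Q_{75}}{2} &= \mu \\
Q_{75} - Q_{25} &= 2\,z_{0.75}\,\sigma \approx 1.349\,\sigma
\quad\Longrightarrow\quad
\sigma \approx \frac{Q_{75} - Q_{25}}{1.349}
\end{aligned}
$$

The 1.35 in (b) is this constant rounded to two decimal places.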

You can verify both (a) and (b) by looking at printed standard normal tables. Also, you might be able to use selected data from your items 1-3 to see that both (a) and (b) work well in practice.
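If you would rather check (a) and (b) numerically than with a printed table, here is a minimal sketch using `statistics.NormalDist` from Python's standard library (Python 3.8 or later). The variable names are my own.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
std = NormalDist(mu=0.0, sigma=1.0)

q25 = std.inv_cdf(0.25)  # lower quartile, in SD units
q75 = std.inv_cdf(0.75)  # upper quartile, in SD units

midpoint = (q25 + q75) / 2  # (a): should coincide with the mean, 0
iqr_in_sds = q75 - q25      # (b): should be about 1.35 SDs

print(f"midpoint of quartiles: {midpoint:.4f}")
print(f"interquartile range in SD units: {iqr_in_sds:.4f}")
```

Because any normal distribution is a shifted and rescaled standard normal, the same two relationships hold for any mean and SD.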

The parameters of the normal family are the mean and standard deviation (SD). Once they are known, the distribution is completely specified. For example, knowing the mean and SD, you could find the proportion of students in the population that score above 2000.
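Putting (a) and (b) together, a rough sketch of the whole estimate might look like the following. Several things here are assumptions of mine rather than part of the method above: the function names are hypothetical, the quartiles 1800 and 2100 are made-up illustration values (not real admissions data), and the results are clipped to the valid range [600, 2400]. An untruncated normal has no finite 100th percentile, so the sketch stops at the 99th.

```python
from statistics import NormalDist

SAT_MIN, SAT_MAX = 600, 2400  # valid score range stated in the question


def fit_normal_from_quartiles(q25, q75):
    """Estimate a normal distribution from its 25th and 75th percentiles."""
    z75 = NormalDist().inv_cdf(0.75)  # about 0.6745
    mean = (q25 + q75) / 2            # step (a): midpoint of the quartiles
    sd = (q75 - q25) / (2 * z75)      # step (b): IQR divided by about 1.349
    return NormalDist(mu=mean, sigma=sd)


def estimated_percentiles(dist):
    """Scores at the 1st..99th percentiles, clipped to the valid SAT range."""
    return {
        p: min(SAT_MAX, max(SAT_MIN, dist.inv_cdf(p / 100)))
        for p in range(1, 100)
    }


# Hypothetical quartiles, for illustration only.
dist = fit_normal_from_quartiles(q25=1800, q75=2100)
percentiles = estimated_percentiles(dist)

# Proportion of the fitted population scoring above 2000.
share_above_2000 = 1 - dist.cdf(2000)

print(f"mean = {dist.mean:.0f}, SD = {dist.stdev:.0f}")
print(f"50th percentile = {percentiles[50]:.0f}")
print(f"share above 2000 = {share_above_2000:.3f}")
```

Clipping is only a crude stand-in for a properly truncated distribution. If a school's 75th percentile is already at or near 2400, the plain normal fit is likely to be poor in the upper tail.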

When administered to a large and diverse population, scores of most standardized tests tend to be roughly normal because of the Central Limit Theorem. Much of the interpretation of college-entrance exam scores depends on the normal distribution and on published information about means and various percentiles. Publishers of these exams can fine-tune questions and scoring so that the distribution of scores for any one test is even more closely normal than would necessarily be so on theoretical grounds.