I hope you can help me with a question about calculating the mean age from grouped census data. If the age categories used were [0–4], [5–9], [10–14], and [15–19] years, how would you calculate the midpoints? I initially assumed the midpoints would be 2, 7, and so on.

However, I read in a worked example that the midpoint should be 2.5 when the age range is 0 to 4. I assume this has something to do with the babies not actually being aged exactly zero years, but I am not sure why the midpoint would be 2.5.

Can anyone assist? Many thanks

## Best Answer

As @Bernd has pointed out, 2.5 really is the midpoint of the 0 to 4 year age group: a recorded age of 0 covers exact ages [0, 1), and a recorded age of 4 covers [4, 5), so the group spans [0, 5) in exact age and its midpoint is 2.5. However, using midpoints at either end of the population distribution introduces bias. For instance, the appropriate representative value for the 80–90 year group is approximately 83, not its midpoint 85, because most people in this group are nearer 80 than 90. If this nicety matters (and perhaps it does, if you are agonizing over a half-year difference), read on.
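To make the half-year concrete, here is a minimal sketch of the midpoint and grouped-mean arithmetic in Python; the head counts are made up purely for illustration:

```python
# Each reported band [a, b] in whole years spans [a, b + 1) in exact age,
# so its midpoint is (a + b + 1) / 2, i.e. a + 2.5 for 5-year bands.
groups = [(0, 4), (5, 9), (10, 14), (15, 19)]   # reported age bands
counts = [120, 110, 100, 90]                    # hypothetical head counts

midpoints = [(a + b + 1) / 2 for a, b in groups]

# Grouped mean: count-weighted average of the midpoints.
mean_age = sum(m * n for m, n in zip(midpoints, counts)) / sum(counts)
print(midpoints)             # [2.5, 7.5, 12.5, 17.5]
print(round(mean_age, 2))    # 9.4
```

Note that the midpoints are 2.5, 7.5, 12.5, 17.5 rather than 2, 7, 12, 17, which is exactly the half-year shift the worked example used.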

Demographers make their estimates using various methods of monotonic interpolation. A classic method is Sprague's formula. This is well described in their literature; for an overview see Hubert Vaughan, "Symmetry in Central Polynomial Interpolation", JIA 80, 1954. This method as published requires equally spaced age groups, but it can be adapted to variable spacings. @Rob Hyndman was the co-author of a nice paper on monotonic splines (Smith, Hyndman, & Wood, "Spline Interpolation for Demographic Variables: The Monotonicity Problem", J. Pop. Res. 21 #1, 2004). The paper mentions R code for the "Hyman filter", which is still available on Rob's Web site.

Once you have an interpolated age distribution you can compute moments (and any other properties) according to the standard definitions. For instance, the mean is estimated by numerically integrating age with respect to the distribution.
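As a sketch of that workflow, the following uses SciPy's monotone PCHIP interpolator as a stand-in for Sprague's formula or the Hyman-filtered spline (the counts are hypothetical): fit a monotone spline to the cumulative count, differentiate it to get a non-negative density, and integrate numerically for the mean.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

edges = np.array([0.0, 5.0, 10.0, 15.0, 20.0])  # group boundaries in exact age
counts = np.array([120.0, 110.0, 100.0, 90.0])  # hypothetical head counts

# Fit a monotone (PCHIP) spline to the cumulative count F(x); monotonicity
# guarantees the implied density f = F' is never negative.
cum = np.concatenate([[0.0], np.cumsum(counts)])
F = PchipInterpolator(edges, cum)
f = F.derivative()

# Mean age = (1/N) * integral of x * f(x) dx over [0, 20],
# approximated here by a simple Riemann sum on a fine grid.
ages = np.linspace(0.0, 20.0, 4001)
dx = ages[1] - ages[0]
mean_age = float(np.sum(ages * f(ages)) * dx / cum[-1])
print(round(mean_age, 2))
```

The result lands close to the plain midpoint estimate for interior groups, but the interpolated density lets the mass within each band sit off-center, which is precisely what corrects the bias at the ends of the distribution.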