[Math] Calculating average value of points over a period of time


I have several values plotted over a period of time on a line graph. The length of time between values is not uniform (e.g. one value might come 3 days after the previous one, which in turn might come 2 weeks after the one before it).

Visualised as a line graph, it would look like this:

[Figure: values plotted over time]

How would you calculate the average value over the entire period of time, taking into account both the increases and decreases between points and the varying lengths of time between them? Is this possible? (I may be being blind to an obvious solution…)

Best Answer

It depends on what the values represent.

$\underline{\text{Example 1}}$

If the value is "the number of people in my neighborhood", then the best approach is to integrate the above plot and divide by the total time.

We know that people arrive and leave somewhat unpredictably, and that sometimes it's one person while other times it's a family of several. If this neighborhood isn't a college town, there generally won't be a seasonal pattern.

For example, say we want to estimate how many people there were on July 25. We know there were 75 on June 20 and 55 on August 29, a decrease of 20 people in 70 days. The best estimate we can make is to assume that one person left every 3.5 days, so on July 25 (35 days after June 20) there would have been 65 people. This is purely a guess, but it is the best estimate we have available. Knowing how many people were present in April or October won't improve this estimate.
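Written out (using $t_0, v_0$ and $t_1, v_1$ as generic labels for two consecutive measurements), this is ordinary linear interpolation:

$$v(t) \approx v_0 + \frac{v_1 - v_0}{t_1 - t_0}\,(t - t_0), \qquad v(\text{July 25}) \approx 75 + \frac{55 - 75}{70}\cdot 35 = 65.$$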

Thus, the linear plot represents our best guess for the number of people present each day. So the average is the area under the curve divided by the time.
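More generally, if the measurements are $v_0, v_1, \dots, v_n$ taken at times $t_0 < t_1 < \dots < t_n$ (notation chosen here just for the sketch), integrating the piecewise-linear plot segment by segment via the trapezoidal rule gives the time-weighted average

$$\bar{v} \;=\; \frac{1}{t_n - t_0}\int_{t_0}^{t_n} v(t)\,dt \;=\; \frac{1}{t_n - t_0}\sum_{i=0}^{n-1} \frac{v_i + v_{i+1}}{2}\,(t_{i+1} - t_i),$$

so each value is weighted by how long it is "in effect" rather than by how often it happens to be sampled.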

$\underline{\text{Example 2}}$

If the value is "the number of people who died from X in NY" (where X is an instantaneous, non-contagious, non-seasonal effect like "stroke"), then the numbers are completely independent. Knowing how many died on October 17 and on October 19 tells us absolutely nothing about how many died on October 18. In this case, the best estimate we can make for the average is to sum the values for the days we have data on, then divide by the number of data points.

$\underline{\text{Example 3}}$

Other effects like temperature and amount of rainfall can be seasonal, so you would expect perhaps a sinusoidal variation about the average. In that case, fitting to a curve would seem the best approach.
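A minimal sketch of such a model, assuming a single seasonal cycle of period $T$ (the amplitude $A$, phase $\phi$, and mean level $\bar{v}$ would be fitted to the data, e.g. by least squares), is

$$v(t) \;\approx\; \bar{v} + A\sin\!\left(\frac{2\pi t}{T} + \phi\right),$$

where the fitted constant $\bar{v}$ serves as the average and the sine term absorbs the seasonal swing.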

$\underline{\text{Caveat}}$

These estimates suffer from possible sampling bias. In the first example, extra weight is given to values that are far apart in time: nearly half of the plot is connected to the single June data point. Moving that one point up by 4 would raise the average by 1, which means that single point carries a quarter of the total weight, roughly three times the influence of an average point.

In the third example, a single hurricane in the second half of October could significantly affect a full $1/3$ of the data sample ($4$ out of $12$ data points). Thus, a single weather phenomenon could skew the results.

So, to reiterate: the best approach to calculating an average depends highly on what the values represent.
