Let us say I have a number of events per week over some time period. Each week, a proportion of these events is important (important_events_per_week/number_of_events_per_week). The number_of_events_per_week also varies over time. How does one calculate the average proportion of important events per week over time, please? I think it is too naive to simply calculate the proportion for each week and then take the arithmetic mean, so I thought I would ask proper mathematicians. Should one use a weighted average to account for the varying number of events? Thanks!
Average proportions over time
average, discrete time, statistics, time series
Related Solutions
Your original wrong answer might be illustrated by
- the question of the average speed of a person who travels at 20 seconds per kilometre for 50 seconds and then at 8 seconds per kilometre for another 100 seconds,
compared with
- the question of the average speed of a person who travels at 20 seconds per kilometre for 50 kilometres and then at 8 seconds per kilometre for another 100 kilometres.
Average speed is total distance divided by total time so:
For the first person the average speed is $\dfrac{\frac{50}{20} + \frac{100}{8}}{50 + 100} = \frac{1}{10}$ kilometres per second.
For the second person the average speed is $\dfrac{50 + 100}{50 \times 20 + 100 \times 8} = \frac{1}{12}$ kilometres per second.
Multiply both answers by $2400$ seconds per week if it helps.
In your original question you were asked for the second method (the weights given are items not times) but initially answered using the first.
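The two calculations above can be checked with a short sketch. The function name `average_speed` and the `(distance, time)` tuple layout are illustrative choices, not part of the original answer; the numbers are the ones from the example.

```python
from fractions import Fraction

def average_speed(segments):
    """segments: list of (distance_km, time_s) pairs.
    Average speed is total distance divided by total time."""
    total_distance = sum(d for d, _ in segments)
    total_time = sum(t for _, t in segments)
    return total_distance / total_time

# Paces in seconds per kilometre, as in the example.
pace_a, pace_b = Fraction(20), Fraction(8)

# First person: legs are given by duration (50 s, then 100 s),
# so distance = time / pace.
by_time = [(Fraction(50) / pace_a, Fraction(50)),
           (Fraction(100) / pace_b, Fraction(100))]

# Second person: legs are given by distance (50 km, then 100 km),
# so time = distance * pace.
by_distance = [(Fraction(50), 50 * pace_a),
               (Fraction(100), 100 * pace_b)]

speed_first = average_speed(by_time)        # Fraction(1, 10) km/s
speed_second = average_speed(by_distance)   # Fraction(1, 12) km/s
print(speed_first, speed_second)
```

The same per-leg paces give different averages because the legs are weighted by time in one case and by distance in the other.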
It depends on what the values represent.
$\underline{\text{Example 1}}$
If the value is "the number of people in my neighborhood", then the best approach is to integrate the above plot and divide by the total time.
We know that people arrive and leave somewhat unpredictably, sometimes one person at a time and sometimes a family of several. If this neighborhood isn't a college town, then there generally won't be a seasonal pattern.
For example, say we want to estimate how many people there were on July 25. We know there were 75 on June 20, and 55 on August 29, which is a decrease of 20 people in 70 days. The best estimate we can make is to assume that one person left every 3.5 days, so on July 25, there would have been 65 people. This is purely a guess, but it is the best estimate we have available. Knowing how many people were present in April or October won't improve this estimate.
Thus, the linear plot represents our best guess for the number of people present each day. So the average is the area under the curve divided by the time.
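As a minimal sketch of this "area under the curve divided by the time" idea, the snippet below uses only the two observations quoted above (75 on June 20 and 55 on August 29). The year is an arbitrary assumption needed to build date objects, and the helper names `interpolate` and `time_weighted_average` are hypothetical.

```python
from datetime import date

def interpolate(t, t0, y0, t1, y1):
    """Linear interpolation between the points (t0, y0) and (t1, y1)."""
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

def time_weighted_average(points):
    """points: list of (day_number, value) pairs sorted by day.
    Returns the trapezoid-rule area under the piecewise-linear plot
    divided by the total elapsed time."""
    area = 0.0
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += (y0 + y1) / 2 * (t1 - t0)
    return area / (points[-1][0] - points[0][0])

# The year 2001 is arbitrary (an assumption); only the day gaps matter.
observations = [(date(2001, 6, 20), 75), (date(2001, 8, 29), 55)]
points = [(d.toordinal(), v) for d, v in observations]

(t0, y0), (t1, y1) = points
july_25 = interpolate(date(2001, 7, 25).toordinal(), t0, y0, t1, y1)
average = time_weighted_average(points)
print(t1 - t0, july_25, average)  # 70 days apart, 65.0, 65.0
```

With more than two observations, widely spaced points contribute larger trapezoids, so they carry more weight in the result.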
$\underline{\text{Example 2}}$
If the value is "the number of people who died from X in NY" (where X is an instantaneous, non-contagious, non-seasonal effect like "stroke"), then the numbers are completely independent. Knowing how many died on October 17 and on October 19 tells us absolutely nothing about how many died on October 18. In this case, the best estimate we can make for the average is to sum the values for the days we have data on, then divide by the number of data points.
$\underline{\text{Example 3}}$
Other effects like temperature and amount of rainfall can be seasonal, so you would expect perhaps a sinusoidal variation about the average. In that case, fitting to a curve would seem the best approach.
$\underline{\text{Caveat}}$
These estimates suffer from possible sample bias. In the first example, extra significance is given to values that are far apart in time. Nearly half the plot is connected to the single June datapoint. Moving that one point up by 4 would raise the average by 1, which makes that point count as 3 times more significant than the average for all points.
In the third example, a single hurricane in the second half of October could significantly affect a full $1/3$ of the data sample ($4$ out of $12$ data points). Thus, a single weather phenomenon could skew the results.
So, to reiterate: the best approach to calculating an average depends highly on what the values represent.
Best Answer
If you want the average proportion of the number of interesting events to the number of total events over some period of multiple weeks, one simple method is to add up the total number of events during that period (which you can do because you have the total number of events in each week of the period), then add up the total number of interesting events during that period (which you can do because you have the total number of interesting events in each week of the period). Finally, divide the total number of interesting events by the total number of events.
This is how many such averages are computed in real life. For example, to find the average speed for a trip of $20$ miles, you don't take the speed on each mile and compute some kind of fancy weighted average of those $20$ speeds; you just divide the total distance (which is $20$ miles) by the total trip time.
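The pooled calculation described above can be sketched as follows. The weekly counts are hypothetical, invented purely for illustration, and the function names are not from the original answer. The sketch also shows that weighting each week's proportion by that week's event count gives the same result as pooling, while the unweighted mean of weekly proportions generally does not.

```python
from fractions import Fraction

# Hypothetical data (an assumption): (important_events, total_events) per week.
weeks = [(5, 10), (1, 100), (30, 40)]

def pooled_proportion(weeks):
    """Total important events divided by total events over the period."""
    total_important = sum(i for i, _ in weeks)
    total_events = sum(n for _, n in weeks)
    return Fraction(total_important, total_events)

def mean_of_weekly_proportions(weeks):
    """Unweighted arithmetic mean of the per-week proportions."""
    return sum(Fraction(i, n) for i, n in weeks) / len(weeks)

def weighted_mean_of_proportions(weeks):
    """Per-week proportions weighted by each week's event count."""
    total_events = sum(n for _, n in weeks)
    return sum(Fraction(i, n) * n for i, n in weeks) / total_events

pooled = pooled_proportion(weeks)               # 36/150 = 6/25
naive = mean_of_weekly_proportions(weeks)       # (1/2 + 1/100 + 3/4) / 3 = 21/50
weighted = weighted_mean_of_proportions(weeks)  # equals the pooled value
print(pooled, naive, weighted)
```

In this made-up data the quiet week with 100 events and only one important event pulls the pooled proportion well below the unweighted mean, which treats every week as equally informative.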