Calculating the accuracy of an estimate when the true value is unknown

confidence interval, distributions, estimation, probability, standard error

This is the context for the problem I've run into:

A program starts counting down in seconds, choosing a random value
with a min of 300 and a max of 600. At the end of the countdown, event
A occurs, a new countdown is chosen, and the cycle repeats. I can
measure the number of times event A occurs, but not the time between
occurrences or the total runtime of the program.

My goal is to estimate the total runtime and measure the accuracy of this estimate. In the absence of other variables, I've assumed the average time between events to be 7.5 minutes. If I multiply this by the number of occurrences I believe I'll have a good estimate of the total runtime, but I'm unsure how I'd represent the estimate's accuracy.

My first thought was to find a margin of error for the average rate of occurrence, then multiply it by the number of occurrences to provide a reasonable range for the original estimate's accuracy. This may not be the best method, and after a few hours looking online I've only gotten more confused. I'm aware accuracy will depend on the number of occurrences, and I have a general understanding of concepts like normal distribution curves, standard deviation, and confidence intervals, but it’s been a few years since I’ve taken a statistics class and I am unsure how to apply these. Most of what I find online references population or something small like a coin flip.

Edit: Changed minutes to seconds and revised the context to be clearer.

Best Answer

If, after the occurrence of the event, the "clock" for the next one starts, and the times between events are independent, then you have a renewal process.

Let $t$ be the unknown time passed, $N(t)$ the number of occurrences prior to time $t$, which you observe, $\mu$ the expected time between events, and $\sigma^2$ the variance of the time between events. For your problem, working in minutes, we have $\mu = 7.5$ and $\sigma^2 = \frac{25}{12}$.
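As a quick check on these two values, assume the countdown is drawn from a continuous uniform distribution on $[300, 600]$ seconds, that is $[5, 10]$ minutes (the uniform shape is an assumption; the question only says "random"). Then

$$ \mu = \frac{5 + 10}{2} = 7.5, \qquad \sigma^2 = \frac{(10 - 5)^2}{12} = \frac{25}{12} \approx 2.08 \quad.$$

If the countdown only takes whole numbers of seconds, the variance is slightly larger, but the difference is negligible here.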

You can estimate the time passed by computing $N(t) \times 7.5$, but this value alone says nothing about its variability. It is better to also provide a confidence interval. In what follows, we will see one way to obtain an approximate interval, a general formula, and an example.

Obtaining an approximate confidence interval

If you look at the asymptotic part in Wikipedia (actually, you will need to consult the Grimmett reference, since Wikipedia is missing information), it states that

$$ f(t) = \frac{N(t) - t/\mu}{\sqrt{t\sigma^2/\mu^3}} \overset{D}{\longrightarrow} \mathcal{N}(0, 1), \quad \mbox{as}\quad t \longrightarrow \infty\quad.$$

From this, assuming enough time has passed, we can construct a $1-\alpha$ confidence interval for $t$, the time passed.

For simplification, write $N(t) = n$, $a = \frac{n\mu^{3/2}}{(\sigma^2)^{1/2}}$ and $b = \left(\frac{\mu}{\sigma^2}\right)^{1/2}$. Then we get

$$ f(t) = \frac{a}{\sqrt{t}} - b\sqrt{t} \quad.$$
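To see where this form comes from, one can split the fraction term by term, using $\sqrt{t\sigma^2/\mu^3} = \sigma\sqrt{t}/\mu^{3/2}$:

$$ \frac{n - t/\mu}{\sqrt{t\sigma^2/\mu^3}} = \frac{n\mu^{3/2}}{\sigma\sqrt{t}} - \frac{t\,\mu^{1/2}}{\sigma\sqrt{t}} = \frac{a}{\sqrt{t}} - b\sqrt{t} \quad.$$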

If $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution, then the theorem asserts that $$\mathbb{P}(-z_{1-\alpha/2} \leq f(t) \leq z_{1-\alpha/2}) \approx 1-\alpha \quad.$$

To get a CI for $t$, we observe that $f$ has an inverse function given by

$$g(y) = \left(\frac{-y+\sqrt{y^2 + 4ab}}{2b}\right)^2 \quad.$$
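The inverse can be recovered by treating the equation as a quadratic in $s = \sqrt{t}$. Writing $y$ for the value of $f(t)$ (the symbols $s$ and $y$ are introduced here only for this step):

$$ y = \frac{a}{s} - bs \;\Longrightarrow\; bs^2 + ys - a = 0 \;\Longrightarrow\; s = \frac{-y + \sqrt{y^2 + 4ab}}{2b} \quad,$$

where the positive root is taken because $s = \sqrt{t} > 0$ and $a, b > 0$ whenever $n \geq 1$. Squaring $s$ gives $t$.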

Applying $g$ to all three members and reversing the inequalities, since $g$ is decreasing, we have

$$\mathbb{P}(g(z_{1-\alpha/2}) \leq t \leq g(-z_{1-\alpha/2}) ) \approx 1-\alpha \quad.$$

Therefore, the CI is given by $[g(z_{1-\alpha/2}), g(-z_{1-\alpha/2})]$.

Substituting the values, the final formula for the approximate confidence interval of level $1-\alpha$ is

$$ \left[\left(\frac{-z_{1-\alpha/2}+\sqrt{z_{1-\alpha/2}^2 + 4ab}}{2b}\right)^2, \left(\frac{z_{1-\alpha/2}+\sqrt{z_{1-\alpha/2}^2 + 4ab }}{2b}\right)^2 \right] \quad. $$

I know the formula seems massive, but now you only need to plug in the values.
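As a minimal sketch of plugging in the values, the interval formula can be wrapped in a small Python function. The name `renewal_time_ci` and its parameter names are my own choices for illustration; the defaults assume times in minutes with $\mu = 7.5$ and $\sigma^2 = 25/12$.

```python
import math
from statistics import NormalDist


def renewal_time_ci(n, mu=7.5, var=25 / 12, level=0.95):
    """Approximate confidence interval for the elapsed time t.

    n     -- observed number of occurrences, N(t)
    mu    -- mean time between events (assumed: minutes)
    var   -- variance of the time between events
    level -- confidence level, 1 - alpha
    """
    # z_{1 - alpha/2}: quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    a = n * mu ** 1.5 / math.sqrt(var)
    b = math.sqrt(mu / var)
    root = math.sqrt(z ** 2 + 4 * a * b)
    lower = ((-z + root) / (2 * b)) ** 2
    upper = ((z + root) / (2 * b)) ** 2
    return lower, upper


# Hypothetical usage: 50 observed occurrences
low, high = renewal_time_ci(50)
print(f"estimate = {50 * 7.5:.1f} min, 95% CI = [{low:.1f}, {high:.1f}]")
```

Only `n` changes between runs, so the same call works for any observed count.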

Simple examples

Assume you observed $n = 26$ occurrences. How much time has passed? Well, your point estimate gives you $\hat{t} = n\times 7.5 = 195$ minutes. If you plug in the values for $a$, $b$, $n$ and consider a $95$% confidence interval, you get the confidence interval $[181.1, 210]$.

If $n = 100$, then your estimate would be $\hat{t} = 750$, and the confidence interval would be $[722.2, 778.8]$.
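One way to sanity-check intervals like these is to simulate the process with a known runtime and count how often the interval contains it. The sketch below is not part of the derivation: it assumes countdowns are uniform on $[5, 10]$ minutes, and the function names (`interval`, `count_events`, `coverage`) are made up for illustration. Because $N(t)$ is an integer and the result is asymptotic, the observed coverage should land near the nominal level rather than exactly on it.

```python
import math
import random
from statistics import NormalDist

MU, VAR = 7.5, 25 / 12  # minutes; assumes uniform(5, 10) gaps between events


def interval(n, level=0.95):
    """Approximate CI for elapsed time given n observed events."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    a = n * MU ** 1.5 / math.sqrt(VAR)
    b = math.sqrt(MU / VAR)
    root = math.sqrt(z * z + 4 * a * b)
    return ((-z + root) / (2 * b)) ** 2, ((z + root) / (2 * b)) ** 2


def count_events(t, rng):
    """Simulate one run of length t and return the number of events seen."""
    clock, n = 0.0, 0
    while True:
        clock += rng.uniform(5.0, 10.0)  # next countdown, in minutes
        if clock > t:
            return n
        n += 1


def coverage(t, trials=2000, seed=0):
    """Fraction of simulated runs whose interval contains the true t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        low, high = interval(count_events(t, rng))
        hits += low <= t <= high
    return hits / trials


print(f"empirical coverage at t = 750: {coverage(750.0):.3f}")
```

Trying a few different values of `t` gives a feel for how the discreteness of the count affects the coverage.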

Finally, a graphic displaying how the confidence interval varies with the number of occurrences. The black line shows the estimated time, and the red dashed lines are the confidence interval endpoints.

[Figure: estimated time (black line) and confidence interval endpoints (red dashed lines) against the number of occurrences]
