Variant on Bayes’s Rule and weather forecasting

bayesian, probability

A common application of Bayes's Rule is with weather forecasting, but I have a question mapping this to a real-world situation.

As in a previous StackExchange question, the simple Bayes's problem is usually framed as below:

Marie is getting married tomorrow, at an outdoor ceremony in the
desert. In recent years, it has rained only 5 days each year.
Unfortunately, the weatherman has predicted rain for tomorrow. When it
actually rains, the weatherman correctly forecasts rain 90% of the
time. When it doesn't rain, he incorrectly forecasts rain 10% of the
time. What is the probability that it will rain on the day of Marie's
wedding?

To use common notation with the selected answer in that question, let $R$ denote the event that it actually rains, and let $P$ denote the event that the weatherman predicts rain.

We can then apply Bayes's Rule to calculate $P(R \mid P)$.

However, in reality, weather forecasters themselves seem to issue probabilities that it will rain as opposed to a binary response. For example, the weather forecaster says there is a 70% chance it will rain tomorrow (while being correct only 90% of the time, as before).

How does one rationally update their prior $P(R)$ in this case?

Best Answer

It's best to just start with the binary case. Assuming I did the math correctly, we have

$$P(R \mid P) = \cfrac{P(P \mid R) \cdot P(R)}{P(P)}$$ $$ = \cfrac{0.9 \cdot (5/365)}{0.9 \cdot (5/365)+0.1 \cdot (360/365)}$$ $$\Rightarrow P(R \mid P) = 1/9$$
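As a sanity check, the arithmetic above is easy to verify numerically (a minimal sketch; the variable names are mine):

```python
# Numbers from the problem statement.
p_rain = 5 / 365            # prior P(R): it rains 5 days a year
p_pred_given_rain = 0.9     # P(P | R): forecasts rain when it actually rains
p_pred_given_dry = 0.1      # P(P | not R): false alarms on dry days

# Law of total probability for the denominator, then Bayes's Rule.
p_pred = p_pred_given_rain * p_rain + p_pred_given_dry * (1 - p_rain)
posterior = p_pred_given_rain * p_rain / p_pred
print(posterior)  # 0.111... = 1/9
```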

So it would seem even in the binary case your weatherman is a poor predictor. You say,

For example, the weather forecaster says there is a 70% chance it will rain tomorrow (while being correct only 90% of the time, as before)

Say a forecaster is well-calibrated if, when they predict probability $p$ for an event, the event actually occurs with probability $p$. In that case our model is $P(R \mid p) = p$, and we'd be done. However, in the version you want to consider, the forecaster is not well-calibrated. Now, to be honest, it's not clear to me that this example was constructed in a consistent way, so I'm just going to push forward with one possible interpretation.

In that direction let's define a quantity $\rho_n^{\epsilon}(x)$ that will measure how often a positive outcome $y$ occurs when we predict a probability $\epsilon$-close to $x$. We write something like,

$$\rho_{n}^{\epsilon}(x) = \cfrac{\sum_{t = 1}^{n} y_t \cdot 1 \{ q_t \in (x - \epsilon, x + \epsilon) \}}{\sum_{t = 1}^{n} 1 \{ q_t \in (x - \epsilon, x + \epsilon) \}} $$

So then our forecaster is $\epsilon$-calibrated if, for all $x \in [0,1]$ for which we have data on what happens when we predict $x$, we also have

$$\limsup_{n \to \infty} \left| \rho_n^{\epsilon}(x) - x \right| \le \epsilon$$

If our forecaster is $\epsilon$-calibrated for all $\epsilon > 0$ then it is also well-calibrated. This notion is useful because if we can construct forecasters that are $\epsilon$-calibrated then we can construct a well-calibrated forecaster as well.
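To make the definition concrete, here is a small simulation sketch that estimates $\rho_n^{\epsilon}(x)$ empirically; the forecaster simulated here is well-calibrated by construction (rain occurs with exactly the announced probability), so the estimate should land within $\epsilon$ of $x$, as the $\limsup$ condition requires:

```python
import random

def empirical_calibration(forecasts, outcomes, x, eps):
    """Estimate rho_n^eps(x): the fraction of positive outcomes among
    rounds where the announced probability q_t fell within eps of x."""
    hits = [y for q, y in zip(forecasts, outcomes) if x - eps < q < x + eps]
    return sum(hits) / len(hits) if hits else None

random.seed(0)
n = 100_000
# Well-calibrated by construction: announce q, rain with probability q.
forecasts = [random.random() for _ in range(n)]
outcomes = [1 if random.random() < q else 0 for q in forecasts]

rho = empirical_calibration(forecasts, outcomes, x=0.7, eps=0.05)
print(round(rho, 3))  # close to 0.7
```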

Now, the weatherman in the example we're considering is not well-calibrated. Moreover, what you're after is a bit different from the notion I introduced above; we should really be looking at metrics like the Brier score. If you want to force the issue, we need to know how likely he is to predict $x$ on a day with rain and on a day without rain. Say the forecaster is $\epsilon$-calibrated for some non-trivial $\epsilon$. Then we have,

$$x - \epsilon \le P(R \mid x) \le x + \epsilon$$

$$\Rightarrow 0.6 \le P(R \mid 0.7) \le 0.8$$

if we take $\epsilon = 1 - 0.9 = 0.1$. My point, I guess, is that lower accuracy translates into variance in the prediction: it becomes harder to interpret $x$ as a precise estimate of the probability of rain. It doesn't mean that the probability goes up or down; for that, you need to know much more. I'm kind of curious about the problem, so if you provide a sharper problem statement I'll give it another look. Best!
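P.S. Since the Brier score came up above, here is a minimal sketch of what it measures (the simulation setup is hypothetical): it is the mean squared error between announced probabilities and 0/1 outcomes, so an honest forecaster beats one who always hedges at 50%.

```python
import random

def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.
    Lower is better; 0 means a perfect, perfectly confident forecaster."""
    return sum((q - y) ** 2 for q, y in zip(forecasts, outcomes)) / len(forecasts)

random.seed(1)
n = 50_000
# Hypothetical true rain probability for each day, and the realized outcome.
truth = [random.random() for _ in range(n)]
outcomes = [1 if random.random() < p else 0 for p in truth]

honest = brier_score(truth, outcomes)      # announces the true probabilities
hedged = brier_score([0.5] * n, outcomes)  # always says "50% chance of rain"
print(honest < hedged)  # True: the honest forecaster scores lower (better)
```

Note that the hedging forecaster scores exactly $0.25$ regardless of the outcomes, since $(0.5 - y)^2 = 0.25$ for $y \in \{0, 1\}$.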