Apologies if the question is phrased incorrectly, i've struggled to understand the right terminology which would have allowed me to research this myself.
I have a series of probabilities in a table, where the probability of an event happening three times is less than the probability of it happening twice.
I have historical data of predictions showing the predicted probability of an event happening two and three times. I then have more predictions which only show the probability of it happening twice, but are missing the predictions for the event happening three times.
The complete data that had predictions for the event happening both two and three times appears to increase in a fairly logical pattern and I think I should be able to estimate what the probability of the event happening three times is where that data is missing.
two_events | three_events | diff | percent_increase |
---|---|---|---|
20.78 | 5.42 | 15.36 | 73.917228 |
21.14 | 5.58 | 15.56 | 73.604541 |
23.78 | 6.76 | 17.02 | 71.572750 |
23.96 | 6.89 | 17.07 | 71.243740 |
24.98 | 7.32 | 17.66 | 70.696557 |
24.99 | 7.32 | 17.67 | 70.708283 |
25.22 | 7.43 | 17.79 | 70.539255 |
26.65 | 8.14 | 18.51 | 69.455910 |
28.02 | 8.85 | 19.17 | 68.415418 |
29.57 | 9.67 | 19.90 | 67.297937 |
30.94 | 10.42 | 20.52 | 66.321913 |
31.05 | 10.49 | 20.56 | 66.215781 |
31.29 | 10.62 | 20.67 | 66.059444 |
32.00 | 11.03 | 20.97 | 65.531250 |
32.71 | 11.44 | 21.27 | 65.025986 |
34.21 | 12.33 | 21.88 | 63.957907 |
34.31 | 12.39 | 21.92 | 63.888079 |
35.56 | 13.17 | 22.39 | 62.964004 |
36.13 | 13.52 | 22.61 | 62.579574 |
36.46 | 13.74 | 22.72 | 62.314866 |
36.58 | 13.81 | 22.77 | 62.247130 |
38.86 | 15.31 | 23.55 | 60.602162 |
39.97 | 16.06 | 23.91 | 59.819865 |
41.62 | 17.22 | 24.40 | 58.625661 |
41.83 | 17.37 | 24.46 | 58.474779 |
42.95 | 18.22 | 24.73 | 57.578580 |
45.15 | 19.35 | 25.80 | 57.142857 |
46.33 | 20.73 | 25.60 | 55.255774 |
48.45 | 22.42 | 26.03 | 53.725490 |
48.52 | 22.49 | 26.03 | 53.647980 |
49.80 | 23.54 | 26.26 | 52.730924 |
50.78 | 24.37 | 26.41 | 52.008665 |
52.55 | 25.90 | 26.65 | 50.713606 |
53.11 | 26.40 | 26.71 | 50.291847 |
61.85 | 34.84 | 27.01 | 43.670170 |
62.34 | 35.36 | 26.98 | 43.278794 |
64.98 | 38.22 | 26.76 | 41.181902 |
69.28 | 43.17 | 26.11 | 37.687644 |
72.29 | 46.91 | 25.38 | 35.108590 |
75.13 | 50.63 | 24.50 | 32.610142 |
I think I should be able to use this to predict that if the probability of two events happening is 21%, then the probability of three events happening is going to be roughly 5.50% for example.
two_events | three_events | diff | percent_increase |
---|---|---|---|
20.78 | 5.42 | 15.36 | 73.917228 |
21 (example) | ??? (presumably 5.50 or 5.51) | ??? | ??? |
21.14 | 5.58 | 15.56 | 73.604541 |
To be clear the first column is not sequential in anyway, I am only looking at predicting what the second column would be based on the first.
It also does not need to be a perfect formula, only roughly correct
But I do not know how to come up with a formula to calculate this or the right terminology to use to search for it.
Thanks to a great link in the answers i've been given the following formula:
$$y=a(x+b)^d+c$$
With these variables:
a = 0.00000154534
b = 75.0975
c = 3.47686
d = 6.59594
But I am not sure how to write that out in something like Excel or code such as Python, Javascript or SQL.
If I use
((0.00000154534(21+75.0975))^3.47686)+6.59594
to work out the value of 21, which should be 5.50, they all result in an error and seem to require something between a(x+b)
.
I assume there is a function for whatever a(x+b)
is doing that you would have to use in Excel or similar, but i'm afraid I have no idea what it is called and have found it impossible to Google.
Confusingly typing ((0.00000154534(21+75.0975))^3.47686)+6.59594
into Google comes out with 6.59594
so i'm clearly making a basic error somewhere.
Could you please provide an example either using an Excel formula or code?
Best Answer
Welcome to MSE!
What you're describing sounds like you're looking to interpolate or extrapolate from this data to other possible points.
You have two possibilities. In both cases, though, the difference $P_2-P_3$ isn't all that useful. Nor is the increase from one data point to the next. What you'd really like is to find a function such that $P_3=f(P_2)$.
Since this data set looks relatively smooth, you could just do two-point interpolation. In other words, if you have a $P_2$ within the bounds of the dataset, take the points above and below it, and pretend the line they define is the function (for this part of the data). You don't even really have to "graph" it, just find the formula of the line and plug in your $P_2$ value.
Sadly, that only works for interpolation. If you need something outside the bounds of the dataset, you'll have to do more.
Better than two-point interpolation is to use a curve-fitting program. There are several places online to do a curve-fit; personally I like Desmos because it plays well, though sometimes it doesn't like copy/paste. You also need some guess as to the form of the equation for Desmos--other systems likely do, Excel being an example.
This looks approximately quadratic, at least in the middle of its range. Possibly a different exponent would be useful. Letting the $x$-axis be $P_2$ and the $y$-axis be $P_3$, I'd try fitting to the forms: $$y=ax^2+c$$ or more complex: $$y=a(x-b)^d+c$$ to account for a possible offset on the $x$-axis.
Good luck!
(I say approximately quadratic because, for instance, $16=40\% \cdot 40$, and $24 \approx 50\%\cdot 50$.)