[Math] Using probability in real situations – do uncertain processes always have a measurable probability distribution

probability theory

I have a somewhat deep question about probability.

I'm a newbie in probability. I've gone through a course which taught me a lot of different properties of probability, and I understood them on a theoretical level. However, when I want to apply what I've learned to real situations, a lot of doubts arise. What I don't quite understand is: for which processes does it make sense to talk about probability in real life? When can a natural phenomenon be modelled using a probability distribution, how can such a distribution be measured, and when does it not make sense? Is a phenomenon that has multiple outcomes always random and modellable through a probability distribution, or not? What does "random" even mean, if we live in a deterministic world? I feel like I don't know anything about all that.

I'll try to explain my doubts better with some examples.

The classic example given when talking about probability is a coin toss. A coin toss, given a fair coin, is generally thought to be a process in which there is a 50% chance of getting heads and a 50% chance of getting tails. But what actually happens in this process if we look closely? Why do we consider it "random"? Well, when flipping a coin the outcome depends on the interaction of a lot of factors: the starting position of the coin, the speed of the rotations given, the position (height) at which the coin is caught, maybe the wind speed and direction if playing outdoors, and so on. These factors are left uncontrolled and vary at each toss, so we get different results every time. But if we had a way to predict these factors before a toss, and we perfectly knew all the rules of interaction between them, we would be able to predict the outcome; it would be deterministic. Left uncontrolled instead, each factor has its own probability distribution over the "values" it can take at each toss, and on average this results in the 50/50 outcome distribution we're used to seeing.
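To make this concrete to myself, here is a toy simulation of what I mean; the "physics" and all the numbers are completely made up, just to illustrate a deterministic outcome driven by uncontrolled initial conditions:

```python
# A toy model of a "deterministic" coin toss: the outcome is a fixed
# function of the initial conditions, but because those conditions vary
# uncontrollably from toss to toss, the long-run frequency comes out ~50/50.
import math
import random

def toss(angular_velocity, flight_time, start_angle):
    """Deterministic rule: the total rotation decides which face lands up."""
    total_angle = start_angle + angular_velocity * flight_time
    # Heads if the coin ends up in the "heads-up" half of a full turn.
    return "H" if (total_angle % (2 * math.pi)) < math.pi else "T"

random.seed(0)
n = 100_000
heads = 0
for _ in range(n):
    # Uncontrolled factors: each toss has slightly different spin and timing.
    w = random.uniform(150.0, 250.0)     # angular velocity in rad/s
    t = random.uniform(0.4, 0.6)         # seconds in the air
    a0 = random.uniform(0.0, 2 * math.pi)
    heads += toss(w, t, a0) == "H"

print(heads / n)  # close to 0.5 even though toss() itself is deterministic
```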

But now, say that someone trains all his life to toss a coin always in the same way: he makes sure the starting position, the speed of the rotations, etc., are always the same. Or he can even adapt his toss to the starting position of the coin, to get the outcome he wants. Say that we, who are playing with him, don't know about this. Now suddenly the probability distribution is no longer 50/50: he gets what he decides 100% of the time. So, what's the difference between one case and the other? How can we even tell that one process will obey the rules of probability, as we're accustomed to calculating them, and the other won't, without knowing the mechanics of the process precisely?
One may be tempted to simply say: well, one case is truly random, the other is "controlled", and by measuring the frequencies we will notice right away.
But my example took extreme cases. What if we were in an intermediate case, where the person cannot replicate his perfect, controlled toss 100% of the time, but only 60% of the time? He gets what he wants 60% of the time and a random outcome the other 40%. The outcomes won't be 50/50, and they won't be 100/0 either, but somewhere in between.
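For instance, if he aims for heads every time, succeeds in reproducing his controlled toss 60% of the time, and gets an ordinary fair toss otherwise, the overall chance of heads would be

$$P(\text{heads}) = 0.6 \cdot 1 + 0.4 \cdot 0.5 = 0.8,$$

which is exactly the kind of "somewhere in between" I mean.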

But the point is a different one. What if he gets better at his perfect toss over time, and the probability distribution changes after each toss? Then it doesn't make sense to talk about a probability distribution, or at least a fixed one, since it would change after each toss, right? How can we empirically measure the probability in that case?
The thing is, I feel it's very likely that a lot of "random" processes in nature have an underlying varying factor that we are unaware of, as in the example above. And I feel I cannot quite tell when a process is truly random and stable, so that I can take the observed frequencies of the outcomes as its probability distribution.
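Here is a rough sketch of the situation I'm worried about (the drift schedule is invented, purely for illustration): the chance of the outcome he wants creeps upward over time, so the running frequency never settles on any single number that deserves to be called "the" probability.

```python
# The probability of success drifts upward with every toss, so there is no
# fixed probability for the observed frequency to converge to.
import random

random.seed(1)
n = 10_000
wins = 0
for i in range(n):
    p = 0.5 + 0.5 * (i / n)          # skill improves: p drifts from 0.5 toward 1.0
    wins += random.random() < p
    if (i + 1) % 2000 == 0:
        print(f"after {i + 1:5d} tosses, running frequency = {wins / (i + 1):.3f}")
# The running frequency keeps creeping upward instead of stabilizing,
# so it does not estimate any single "probability of success".
```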

And correct me if I'm wrong, but even the central limit theorem does not help in this case. The central limit theorem roughly says that if you take many, many independent trials from any probability distribution, even a non-normal one, the distribution of the (suitably scaled) average or sum converges to a normal distribution, whose mean and variance are determined by the mean and variance of the starting distribution. That's great, but this does not work if the probability distribution is different at each trial, right? In its classical form, the theorem requires the trials to be independent and all drawn from the same distribution.
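For reference, the precise statement I have in mind: if $X_1, X_2, \dots$ are independent draws from one fixed distribution with mean $\mu$ and finite variance $\sigma^2$, then

$$\frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \;\xrightarrow{\;d\;}\; \mathcal N(0,1), \qquad \text{i.e.}\quad \bar X_n \approx \mathcal N\!\left(\mu, \tfrac{\sigma^2}{n}\right) \text{ for large } n,$$

and the "from one fixed distribution" part is exactly what breaks down if the process drifts.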

This was kind of a silly example, but I can think of more significant ones.

Say that I measure the daily frequency at which it rains in a certain city. Once I've got enough data points of rain yes/no on a daily basis, can I claim that this frequency represents the probability of rain in that city? I've seen a lot of people use probability like that, but I feel this is wrong and that some things are being overlooked. The way I see it, rain is a complex natural phenomenon caused by the interaction of a lot of different natural processes. I'm no expert, but I'd say it has to do with how air masses move according to the rules of thermodynamics and aerodynamics, the way water evaporates depending on temperature, humidity, etc., and maybe, to a lesser extent, other odd factors like the motion of the Earth and the electromagnetic effects of the other planets in the solar system. If we understood all these phenomena exactly and precisely, and had the computing power to calculate all the different interactions, we could probably predict rain anywhere on the planet in a deterministic way, right? But we don't, and that's when we use probability instead. BUT.

What if one of the factors that influence the rain rate changed over time? What if, say, rain is affected by the average atmospheric temperature, and global warming leads to an ever-increasing rain rate? Then the frequency of rain we measure now will surely differ from the frequency we will measure two years from now, and this frequency is surely not a probability (or rather, the probability distribution would not be constant, and the frequencies we measured so far would be useless). Now, my example talks about global warming, which could be considered a somewhat obvious factor affecting the rain rate. But what if the factor that changes, invalidating our "probability measure", were more subtle and unknown? Or more drastic, and subject to more sudden changes? What if it were an interaction that we don't even know exists yet? Then we would think our process is truly random and can be described by a specific probability distribution, but in fact the distribution would not be stable, and we wouldn't know.
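The best check I can think of is something like this toy sketch (the upward trend in the daily rain probability is invented): compute the rain frequency over separate blocks of years and see whether it drifts, instead of treating the whole record as one number.

```python
# Daily rain is generated with a slowly increasing underlying probability,
# then the empirical frequency is computed over successive five-year windows.
# A single overall frequency would hide the fact that the "probability of
# rain" is not constant.
import random

random.seed(2)
days_per_year = 365
years = 20
p_start, p_end = 0.20, 0.30            # invented drift in the daily rain chance

total_days = days_per_year * years
rain = []
for d in range(total_days):
    p = p_start + (p_end - p_start) * d / total_days
    rain.append(random.random() < p)

window = 5 * days_per_year             # five-year windows
for start in range(0, total_days, window):
    chunk = rain[start:start + window]
    print(f"years {start // days_per_year + 1:2d}-{(start + window) // days_per_year:2d}: "
          f"rain frequency = {sum(chunk) / len(chunk):.3f}")
```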

So when can I confidently use a probability distribution, be it measured through frequency or "calculated"/theorized based on the possible outcomes?

Another example that came to my mind is the price changes of stocks. Again, this is a complex process influenced by investors' opinions on a company's future outlook, the current economic context, etc. We don't have a complete understanding of what happens, and thus the outcomes are uncertain. If I measure the daily change in price of a stock on a stock market for a loooong period, can I say that the histogram of those changes is the probability distribution of the price changes of that stock? At first glance I'd be tempted to say yes, and be tempted to build a probabilistic investing method that leads to a systematic average gain over long investing periods. But I can still find reasons why the frequency = probability assumption does not make sense. Say that we measured the histogram of price changes for a given period. In the period we measured, the economy was great and the company did great. Then, suddenly, the economy collapses, or the company goes bankrupt due to some scandal. The probability distribution that we measured surely won't contain a data point like a -100% change (on the day of the crash), nor the 0% changes for all the days from then on. Again, I took an extreme case, but the changing factor could be more subtle and unrecognizable, and could lead to less drastic, harder-to-notice changes in the probability distribution over time. So, when does a process truly have a stable probability distribution, one that is "measurable", and when not?
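The only safeguard I can imagine is a crude stationarity check like the sketch below (the return series here is synthetic, a calm regime followed by a rougher one, purely for illustration): split the series into two halves and compare them; if the halves look very different, the overall histogram is probably not "the" distribution of that stock.

```python
# Split a series of daily returns in two halves and compare their summary
# statistics. Very different halves suggest the single histogram of the
# whole period does not describe one stable distribution.
import random
import statistics

def compare_halves(returns):
    half = len(returns) // 2
    for name, chunk in (("first half", returns[:half]), ("second half", returns[half:])):
        print(f"{name}: mean = {statistics.mean(chunk):+.4f}, "
              f"stdev = {statistics.stdev(chunk):.4f}")

random.seed(3)
calm  = [random.gauss(0.0005, 0.01) for _ in range(500)]   # "economy is great"
rough = [random.gauss(-0.002, 0.04) for _ in range(500)]   # regime change
compare_halves(calm + rough)
```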

Hopefully I managed to explain myself well, English is not my main language. Thanks.

Best Answer

Probability theory describes an idealization of real-world processes, which are rarely (if ever) as clean as the theory suggests. The simplest examples of probability come from games of chance, in which we say that a deck is shuffled, or that a die will show each of its numbers with equal probability, or that a coin flip is equally likely to come up heads or tails.

It is not too hard to show that actually, none of these things are as simple as they seem. Persi Diaconis has a great lecture (The Search for Randomness) that addresses things like coin flips, and he has papers about shuffling that show how real instances of these "randomizing" procedures are not actually straightforward at all.

What is true is that if we assume that something (shuffling a deck, or choosing lottery numbers) behaves like its mathematically ideal version, then sometimes we can make pretty accurate predictions based on the models we have, and that's usually more than we're able to do without probability theory. This is analogous to a physics student modelling ballistics but ignoring the viscosity of air, or the barometric pressure, or the effect of Rigel's gravity: it isn't a perfect model, but it's a lot better than no model at all.

The best real-world examples of things that behave exactly like their models are quantum processes. To within our ability to measure it, the number of decays of a radioactive sample in a fixed time interval follows a Poisson distribution. When we measure the spin of a particle or the polarization of a photon, we get results (as far as I know, with my limited understanding of physics) that are exactly in agreement with our models of these things.
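Concretely, if decays happen independently at an average rate $\lambda$, the standard model says the number $N$ of decays observed in a time interval of length $t$ satisfies

$$P(N = k) = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!}, \qquad k = 0, 1, 2, \dots,$$

and measured decay counts match this about as well as we can check.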

Generally, though, our mathematical models of processes and events are not quite correct. Instead, they're just the best models we have, and they have the great benefit of being simple and allowing us to make inferences and predictions that we can't make without them.
