Solved – True probability vs estimated probability

estimation, intuition, maximum likelihood, probability

Is it correct to think that the true probability of an event can never be known?

When studying probability, the first lectures feature those typical exercises that start with sentences like: "One tosses a (fair) coin." or "In a bag there are 4 blue marbles and 6 red marbles." In the first case we usually say that $P(\text{heads}) = 0.5$, and in the latter that $P(\text{blue marble}) = 0.4$.

After a few lectures comes the notion of "maximum likelihood estimation" (MLE), and one can observe that in the second case above (with marbles) we just fit a Bernoulli distribution to the data we had (10 marbles: BBBB RRRRRR).
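To see that concretely, here is a minimal sketch of the computation (the encoding of marbles as 0/1 is my own illustration, not from any particular textbook): for i.i.d. Bernoulli data the MLE of the success probability is just the sample mean.

```python
# MLE for a Bernoulli parameter: the fraction of successes in the sample.
# Encode the bag as 1 = blue marble, 0 = red marble (illustrative choice).
marbles = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # BBBB RRRRRR

# For X_1..X_n i.i.d. Bernoulli(p), the likelihood p^k (1-p)^(n-k)
# is maximized at p_hat = k / n, i.e. the sample mean.
p_hat = sum(marbles) / len(marbles)
print(p_hat)  # 0.4, matching P(blue marble) = 0.4
```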

I don't know how to think about the first example above (with coin tossing). But I have the following intuition (which, I guess, would help me a lot), and I hope you can help me understand whether I am correct or not:

  1. The true probability of an event can never be known.

  2. In the first case (with coin tossing) the probability was estimated from context/text. Can we say that I applied the closed world assumption?

  3. In the second case (with marbles) the probability was estimated from data (using MLE).

EDIT:

As a conclusion, the probability cannot be known, but can be assumed to be some number. My two examples were not examples of estimating probabilities (because there was NO data: I incorrectly said that the marbles were data…), but of prompting someone to make a specific assumption: the equally-likely-outcomes assumption.

Best Answer

The true probability of an event cannot be ever known.

Right and wrong, depending on the assumption / the point in time. Since a binary-valued random variable is in any case Bernoulli distributed, and you do not have a choice of how to model that, let us move away from that example and consider instead the following: we wonder whether or not the following data is normally distributed and, if so, what mean it has:

-0.33,1.4,0.64,-0.11,0.51,0.4,1.66,0.28,0.51,0.35,-0.38,0.1,1.64,-0.88,0.12,1.36,-0.23,-1.05,-0.87,-0.39

Right now you have the choice of modelling that with a t-distribution, a normal distribution, and so on. Furthermore, it is data from the real world, hence yes: we can never be absolutely, 100% sure that this data was even produced by a random variable. Maybe all the concepts of probability do not apply here because there is simply no rule behind how this data was generated, and maybe the next number that this process would have generated is 100001344.99 (a number that would not fit the pattern at all).

But the true question here is: does it really matter? The answer is no: we simply try to model this data with different distributions and 'do the best we can'. In the end (in the real world) we want to optimize something: reduce costs, reduce waste, or so. So, if we were able to do that by using a (maybe somewhat inadequate) model, do we care, as long as we can make "good money" out of it? I highly doubt that :-)
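To make 'do the best we can' concrete, here is a sketch of comparing two candidate models on the data above by maximum likelihood; the use of scipy.stats and the log-likelihood comparison are my additions, not part of the original answer:

```python
import numpy as np
from scipy import stats

data = np.array([-0.33, 1.4, 0.64, -0.11, 0.51, 0.4, 1.66, 0.28, 0.51, 0.35,
                 -0.38, 0.1, 1.64, -0.88, 0.12, 1.36, -0.23, -1.05, -0.87, -0.39])

# Fit each candidate model by maximum likelihood.
mu, sigma = stats.norm.fit(data)     # normal: location and scale
df, loc, scale = stats.t.fit(data)   # t: degrees of freedom, location, scale

# Compare via the maximized log-likelihood (higher = better in-sample fit;
# in practice one would penalize extra parameters, e.g. with AIC).
ll_norm = stats.norm.logpdf(data, mu, sigma).sum()
ll_t = stats.t.logpdf(data, df, loc, scale).sum()
print(f"normal: ll = {ll_norm:.2f}, t: ll = {ll_t:.2f}")
```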

On the other hand, once you have selected a model (and thereby explicitly assumed that the data was in fact generated by a random variable, and that this random variable has a normal distribution with some unknown parameters), then you can compute everything you want (like $P[X > 0]$) explicitly!
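For instance, under the assumed normal model, a sketch of that computation might look as follows (again using scipy.stats as an assumed tool; the fitted values are whatever MLE gives on the data above):

```python
import numpy as np
from scipy import stats

data = np.array([-0.33, 1.4, 0.64, -0.11, 0.51, 0.4, 1.66, 0.28, 0.51, 0.35,
                 -0.38, 0.1, 1.64, -0.88, 0.12, 1.36, -0.23, -1.05, -0.87, -0.39])

# Having *assumed* X ~ Normal(mu, sigma), estimate the parameters by MLE...
mu, sigma = stats.norm.fit(data)

# ...and P[X > 0] becomes an explicit quantity: 1 - Phi((0 - mu) / sigma).
p_positive = stats.norm.sf(0, loc=mu, scale=sigma)
print(f"P[X > 0] = {p_positive:.3f}")
```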

On questions 2) and 3):

We always get insights about the data by using two sources:

1) context / business knowledge / experience from the past

2) the data

We use 1) to select the model and then we use 2) to adjust the parameters of the model. Examples for 1):

  • Any random variable with only two outcomes is Bernoulli distributed.
  • We know that the data is the happiness index of different people. We know from earlier experiments that most people are "neutrally happy" (i.e. the data gathers around 0), that only very few are extraordinarily happy (little data that is very positive), and that only very few are extraordinarily unhappy (little data that is very negative).

Clearly, the choice of the model influences what you get out of the experiment, and if everybody uses the normal distribution because, hey, everybody before me did that so I am doing it as well, then we may keep ourselves from getting valuable new insights (which we might gain by trying new distributions/models).
