Likelihood Definition – Clarifying Ambiguities in the Wikipedia Entry on Likelihood

bayesian, conditional-probability, definition, likelihood, probability

I have a simple question regarding "conditional probability" and "likelihood". (I have already surveyed this question here, but to no avail.)

It starts from the Wikipedia page on likelihood. They say this:

The likelihood of a set of parameter values, $\theta$, given
outcomes $x$, is equal to the probability of those observed outcomes
given those parameter values, that is

$$\mathcal{L}(\theta \mid x) = P(x \mid \theta)$$

Great! So in English, I read this as: "The likelihood of the parameters equaling theta, **given** data X = x (the left-hand side), is equal to the probability of the data X being equal to x, **given** that the parameters are equal to theta." (Bold is mine for emphasis.)

However, no less than 3 lines later on the same page, the Wikipedia entry then goes on to say:

Let $X$ be a random variable with a discrete probability distribution
$p$ depending on a parameter $\theta$. Then the function

$$\mathcal{L}(\theta \mid x) = p_\theta (x) = P_\theta (X=x), \, $$

considered as a function of $\theta$, is called the likelihood
function (of $\theta$, given the outcome $x$ of the random variable
$X$). Sometimes the probability of the value $x$ of $X$ for the
parameter value $\theta$ is written as $P(X=x\mid\theta)$; often
written as $P(X=x;\theta)$ to emphasize that this differs from
$\mathcal{L}(\theta \mid x)$, which is **not a conditional probability**,
because $\theta$ is a parameter and not a random variable.

(Bold is mine for emphasis.) So, in the first quote, we are literally told about a conditional probability, $P(x\mid\theta)$, but immediately afterwards we are told that this is actually NOT a conditional probability, and that it should in fact be written as $P(X = x; \theta)$?

So, which one is it? Does the likelihood actually connote a conditional probability, as in the first quote? Or does it connote a simple probability, as in the second quote?

EDIT:

Based on all the helpful and insightful answers I have received thus far, I have summarized my question and my current understanding as follows:

  • In English, we say that: "The likelihood is a function of parameters, GIVEN the observed data." In math, we write it as: $L(\mathbf{\Theta}= \theta \mid \mathbf{X}=x)$.
  • The likelihood is not a probability.
  • The likelihood is not a probability distribution.
  • The likelihood is not a probability mass.
  • The likelihood is, however, in English: "a product of probability densities (continuous case) or of probability masses (discrete case), evaluated at $\mathbf{X} = x$ and parameterized by $\mathbf{\Theta}= \theta$." In math, we write it as $L(\mathbf{\Theta}= \theta \mid \mathbf{X}=x) = f(\mathbf{X}=x ; \mathbf{\Theta}= \theta)$ (continuous case, where $f$ is a PDF), and as
    $L(\mathbf{\Theta}= \theta \mid \mathbf{X}=x) = P(\mathbf{X}=x ; \mathbf{\Theta}= \theta)$ (discrete case, where $P$ is a probability mass). The takeaway is that at no point here does a conditional probability come into play; see the short sketch after this list.
  • In Bayes' theorem, we have $P(\mathbf{\Theta}= \theta \mid \mathbf{X}=x) = \frac{P(\mathbf{X}=x \mid \mathbf{\Theta}= \theta) \ P(\mathbf{\Theta}= \theta)}{P(\mathbf{X}=x)}$. Colloquially, we are told that "$P(\mathbf{X}=x \mid \mathbf{\Theta}= \theta)$ is a likelihood"; however, this is not true, since $\mathbf{\Theta}$ might be an actual random variable. What we can correctly say, therefore, is that this term $P(\mathbf{X}=x \mid \mathbf{\Theta}= \theta)$ is simply "similar" to a likelihood. (?) [On this I am not sure.]
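
To make the last two bullets concrete, here is a minimal sketch of the likelihood as a product of probability masses evaluated at fixed data and then varied over $\theta$. The Bernoulli model and the coin-flip data below are my own illustrative assumptions, not taken from the Wikipedia entry:

    import numpy as np

    # Hypothetical observed coin flips (1 = heads, 0 = tails); purely illustrative.
    x = np.array([1, 0, 1, 1, 0])

    def likelihood(theta, x):
        """L(theta | x): the product of Bernoulli probability masses
        P(X = x_i; theta), with the data x held fixed and theta varying."""
        return np.prod(theta ** x * (1 - theta) ** (1 - x))

    # The data stay fixed; only the parameter value changes.
    for theta in [0.2, 0.5, 0.6, 0.8]:
        print(f"L(theta = {theta} | x) = {likelihood(theta, x):.5f}")

These values need not sum or integrate to $1$ over $\theta$: the likelihood is a function of $\theta$, not a probability distribution over it.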

EDIT II:

Based on @amoeba's answer, I have drawn a diagram of his last comment. I think it's quite elucidating, and it clears up the main point of contention I was having. (Comments on the image.)

[image: diagram based on @amoeba's last comment]

EDIT III:

I have also extended @amoeba's comments to the Bayesian case:

[image: the diagram extended to the Bayesian case]

Best Answer

I think this is largely a case of unnecessary hair-splitting.

Conditional probability $P(x\mid y)\equiv P(X=x \mid Y=y)$ of $x$ given $y$ is defined for two random variables $X$ and $Y$ taking values $x$ and $y$. But we can also talk about probability $P(x\mid\theta)$ of $x$ given $\theta$ where $\theta$ is not a random variable but a parameter.

Note that in both cases the same term "given" and the same notation $P(\cdot\mid\cdot)$ can be used. There is no need to invent different notations. Moreover, what is called "parameter" and what is called "random variable" can depend on your philosophy, but the math does not change.

The first quote from Wikipedia states that $\mathcal{L}(\theta \mid x) = P(x \mid \theta)$ by definition. Here it is assumed that $\theta$ is a parameter. The second quote says that $\mathcal{L}(\theta \mid x)$ is not a conditional probability. This means that it is not a conditional probability of $\theta$ given $x$; and indeed it cannot be, because $\theta$ is assumed to be a parameter here.

In the context of Bayes theorem $$P(a\mid b)=\frac{P(b\mid a)P(a)}{P(b)},$$ both $a$ and $b$ are random variables. But we can still call $P(b\mid a)$ "likelihood" (of $a$), and now it is also a bona fide conditional probability (of $b$). This terminology is standard in Bayesian statistics. Nobody says it is something "similar" to the likelihood; people simply call it the likelihood.
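
As a toy illustration of this dual role (my own example, not part of the quoted material): take a single Bernoulli observation $x$ with success probability $\theta$ and a uniform prior on $\theta\in[0,1]$. If $x=1$ is observed, then $P(x=1\mid\theta)=\theta$, and Bayes' theorem gives
$$P(\theta\mid x=1)=\frac{P(x=1\mid\theta)\,P(\theta)}{P(x=1)}=\frac{\theta\cdot 1}{\int_0^1 \theta'\,d\theta'}=2\theta.$$
Here the same expression $P(x=1\mid\theta)=\theta$ is simultaneously a bona fide conditional probability of $x$ (because $\theta$ is now a random variable) and the likelihood $\mathcal L(\theta\mid x=1)$.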

Note 1: In the last paragraph, $P(b\mid a)$ is obviously a conditional probability of $b$. As a likelihood $\mathcal L(a\mid b)$ it is seen as a function of $a$; but it is not a probability distribution (or conditional probability) of $a$! Its integral over $a$ does not necessarily equal $1$. (Whereas its integral over $b$ does.)
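
To see the asymmetry in Note 1 concretely (again with the toy Bernoulli example above, which is my own illustration): $\mathcal L(\theta\mid x=1)=\theta$, and $\int_0^1 \theta\,d\theta = \tfrac{1}{2} \neq 1$; whereas for every fixed $\theta$, $\sum_{x\in\{0,1\}} P(x\mid\theta) = \theta + (1-\theta) = 1$.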

Note 2: Sometimes likelihood is defined up to an arbitrary proportionality constant, as emphasized by @MichaelLew (because most of the time people are interested in likelihood ratios). This can be useful, but is not always done and is not essential.


See also What is the difference between "likelihood" and "probability"? and in particular @whuber's answer there.

I fully agree with @Tim's answer in this thread too (+1).
