Notation – What Does $\propto$ Mean?


From this article:

$\ldots$ a maximum-a-posteriori $(\mathrm{MAP}_{x,k})$ estimation, seeking a pair $(\hat{x}, \hat{k})$ maximizing:
$$p(x, k\mid y) \propto p(y\mid x, k)p(x)p(k).$$

What does the symbol '∝' mean in this context?

Best Answer

It means "is proportional to," regarded as a function of the two variables $x$ and $k$ with $y$ held fixed.

You have a prior probability density as a function of $x$ and $k$, which is just a product of a function of $x$ and a function of $k$, so the random variables involved are independent under the prior distribution. Then new data arrives: a third random variable is observed to equal a number $y$. The conditional density of that random variable, given that the first two are equal to $x$ and $k$, is the first factor on the right-hand side.

The expression on the left is the conditional density of the random variables corresponding to $x$ and $k$, given that the observed value is $y$.

Because the proportionality is as a function of $x$ and $k$ with $y$ fixed, the right-hand side must be multiplied by a normalizing constant, which may depend on $y$ but not on $x$ and $k$, to turn it into a probability density function of $x$ and $k$. "Constant" in this case means not depending on $x$ and $k$.
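Written out, this is just Bayes' theorem with the normalizing constant made explicit. The sketch below assumes continuous random variables with densities and keeps the paper's $p(\cdot)$ notation:

$$p(x, k\mid y) = \frac{p(y\mid x, k)\,p(x)\,p(k)}{p(y)}, \qquad p(y) = \iint p(y\mid x', k')\,p(x')\,p(k')\,dx'\,dk'.$$

So the constant is $1/p(y)$: it depends on $y$, but $x$ and $k$ have been integrated out of it. For the same reason, the maximizing pair $(\hat{x}, \hat{k})$ is the same whether or not one bothers to divide by $p(y)$.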

"Constant" always means not depending on something. Usually it's clear from the context what the "something" is, but I think it should be stated explicitly more often than it is in present-day conventional practice. Here's a favorite example of mine, involving differentiation of exponential functions: \begin{align} \frac{d}{dx} 2^x & = \lim_{h\to0}\frac{2^{x+h}-2^x}{h} \\[10pt] & = 2^x\lim_{h\to0}\frac{2^h-1}{h}\tag{1} \\[10pt] & = 2^x\cdot\text{constant}.\tag{2} \end{align}

In $(1)$, the factor $2^x$ can be taken out of the limit because it is "constant," but there "constant" means not depending on $h$.

In $(2)$, the limit is replaced by "constant," and now "constant" means not depending on $x$.

Some instructors in calculus classes actually present this proof without mentioning the contextual change in the meaning of "constant".
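A quick numerical check of steps $(1)$ and $(2)$ may help. The sketch below replaces $h\to0$ with a small fixed $h$ (an assumption of the illustration, not part of the proof). Dividing the difference quotient by $2^x$ leaves approximately the same number for every $x$, and that number is close to $\ln 2$, which is the value of the constant in $(2)$.

```python
import math

def difference_quotient(x: float, h: float) -> float:
    """(2**(x + h) - 2**x) / h, the expression inside the limit before step (1)."""
    return (2 ** (x + h) - 2 ** x) / h

def limit_factor(h: float) -> float:
    """(2**h - 1) / h, the factor left inside the limit after step (1).

    It depends on h but not on x.
    """
    return (2 ** h - 1) / h

h = 1e-6  # small but fixed step standing in for h -> 0
for x in [-3.0, 0.0, 1.0, 2.5]:
    # Dividing out 2**x leaves (approximately) the same number for every x.
    print(f"x = {x:5.1f}: quotient / 2**x = {difference_quotient(x, h) / 2 ** x:.6f}")

print(f"(2**h - 1) / h = {limit_factor(h):.6f}")
print(f"ln 2           = {math.log(2):.6f}")
```

Changing `h` changes the printed numbers slightly (the factor depends on $h$), while changing `x` does not (the factor is constant in $x$). These are exactly the two meanings of "constant" above.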

Later note in response to comments below: The linked paper uses a rather obnoxious notation, writing $p(x)$ and $p(k)$ for the probability density functions of two different random variables. One should distinguish between capital $X$ and lower-case $x$ in expressions like $\Pr(X=x)$, where capital $X$ is a random variable and lower-case $x$ is a particular value that $X$ might take. If one writes $p_X(x)$ for the value of the probability density function of the random variable (capital) $X$ at the point (lower-case) $x$, then it is clear that $p_X(3)$ means something different from $p_Y(3)$.
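In code, the subscript simply becomes part of the function's name. A minimal sketch, with distributions chosen hypothetically for illustration (nothing in the paper says $X$ is exponential or $Y$ is normal):

```python
import math

def p_X(x: float) -> float:
    """Density of X, assumed here (hypothetically) to be Exponential with rate 1."""
    return math.exp(-x) if x >= 0 else 0.0

def p_Y(y: float) -> float:
    """Density of Y, assumed here (hypothetically) to be standard normal."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

# Same argument 3, different random variables, different values.
print(p_X(3), p_Y(3))
```

A single unsubscripted `p(3)` could not tell these two apart, which is the ambiguity in writing both $p(x)$ and $p(k)$.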

But at any rate, $p(x)p(k)$ is the notation used in the linked paper for the joint density function of a pair of independent random variables, and $p(y\mid x,k)$ is the conditional density of another random variable given the values of those two. The idea is that if you multiply those, what you get is proportional, as a function of $x$ and $k$ with $y$ fixed, to the conditional density function of the random variables corresponding to $x$ and $k$, given an observed value of the random variable corresponding to $y$.
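A small discrete sketch of that idea, with a toy model whose values and probabilities are entirely made up for illustration: $x$ and $k$ take a few values, the observation $y$ is binary, and the names `prior_x`, `prior_k`, and `likelihood` are hypothetical. It checks that the normalized and unnormalized products differ by a single factor $1/p(y)$ and that both have the same maximizer, which is why a MAP estimate can ignore the constant.

```python
import itertools

# Hypothetical toy model; all numbers are made up for illustration.
prior_x = {0: 0.5, 1: 0.3, 2: 0.2}   # p(x)
prior_k = {0: 0.6, 1: 0.4}           # p(k), independent of x in the prior

def likelihood(y: int, x: int, k: int) -> float:
    """p(y | x, k) for binary y; P(y = 1) grows with x + k."""
    p_one = (x + k + 1) / 5          # lies in [0.2, 0.8] for the values above
    return p_one if y == 1 else 1.0 - p_one

def unnormalized_posterior(y: int) -> dict:
    """p(y | x, k) p(x) p(k) for every pair (x, k), with y held fixed."""
    return {
        (x, k): likelihood(y, x, k) * prior_x[x] * prior_k[k]
        for x, k in itertools.product(prior_x, prior_k)
    }

def posterior(y: int) -> dict:
    """p(x, k | y): divide by p(y), the sum over all (x, k) pairs."""
    u = unnormalized_posterior(y)
    evidence = sum(u.values())       # p(y): depends on y, not on (x, k)
    return {xk: v / evidence for xk, v in u.items()}

y_obs = 1
u = unnormalized_posterior(y_obs)
post = posterior(y_obs)

# "Proportional": the ratio is the same number for every (x, k).
ratios = [post[xk] / u[xk] for xk in post]
print(f"ratio range: [{min(ratios):.6f}, {max(ratios):.6f}]")
print(f"1 / p(y)   : {1 / sum(u.values()):.6f}")
print(f"sum of posterior: {sum(post.values()):.6f}")

# MAP: the maximizing pair does not depend on the normalizing constant.
print(max(u, key=u.get), max(post, key=post.get))
```

With a continuous model the sum becomes the integral for $p(y)$, but the structure of the argument is the same.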
