Notation for conditional probability

conditional probability, notation, probability

Given $X = \{ x_1, x_2, \dots \}$ and $Y = \{ y_1, y_2, \dots \}$, let $P(X,Y)$ be their joint probability. Conditioning $P(X, Y)$ on $y_j \in Y$ corresponds to looking at the distribution of the elements of $X$ when we disregard all elements of $Y$ other than $y_j$. This is achieved by normalizing $P(X, Y = y_j)$ by $P(Y = y_j) := \sum_{x \in X} P(X=x, Y=y_j)$. The resulting quantity is usually denoted by $P(X \vert Y = y_j)$.

Now imagine that we only want to consider two elements $y_{j_1}$ and $y_{j_2}$ of $Y$, and look at the joint distribution of $X$ and $Y$ that we obtain by disregarding all elements of $Y$ other than $y_{j_1}$ and $y_{j_2}$. I would like to denote this as follows:
$$
P(X=x,Y=y_j \ \vert \ \{ y_{j_1}, y_{j_2} \})
:=
\begin{cases}
\frac{P(X=x,Y=y_j)}{P(Y=y_{j_1}) + P(Y=y_{j_2})}, & \text{if } y_j \in \{ y_{j_1}, y_{j_2} \} \\
0, & \text{otherwise}
\end{cases}
$$
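The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `condition_on_subset` and the joint table are hypothetical, and the joint distribution is assumed to be stored as a dictionary keyed by `(x, y)` pairs.

```python
from fractions import Fraction

def condition_on_subset(joint, subset):
    """Return P(X=x, Y=y | Y in subset) as a dict keyed by (x, y)."""
    subset = set(subset)
    # P(Y in subset): total mass of the retained values of Y.
    mass = sum(p for (x, y), p in joint.items() if y in subset)
    if mass == 0:
        raise ValueError("conditioning event has probability zero")
    # Retained pairs are rescaled by the mass; discarded pairs get probability 0.
    return {
        (x, y): (p / mass if y in subset else Fraction(0))
        for (x, y), p in joint.items()
    }

# Hypothetical joint table, chosen only for illustration.
joint = {
    ("x1", "a"): Fraction(1, 2),
    ("x1", "b"): Fraction(1, 4),
    ("x2", "c"): Fraction(1, 4),
}
conditioned = condition_on_subset(joint, {"a", "b"})
```

With this table the retained pairs are divided by $P(Y \in \{a, b\}) = 3/4$, so the result still sums to one, and the discarded pair `("x2", "c")` is mapped to zero.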

Question 1: Is there a standard way to notate this quantity?

Question 2: According to Wikipedia, "the conditional probability $P(X \mid Y)$ is a function of $Y$: e.g., if the function $g$ is defined as $g(y) = P(X \mid Y = y)$ then $P(X \mid Y) = g \circ Y$". If we introduce a variable $Z$ defined as
$$
Z = \{ z_1, z_2, \dots \} := \{ \{ y_{j_1}, y_{j_2} \}: y_{j_1}, y_{j_2} \in Y, y_{j_1} \ne y_{j_2} \}
$$

and denote $g(z_i) = P(X,Y \mid z_i)$, $z_i \in Z$, according to the definition of $P(X,Y \mid \{ y_{j_1}, y_{j_2} \})$ above, would it make sense to write $P(X,Y \mid Z) = g \circ Z$ even though $Z$ is not properly a random variable?

Question 3: Would the notation $P(X \vert \{y_1, y_2 \}) := \sum_{y \in \{y_1, y_2\}} P(X, Y = y \vert \{y_1, y_2 \})$ be acceptable? And its "extended" version $P(X \vert Z)$?

Example: In the figure below, $X$ can take three values $\{x_1, x_2, x_3\}$ (three positions on an axis), while $Y$ can take three colors ($\{blue, yellow, green\}$). For each $x_i$ I show only the possible outcomes of $Y$ (i.e. the values $y_j \in Y$ for which $P(x_i,y_j) \ne 0$). If for an $x_i$ there are two possible outcomes of $Y$, I group them with parentheses. The values of the joint probability $P(X,Y)$ are indicated by the fractions. For simplicity, I assume in this example that all possible outcomes have the same probability (panel A). This means that $P(x_1, blue) = 1/4$, $P(x_2, blue) = 1/4$, $P(x_2, green) = 1/4$, and so on. This is $P(X,Y)$.

In panel B, I show the distribution once we ignore all outcomes of $Y$ other than $blue$. This would be $P(X \mid Y = blue)$.

In panel C, I show the bivariate distribution of $X$ and $Y$ once all but two colors ($blue$ and $yellow$) are discarded. This would be $P(X,Y \mid \{ blue, yellow \})$.
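As a rough check on panels B and C, the numbers can be worked out from the definitions above. The text gives only three of the four equiprobable outcomes; the fourth is assumed here to be $(x_3, yellow)$, which is not stated explicitly and may differ from the figure.

For panel B, under that assumption:
$$
P(Y = blue) = P(x_1, blue) + P(x_2, blue) = \frac{1}{2},
\qquad
P(X = x_1 \mid Y = blue) = P(X = x_2 \mid Y = blue) = \frac{1/4}{1/2} = \frac{1}{2}
$$

For panel C, the retained mass is $P(Y = blue) + P(Y = yellow) = 3/4$, so
$$
P(x_1, blue \mid \{ blue, yellow \}) = P(x_2, blue \mid \{ blue, yellow \}) = P(x_3, yellow \mid \{ blue, yellow \}) = \frac{1/4}{3/4} = \frac{1}{3},
\qquad
P(x_2, green \mid \{ blue, yellow \}) = 0
$$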

[Figure: Example, panels A, B and C]

Best Answer

A standard notation for the right-hand side of your equation would be $\mathsf P(X=x\cap Y=y\mid Y\in\{y_1,y_2\})$. I’m not entirely convinced that this is really what you want, though; part of your post sounds as if what you really want is $\mathsf P(X=x\mid Y\in\{y_1,y_2\})$.
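To make the difference between the two expressions concrete: for $y_1 \ne y_2$, the second is the marginal over $Y$ of the first,
$$
\mathsf P(X=x\mid Y\in\{y_1,y_2\})
= \sum_{y\in\{y_1,y_2\}} \mathsf P(X=x\cap Y=y\mid Y\in\{y_1,y_2\})
= \frac{\mathsf P(X=x\cap Y=y_1)+\mathsf P(X=x\cap Y=y_2)}{\mathsf P(Y=y_1)+\mathsf P(Y=y_2)}
$$
so the first keeps track of which of the two values of $Y$ occurred, while the second only records that one of them did.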

About the second question: $P(X\mid Y)$ does not denote all possible conditionings of $X$ on the elements of $Y$; it’s a random variable that’s a function of $Y$ (see Wikipedia). $P(X,Y\mid Z)$ can’t work like $P(X\mid Y)$ because whereas each outcome realizes a unique value of $Y$, each outcome realizes several pairs of values of $Y$, so you can’t have a function of “the” pair. You could of course have a function of the set of all pairs that are realized, but that would just be an unnecessarily fancy way of having a function of $Y$.
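A minimal sketch of why $g \circ Y$ is well defined may help here. It assumes a hypothetical four-point sample space loosely modelled on the example in the question (the outcome labels `w1` to `w4` and the pair `("x3", "yellow")` are assumptions, not taken from the post). Each outcome carries exactly one value of $Y$, so $g$ can be evaluated at it.

```python
from fractions import Fraction

# Hypothetical sample space: each outcome carries one value of X and one of Y.
outcomes = {
    "w1": ("x1", "blue"),
    "w2": ("x2", "blue"),
    "w3": ("x2", "green"),
    "w4": ("x3", "yellow"),  # assumed; not stated in the question
}
prob = {w: Fraction(1, 4) for w in outcomes}

def g(y, x):
    """g(y) evaluated at x, i.e. P(X = x | Y = y)."""
    p_y = sum(prob[w] for w, (_, yy) in outcomes.items() if yy == y)
    p_xy = sum(prob[w] for w, (xx, yy) in outcomes.items() if (xx, yy) == (x, y))
    return p_xy / p_y

def cond_prob_rv(x):
    """The random variable P(X = x | Y) = g(Y(w)), tabulated over outcomes w."""
    return {w: g(y, x) for w, (_, y) in outcomes.items()}

rv = cond_prob_rv("x2")
```

Here `rv` assigns one number to every outcome because `outcomes[w]` yields a single `y`. There is no analogous lookup that returns "the" pair $\{y_{j_1}, y_{j_2}\}$ for an outcome, which is the obstruction described above.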
