Statistical estimation: how is the following probability distribution defined

Tags: optimization, probability-distributions, probability-theory, statistical-inference, statistics

I am currently reading Chapter 7 of Stephen Boyd's optimization textbook, which says:

[image of the quoted passage, which introduces a family of probability distributions indexed by a vector $x$, with densities $p_x(\cdot)$]

This is already very confusing.

From my understanding:

  • If $X$ is a random variable on a probability space $(\Omega, \mathcal{A}, \mu)$, then $X$ induces a probability measure
    on $\mathbb{R}$, called the probability distribution of $X$: $\mu_X(B) = \Pr(X \in B)$ for Borel sets $B \subseteq \mathbb{R}$.

  • A distribution function of $X$ is the function $F(x) = \Pr(X\leq x)$,
    also known as the CDF.

  • The density of $F$ (when it exists) is the function $f$ such that $F(x) =
    \int\limits_{-\infty}^x f(y)\,dy$.
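The relationship in the last bullet can be checked numerically for a concrete case. The sketch below (my own example, not from the text) uses the Exponential(1) density $f(y)=e^{-y}$ and CDF $F(x)=1-e^{-x}$, and verifies that a Riemann sum of $f$ recovers $F$:

```python
import math

# Density and CDF of an Exponential(1) random variable
# (an assumed concrete example, not from the original text).
def f(y):
    return math.exp(-y) if y >= 0 else 0.0

def F(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

# Midpoint-rule check that F(x) = integral of f from -inf to x.
def integral_of_density(x, steps=100_000):
    lo = 0.0                 # f vanishes below 0
    h = (x - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

x = 2.0
print(F(x), integral_of_density(x))  # the two values agree to ~1e-10
```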

So what does a probability distribution "indexed" by a vector mean in this context? I am also troubled by the notation $p_x(\cdot)$: in my experience, densities are almost always written as $f_X(x)$, where $X$ is the random variable, but here $x$ is just a vector. In the entire chapter the notion of a random variable is never even brought up, yet curiously a probability distribution and a density are still defined.

Can someone help me understand the meaning of the very first sentence, and offer a concrete example of what a probability distribution and density $p_x(\cdot)$ look like in this context?

Best Answer

Consider the following example. Let $Y\sim\operatorname{Exp}(\lambda)$, where $\lambda\in \mathbb{R}_{>0}$. The value of the parameter $\lambda$ is unknown, so one considers the family of probability distributions on $\mathbb{R}$ indexed by $\lambda$, with densities $p_{\lambda}(y)=\lambda\exp(-\lambda y)$, $y\ge 0$. Letting $P_{\lambda}$ denote the distribution corresponding to $p_{\lambda}$, i.e. $P_{\lambda}(A)=\int_A p_{\lambda}(y)\,dy$, this family can be written as $\{P_{\lambda}:\lambda\in(0,\infty)\}$. The log-likelihood in this case is given by $$ l(\lambda)=\ln p_{\lambda}(y)=\ln\lambda-\lambda y. $$

If we are given $n$ i.i.d. copies of $Y$, i.e. $(Y_1,\ldots,Y_n)$, then the joint density is $p_{\lambda}(y)=\prod_{i=1}^n \lambda \exp(-\lambda y_i)$, where $y\equiv(y_1,\ldots,y_n)$ with each $y_i\ge 0$. The log-likelihood in this case becomes $$ l(\lambda)=n\ln\lambda-\lambda\sum_{i=1}^n y_i. $$
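Setting $l'(\lambda)=n/\lambda-\sum_i y_i=0$ gives the maximizer $\hat\lambda = n/\sum_i y_i$ in closed form. A quick simulation sketch (the true $\lambda = 3$ and sample size are assumed values for illustration):

```python
import random

# Simulate n i.i.d. Exponential(true_lam) draws and compute the MLE.
random.seed(0)
true_lam, n = 3.0, 100_000   # assumed values for the demo
ys = [random.expovariate(true_lam) for _ in range(n)]

# Maximizing l(lam) = n*ln(lam) - lam*sum(ys) gives lam_hat = n / sum(ys).
lam_hat = n / sum(ys)
print(lam_hat)  # close to 3.0 for large n
```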