Not precisely.
About histograms, KDEs and ECDFs.
(1) Roughly speaking, a histogram (on a density scale, so that the areas of the bars sum to unity) can be viewed as an estimate of the density function. A KDE is a more sophisticated method of density estimation. Generally speaking, one cannot reconstruct the exact values of the data from either a histogram or a KDE.
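For intuition about what a Gaussian KDE computes: it is just an average of normal-density 'bumps' centered at the data points. A minimal sketch, where the sample, bandwidth rule, and evaluation grid are illustrative choices:

# Hand-rolled Gaussian KDE: average of normal densities centered at the data.
set.seed(1)
x  <- rnorm(25)                      # small illustrative sample
bw <- bw.nrd0(x)                     # R's default bandwidth rule
g  <- seq(-4, 4, length.out = 200)   # grid on which to evaluate
kde <- sapply(g, function(t) mean(dnorm(t, mean = x, sd = bw)))
plot(g, kde, type = "l", col = "red")       # hand-rolled KDE
lines(density(x, bw = bw), lty = "dotted")  # closely matches density()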
(2) By contrast, an empirical CDF (ECDF) retains exact information about all of the data. An ECDF is made as follows: (a) sort the data from smallest to largest; (b) make a stair-step function that begins at 0 below the minimum and increases by $1/n$ at each data value, where $n$ is the sample size. If $k$ values are tied, then the increase is $k/n$ at the tied value.
Thus the ECDF approximates the CDF of the distribution, with increasingly accurate approximations as the sample size increases. Generally speaking, an ECDF gives a better approximation to the population CDF than a histogram gives to the population density function. (Information is lost in binning the data to make a histogram.)
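For concreteness, the stair-step construction can be coded directly; here is a minimal sketch with a small made-up sample (including a tie):

# Manual ECDF: sort, then accumulate jumps of 1/n (ties stack to k/n).
x  <- c(2.1, 3.5, 3.5, 4.0, 6.2)          # illustrative sample with a tie
n  <- length(x)
xs <- sort(x)                             # step (a): sort
Fn <- cumsum(rep(1/n, n))                 # step (b): jumps of 1/n each
plot(xs, Fn, type = "s", ylim = c(0, 1))  # tied values give a jump of 2/n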
[By suitable manipulation (a kind of numerical integration), the information in a KDE could be used to make a function that imitates the population CDF, but such a function does not use the actual data values. In my experience, this is rarely done.]
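If one did want to do that, the numerical integration might look like the following sketch (the sample and reference distribution are illustrative):

# Crude Riemann-sum integration of a KDE to imitate a CDF.
set.seed(1)
x  <- rnorm(100)
d  <- density(x)                     # KDE: grid d$x with heights d$y
dx <- diff(d$x)[1]                   # grid spacing (equally spaced)
Fhat <- cumsum(d$y) * dx             # running integral of the KDE
plot(d$x, Fhat, type = "l", col = "red")
curve(pnorm(x), add = TRUE)          # population CDF for comparison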
Graphical illustrations.
(1) A sample of size $n = 100$ from $$\mathsf{Gamma}(\text{shape} = \alpha = 5,\,\text{rate} = \lambda = 1/6)$$ is simulated. The figure shows a density histogram (blue bars), the default KDE from R statistical software (red curve), and the population density function (black).
set.seed(930)
x = rgamma(100, 5, 1/6)                    # n = 100 from Gamma(5, rate 1/6)
summary(x)
hist(x, prob=TRUE, ylim=c(0,.035),
     col="skyblue2", main="n = 100")       # density-scale histogram
rug(x)                                     # tick marks below x-axis
lines(density(x), lwd=2, lty="dotted", col="red")  # default KDE
curve(dgamma(x, 5, 1/6), add=TRUE)         # population density
(2) Sampling from the same distribution, we show the ECDF for a sample of size $n = 20,$ so that the
steps are easy to see.
set.seed(2019)
x = rgamma(20, 5, 1/6)                     # small sample so steps show
plot(ecdf(x), main="n = 20", col="blue"); rug(x)
curve(pgamma(x, 5, 1/6), add=TRUE, lwd=2)  # population CDF
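To see the claim about increasing accuracy numerically, here is a small sketch (sample sizes and seed are arbitrary choices) computing the Kolmogorov-Smirnov distance $\max_x|\hat F_n(x)-F(x)|$ between the ECDF and the population CDF:

# The worst-case ECDF error shrinks as n grows; Gamma(5, 1/6) as above.
set.seed(1)
for (n in c(20, 200, 2000)) {
  x  <- sort(rgamma(n, 5, 1/6))
  Fi <- pgamma(x, 5, 1/6)
  err <- max(pmax((1:n)/n - Fi, Fi - (0:(n-1))/n))  # KS sup-distance
  cat("n =", n, " max |ECDF - CDF| ~", round(err, 3), "\n")
}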
A probability distribution is any specification of the probabilities of the possible outcomes of an experiment. It can be defined in different ways, including by the probability density function (PDF) for continuous variables, the probability mass function (PMF) for discrete variables, or the cumulative distribution function (CDF) for either continuous or discrete variables. If a function is scaled properly and can be used to determine the likelihood of any possible outcome, it represents some kind of probability distribution.
In practice, I would typically read "probability distribution" as referring to the non-cumulative density or mass function, but it's not a precise term. One would be reasonably well understood when referring to a "normal probability distribution", although the more precise term would be "normal probability density function". Density and mass functions often come in common named families (normal, uniform, Weibull, binomial, and others), and referring to those families typically calls to mind an image of the density or mass function rather than the cumulative distribution function: when someone says "normal probability distribution", I'd wager that most people picture the Gaussian bell-curve PDF and not the sigmoidal CDF. Either one is an equally valid representation of the same probability distribution, though; the underlying distribution is not affected by how one chooses to represent it.
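In R terms, for the normal case, dnorm is the PDF and pnorm is the CDF, and each determines the other; a two-line check (the cutoff 1.96 is an arbitrary example):

# Integrating the PDF recovers the CDF value: two views of one distribution.
integrate(dnorm, lower = -Inf, upper = 1.96)$value   # ~0.975 via the PDF
pnorm(1.96)                                          # same quantity via the CDF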
Consider the following example. Let $Y\sim\mathsf{Exp}(\lambda)$, where $\lambda\in \mathbb{R}_{>0}$. The value of the parameter $\lambda$ is unknown, so one considers the family of probability distributions on $\mathbb{R}$ indexed by $\lambda$ such that $p_{\lambda}(y)=\lambda\exp(-\lambda y)$, $y\ge 0$. Letting $P_{\lambda}$ denote the distribution corresponding to $p_{\lambda}$, i.e. $P_{\lambda}(A)=\int_A p_{\lambda}(y)\,dy$, this family can be written as $\{P_{\lambda}:\lambda\in(0,\infty)\}$. The log-likelihood in this case is given by $$ l(\lambda)=\ln p_{\lambda}(y)=\ln\lambda-\lambda y. $$
If we are given $n$ i.i.d. copies of $Y$, i.e. $(Y_1,\ldots,Y_n)$, then $p_{\lambda}(y)=\prod_{i=1}^n \lambda \exp(-\lambda y_i)$, where $y\equiv(y_1,\ldots,y_n)\ge 0$. The log-likelihood in this case becomes $$ l(\lambda)=n\ln\lambda-\lambda\sum_{i=1}^n y_i. $$
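Setting $l'(\lambda)=n/\lambda-\sum_{i=1}^n y_i=0$ gives the maximizer $\hat\lambda=n/\sum_{i=1}^n y_i=1/\bar y$. A minimal numerical check in R (simulated data; the seed, sample size, and true rate are arbitrary choices):

# Maximize the log-likelihood numerically and compare to the closed form.
set.seed(1)
y <- rexp(50, rate = 2)                        # n = 50, true lambda = 2
loglik <- function(lam) length(y)*log(lam) - lam*sum(y)
optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum
1 / mean(y)                                    # closed-form MLE, should agree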