[Math] Likelihood Function for the Uniform Density.

maximum-likelihood, statistical-inference, statistics

Let the random variable $X$ have a uniform density given by
$$
f(x;\theta)=I_{[\theta-\frac{1}{2},\,\theta+\frac{1}{2}]}(x)
$$

where $-\infty<\theta<\infty$.

The likelihood function for a sample of size $n$ is

$$L(\theta;x_1,\ldots,x_n)=\prod_{i=1}^nf(x_i;\theta)=f(x_1;\theta)\ldots f(x_n;\theta)$$

Why is the likelihood function for a sample of size $n$ the "joint density" of the $n$ random variables?

  • I am asking because I know that we have to select $\hat \theta$ in such a way that, at the particular values $x'_1,\ldots,x'_n$ assumed by the random variables $X_1,\ldots,X_n$, the joint density $f_{X_1,\ldots,X_n}(x_1,\ldots,x_n;\theta)$ is a maximum.

But here, for the uniform density function, there is no hint that the value $x_1$ of the random variable $X_1$ maximizes the function $f_{X_1}(x;\theta)$, ..., or that the value $x_n$ of the random variable $X_n$ maximizes the function $f_{X_n}(x;\theta)$. How, then, do $x_1,\ldots,x_n$ maximize $f_{X_1,\ldots,X_n}(x_1,\ldots,x_n;\theta)$ for the given density function, i.e., for the uniform density function?

Again,

$$L(\theta;x_1,\ldots,x_n)=\prod_{i=1}^nf(x_i;\theta)=\prod_{i=1}^n I_{[\theta-\frac{1}{2},\theta+\frac{1}{2}]} (x_i)=I_{[y_n-\frac{1}{2},y_1+\frac{1}{2}]} (\theta)$$

where $y_1$ is the smallest of the observations and $y_n$ is the largest.

  • Why have we changed the range? Is it because the likelihood function is a function of $\theta$, with $x_1,\ldots,x_n$ held fixed here?

  • Again, I have not understood the point: if we subtract $\frac{1}{2}$ from the largest value $y_n$, it becomes the lower limit for $\theta$, and if we add $\frac{1}{2}$ to the smallest value $y_1$, it becomes the upper limit.
    Can you please give me a numerical example of the situation? (See the sketch just below.)
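
A minimal numerical sketch of exactly this point, with made-up sample values: take the observations $2.1,\,2.7,\,2.4$, so $y_1=2.1$ and $y_n=2.7$. Every $x_i$ lies in $[\theta-\frac12,\,\theta+\frac12]$ precisely when $\theta\in[y_n-\frac12,\,y_1+\frac12]=[2.2,\,2.6]$, so the likelihood is $1$ on that interval and $0$ outside it. A short Python check (function names are mine):

```python
def likelihood(theta, x):
    """Product of U(theta - 1/2, theta + 1/2) indicator densities over the sample."""
    return float(all(theta - 0.5 <= xi <= theta + 0.5 for xi in x))

x = [2.1, 2.7, 2.4]  # made-up sample: y1 = 2.1, yn = 2.7
for theta in [2.0, 2.2, 2.4, 2.6, 2.8]:
    print(theta, likelihood(theta, x))
# Prints 1.0 exactly for theta in [yn - 1/2, y1 + 1/2] = [2.2, 2.6], else 0.0
```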

Best Answer

1) If $n$ rv's (not one) are independent, then their joint density is the product of their individual densities (this extends the notion of the probability of the intersection of $n$ independent events to the case of rv's). If, moreover, they are (assumed) identically distributed, then $$ L(X_1,\ldots,X_n| \theta)=\prod_{i=1}^nf(X_i|\theta)=f(X_1|\theta)\cdots f(X_n|\theta)$$ is the joint density. If we view this expression as a function of $\theta$, then it becomes

$$ L(\theta|x_1,\ldots,x_n)=\prod_{i=1}^nf(x_i|\theta)=f(x_1|\theta)\cdots f(x_n|\theta)$$ and now it is the likelihood function of the sample. Is there a difference? A big one: in the joint density, the arguments/variables of the function are the $X$'s, while the $\theta$'s are treated as constants. In the likelihood function, the arguments/variables are the $\theta$'s, while the $x$'s are treated as constants (changing from uppercase to lowercase for the $x$'s is a usual, and good, mnemonic). So the likelihood function is not the joint density: the two may have the same functional form, but they are functions of different variables.
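To make the "same form, different variable" point concrete, here is a minimal Python sketch using the uniform density of the question (the function names and the fixed values $2.4$ and $2.1$ are mine):

```python
def f(x, theta):
    """Density of U(theta - 1/2, theta + 1/2) evaluated at x."""
    return 1.0 if theta - 0.5 <= x <= theta + 0.5 else 0.0

# As a density: theta is held fixed, x varies.
density = lambda x: f(x, 2.4)
# As a likelihood: the observation is held fixed at x = 2.1, theta varies.
likelihood = lambda theta: f(2.1, theta)

print(density(2.1), density(3.5))        # 1.0 0.0
print(likelihood(2.4), likelihood(3.5))  # 1.0 0.0
```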

2) In view of 1), we do not select the $\theta$'s "so that the rv's assume a particular value ... etc.", as you write. We select the $\theta$'s given the specific values that the rv's have already taken (the values that constitute our sample), so that the resulting value of the function is a maximum. Then we flip our point of view again and say (very loosely speaking) that this value of $\theta$ "maximizes the probability that the sample comes from the assumed distribution".

3) The uniform distribution is special in that the variable itself does not appear in the density function. The specific case we study is even more special, because not even the limits of the interval over which the rv ranges appear in it: the density is equal to unity on the specified range, and zero elsewhere (even though the range of the rv is not $[0,1]$). Instead of writing a piecewise density we write compactly $$ f_U (X_i)=I_{[\theta-\frac{1}{2},\;\theta+\frac{1}{2}]} (X_i)$$ Then the joint density for the case of $n$ i.i.d. rv's is $$ L(X_1,\ldots,X_n| \theta)=\prod_{i=1}^nf(X_i|\theta)=\prod_{i=1}^n I_{[\theta-\frac{1}{2},\;\theta+\frac{1}{2}]} (X_i) =\min\left\{I_{[\theta-\frac{1}{2}\le X_1\le\theta+\frac{1}{2}]} ,\ldots,I_{[\theta-\frac{1}{2}\le X_n\le\theta+\frac{1}{2}]}\right\}$$ where the last equality comes from a basic property of indicator functions: a product of $0/1$ indicators equals their minimum. Consider now the joint density as a likelihood function of $\theta$, with the $x$'s given. Should we then simply maximize this likelihood function? No: we are applying maximum likelihood estimation here, and ML estimators are M-estimators, which maximize the expected value of the likelihood function (or its sample analogue, whichever is feasible). So our estimation problem can be stated as $$ \max_{\theta}\,E\min\left\{I_{[\theta-\frac{1}{2}\le x_1\le\theta+\frac{1}{2}]} ,\ldots,I_{[\theta-\frac{1}{2}\le x_n\le\theta+\frac{1}{2}]}\right\}$$

Let's start the optimization procedure. Note that if we choose some $\hat \theta$ such that some of the realized values in the sample fall outside the implied range, the $\min\{\cdot\}$ expression will equal zero. But if the chosen $\hat \theta$ leaves all realized values of the rv's inside the resulting common interval, then the $\min\{\cdot\}$ expression will equal unity. I adopt the notation $x_1=x_{\min}$ and $x_n=x_{\max}$. So a first conclusion is that we must have $$ x_1\ge \theta-\frac 12 \;\text{ and }\; x_n\le \theta+\frac 12 \;\Rightarrow\; x_n-\frac 12\le \theta\le x_1+\frac 12$$

(Parenthesis: for $\hat\theta$ to be able to take a value in the reals, this interval must be non-empty, so we must have $$x_n-\frac 12\le x_1+\frac 12 \;\Rightarrow\; x_n-x_1 \le 1$$ In other words, if in the sample at hand the difference between the maximum and the minimum realized value is greater than unity, then one of the initial assumptions is wrong: either the $n$ rv's do not have the postulated distribution, or they are not identically distributed.)

Assume that $x_n-x_1 \le 1$ holds. Then, after this first step of optimization, our maximization problem has been transformed into $$ \max_{\theta}\,EI_{[x_n-\frac 12 \,\le\, \theta \,\le\, x_1+\frac 12]}$$
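Before continuing, a quick numerical check of the feasibility condition just derived, with made-up samples (the helper name is mine): for $\{1.0,\,2.3\}$ we have $x_n-x_1=1.3>1$, so no $\theta$ covers both observations with an interval of length one and the likelihood is identically zero.

```python
def theta_interval(x):
    """Interval [x_max - 1/2, x_min + 1/2] of theta values with likelihood 1,
    or None when x_max - x_min > 1 makes the model infeasible."""
    lo, hi = max(x) - 0.5, min(x) + 0.5
    return (lo, hi) if lo <= hi else None

print(theta_interval([2.1, 2.7, 2.4]))  # (2.2, 2.6): feasible sample
print(theta_interval([1.0, 2.3]))       # None: x_n - x_1 = 1.3 > 1
```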

The $\min$ operator has disappeared because all the indicator functions have become identical. Now, by the properties of indicator functions, $$ EI_{[x_n-\frac 12 \,\le\, \theta \,\le\, x_1+\frac 12]} = P\left(\left[x_n-\frac 12 \,\le\, \theta \,\le\, x_1+\frac 12\right]\right)$$ where $P(\cdot)$ is the probability of the interval. Next we can write the upper bound for $\theta$ as a constraint and solve the constrained maximization problem

$$ \max_{\theta}\,P\left(\left[x_n-\frac 12\,,\; \theta\right]\right) \qquad \text{s.t.}\quad \theta\le x_1+\frac 12$$

Obviously this probability increases with the length of the interval, so we choose $\hat \theta$ to make the interval $\left[x_n-\frac 12\,,\; \hat \theta\right]$ as long as the constraint permits, which results in $$ \hat \theta_{\text{ML}} = x_1+\frac 12$$

Then the maximum likelihood estimate of the common distribution of the $n$ rv's is $$U\left(x_1\, , \, x_1+1\right).$$
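Putting the steps together as a short sketch (using the made-up sample from above; the helper name is mine): the answer's estimator is $\hat\theta_{\text{ML}}=x_1+\frac12$, so the fitted support is $(x_1,\,x_1+1)$ and every observation lies inside it.

```python
def theta_hat(x):
    """ML estimate per the derivation above: x_min + 1/2."""
    x_min, x_max = min(x), max(x)
    assert x_max - x_min <= 1, "sample incompatible with a width-1 uniform"
    return x_min + 0.5

x = [2.1, 2.7, 2.4]
th = theta_hat(x)                                    # 2.6
print(th, (th - 0.5, th + 0.5))                      # fitted support: U(2.1, 3.1)
assert all(th - 0.5 <= xi <= th + 0.5 for xi in x)   # every observation is covered
```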
