Bayesian Estimation – Estimating IID Samples from Uniform$[0,\theta]$ with Pareto$(\alpha,\beta)$ Prior for $\theta$

Tags: bayesian, conjugate-prior, estimation, uniform-distribution

I am working on Bayesian estimation: suppose that $X_1,\dots, X_n$ is an iid sample from Uniform$[0,\theta]$. Assume a Pareto prior for $\theta\sim Pareto(\alpha,\beta)$, i.e.
$$
f(\theta)=\frac{\alpha\beta^\alpha}{\theta^{\alpha+1}}, \, \theta\ge \beta, \alpha>0, \beta>0
$$

(1) What is the Bayes estimator of $\theta$? Do the prior and posterior belong to the same family of distributions (i.e., is the prior conjugate)?

(2) What does this estimator converge to as $n\to \infty$?


My work is as follows.

The prior distribution:
$$
\pi(\theta)=\frac{\alpha\beta^\alpha}{\theta^{\alpha+1}}I[\theta\ge \beta]
$$

and the likelihood function is
$$
f(X|\theta)=\prod_{i=1}^n f(x_i;\theta)=\frac{1}{\theta^n}I[0\le X_{(1)}\le X_{(n)}\le \theta]
$$

where $X_{(1)}\le \dots\le X_{(n)}$ denote the order statistics.

Then the posterior distribution is
$$
\pi(\theta|X)\approx \pi(\theta)L(X|\theta)=\frac{\alpha\beta^\alpha}{\theta^{n+\alpha+1}}I[0\le X_{(1)}\le X_{(n)}\le \theta]I[\theta\ge \beta]
$$

So for $\beta\ge X_{(n)}$, $\pi(\theta|X)\approx \frac{\alpha\beta^\alpha}{\theta^{n+\alpha+1}}I[\theta\ge \beta]$.

For $\beta<X_{(n)}$, $\pi(\theta|X)\approx \frac{\alpha\beta^\alpha}{\theta^{n+\alpha+1}}I[\theta\ge X_{(n)}]$.

But I am unsure about the distribution of $\theta|X$. Does this mean $\theta|X\sim Pareto(n+\alpha,\beta)$?
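As a sanity check on the conjugacy claim, I compared the unnormalized posterior kernel against a $Pareto(n+\alpha,\max(\beta,X_{(n)}))$ density in R (the sample and hyperparameters below are arbitrary test values):

```r
set.seed(1)
n <- 20; alpha <- 2; beta <- 0.5
x <- runif(n, 0, 1)          # true theta = 1
m <- max(x)                  # X_(n)
b.post <- max(beta, m)       # conjectured posterior scale

# unnormalized posterior: prior kernel times likelihood
kernel <- function(t) ifelse(t >= b.post, 1 / t^(n + alpha + 1), 0)
# conjectured posterior density: Pareto(shape n + alpha, scale b.post)
dens <- function(t) ifelse(t >= b.post,
                           (n + alpha) * b.post^(n + alpha) / t^(n + alpha + 1), 0)

grid <- seq(b.post, 3, length.out = 50)
ratio <- kernel(grid) / dens(grid)
range(ratio)   # constant ratio => same distribution up to normalization
```

The ratio of kernel to candidate density is constant over the support, so the two can only differ by the normalizing constant.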


For the Bayes estimator,
$$
\hat{\theta}=E[\theta|X]=\int_{\mathbb{R}} \theta\, \pi(\theta|x)\,d\theta
$$

As $\beta\ge X_{(n)}$,
$$
\hat{\theta}=E[\theta|X]=\int_\beta^\infty \theta \frac{(n+\alpha)\beta^{n+\alpha}}{\theta^{n+\alpha+1}}d\theta=\frac{(n+\alpha)\beta}{n+\alpha-1}
$$

As $\beta< X_{(n)}$, the pdf of $\theta|X$ is
$$
g(\theta|x)=\frac{(n+\alpha)(X_{(n)})^{n+\alpha}}{\theta^{n+\alpha+1}}I[\theta\ge X_{(n)}],
$$

then
$$
\hat{\theta}=E[\theta|X]=\frac{(n+\alpha)X_{(n)}}{n+\alpha-1}
$$
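For what it is worth, the closed form can be checked against numerical integration of the posterior density (again with arbitrary test values):

```r
set.seed(2)
n <- 30; alpha <- 1.5; beta <- 0.2
x <- runif(n, 0, 1)
b.post <- max(beta, max(x))   # posterior scale (here beta < X_(n))

# Pareto(n + alpha, b.post) posterior density
post.dens <- function(t) (n + alpha) * b.post^(n + alpha) / t^(n + alpha + 1)

numeric.mean <- integrate(function(t) t * post.dens(t),
                          lower = b.post, upper = Inf)$value
closed.form <- (n + alpha) * b.post / (n + alpha - 1)
c(numeric.mean, closed.form)   # the two should agree
```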

It seems that this result is not right, because the Bayes estimator should be a weighted average of the prior mean ($E[\theta]=\frac{\alpha\beta}{\alpha-1}$) and the sample mean $\bar{X}$.

Best Answer

The Pareto is indeed conjugate to the uniform; see e.g. "Aside from the exponential family, where else can conjugate priors come from?".

The posterior mean looks right; see also https://en.wikipedia.org/wiki/Pareto_distribution (in Wikipedia's notation, $\alpha>1$ is guaranteed, as the present $\alpha$ is positive and the sample size is $n\geq1$).

The result that the posterior mean is a weighted average of the prior mean and the MLE (note the MLE here is $X_{(n)}$, not the sample mean, so I am not sure why to expect the sample mean to appear in the first place) is restricted to certain parametrizations in exponential families, and the uniform is not a member. See e.g. "Can the posterior mean always be expressed as a weighted sum of the maximum likelihood estimate and the prior mean?" or "How does Prior Variance Affect Discrepancy between MLE and Posterior Expectation".

We have that the maximum $X_{(n)}$ is consistent for $\theta$. This follows from, e.g., https://math.stackexchange.com/questions/2905482/expectation-and-variance-of-y-maxx-1-ldots-x-n-where-x-is-uniformly-dis (slightly adapting the argument from a uniform on $[0,1]$ to one on $[0,\theta]$; essentially, work with the cdf $y/\theta$ on $[0,\theta]$ instead of the cdf $y$ on $[0,1]$), noting that $E(X_{(n)})\to\theta$ and $V(X_{(n)})\to0$, which gives mean-square convergence and hence consistency.
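A small simulation illustrates this (a sketch with arbitrary settings; 2000 replications per sample size):

```r
set.seed(3)
theta <- 2
ns <- c(10, 100, 1000)

# Monte Carlo estimates of E(X_(n)) for increasing n;
# the exact value is theta * n / (n + 1), which tends to theta
mean.max <- sapply(ns, function(n) mean(replicate(2000, max(runif(n, 0, theta)))))
mean.max   # approaches theta = 2 from below
```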

Also, since $(n+\alpha)/(n+\alpha-1)\to1$, the posterior mean will tend to either the true $\theta$ or, when $\beta\geq X_{(n)}$, to $\beta$. [One could additionally consider the variance of the Pareto posterior, which is $\mathcal{O}(n^{-2})$.]

Asymptotically, the latter only seems possible when $\beta$ is larger than the true $\theta$ in view of consistency of $X_{(n)}$ for $\theta$. In that case, the support of the prior does not include the true parameter so that the posterior mean cannot concentrate on the true value.
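To see the second case numerically, take a prior scale $\beta$ above the true $\theta$; the posterior mean then settles at $\beta$ rather than at $\theta$ (a sketch with arbitrary settings):

```r
set.seed(4)
theta <- 1; alpha <- 0.5; beta <- 1.2   # beta exceeds the true theta
ns <- c(10, 100, 5000)

post.mean <- sapply(ns, function(n) {
  m <- max(runif(n, 0, theta))   # X_(n) < theta < beta almost surely
  b <- max(beta, m)              # so the posterior scale is beta
  (n + alpha) * b / (n + alpha - 1)
})
post.mean   # tends to beta = 1.2, not to theta = 1
```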

Here is a little plot with posteriors for different $n$ and one prior choice for $\beta$ smaller (solid) and one larger than the true upper bound of the uniform (vertical black bar). We notice how the posterior concentrates around either the sample maximum or $\beta$.


library(EnvStats)  # for dpareto(x, location, shape)

theta <- 1         # true upper bound of the uniform
beta.low <- 0.8    # prior scale below the true theta
beta.high <- 1.03  # prior scale above the true theta
alpha <- 0.5       # prior shape

n <- c(10, 20, 30, 50)
x <- runif(n[4], 0, theta)

# sample maxima of the nested subsamples
x.n <- sapply(n, function(i) max(x[1:i]))

# posterior parameters: shape alpha + n, scale max(X_(n), beta);
# note pmax, not max: one scale per sample size
alpha.n <- alpha + n
beta.nlow <- pmax(x.n, beta.low)
beta.nhigh <- pmax(x.n, beta.high)

theta.ax <- seq(0.95, 1.1, by=.0001)
cols <- c("chartreuse", "orange", "lightblue", "deeppink4")

# solid lines: prior scale below theta; dashed lines: prior scale above theta
plot(theta.ax, dpareto(theta.ax, beta.nlow[4], alpha.n[4]), type="l", lwd=2, col=cols[4],
     xlab=expression(theta), ylab="posterior density")
for (i in 3:1) lines(theta.ax, dpareto(theta.ax, beta.nlow[i], alpha.n[i]), lwd=2, col=cols[i])
for (i in 4:1) lines(theta.ax, dpareto(theta.ax, beta.nhigh[i], alpha.n[i]), lwd=2, col=cols[i], lty=2)

abline(v=theta, lwd=4)  # true upper bound

Quibbles: The posterior mean is only "the" Bayes estimator when you are working with the squared error loss function.

Also, you could omit the lower indicator in the likelihood function, since you know that all $X_i$ are nonnegative.

To indicate that the posterior is proportional to some kernel of a distribution, it is more common to use $\propto$ than $\approx$.
