Bayesian Inference – Are Bias and Variance Used as Metrics to Evaluate Estimators?

bayesian, estimators, frequentist

Consider the parameter $\theta$, which is a deterministic unknown in the frequentist paradigm. Given a random variable $X \sim p_X(x ; \theta)$, consider the estimator $\Theta(X)$ of $\theta$, and the error defined as $$e = \Theta(X) - \theta$$ The bias of $\Theta$ is defined as $$\text{Bias}(\Theta) = E[e] = E[\Theta(X)] - \theta$$ and the variance of $\Theta$ is defined as
\begin{align}
\text{Var}(\Theta) &= \text{Var}(\Theta(X) - \theta) \\
&= \text{Var}(e) \\
&= E[e^2] - E[e]^2 \\
&= \text{MSE}(\Theta) - \text{Bias}(\Theta)^2
\end{align}

where $\text{MSE}(\Theta) = E[e^2]$ is the mean squared error associated with $\Theta$. We often choose $\Theta$ to minimize the mean squared error, which amounts to trading off the bias and variance of $\Theta$, since $\text{MSE}(\Theta) = \text{Bias}(\Theta)^2 + \text{Var}(\Theta)$.
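For concreteness, here is a minimal simulation sketch (Python with numpy; the parameter values and sample sizes are purely illustrative, not from the question) that estimates the bias, variance, and MSE of the MLE $k/n$ for a binomial success probability over repeated samples and checks the decomposition $\text{MSE} = \text{Bias}^2 + \text{Var}$:

```python
import numpy as np

# Illustrative setup: estimate a binomial success probability theta with
# the MLE k/n, and check empirically that MSE = Bias^2 + Var over
# repeated samples. All numbers below are assumptions for the demo.
rng = np.random.default_rng(0)

theta = 0.3          # true (deterministic, unknown) parameter
n = 20               # trials per experiment
reps = 100_000       # number of repeated samples

k = rng.binomial(n, theta, size=reps)
estimates = k / n                     # Theta(X) for each repetition
errors = estimates - theta            # e = Theta(X) - theta

bias = errors.mean()                  # estimate of E[e]
var = errors.var()                    # estimate of Var(e) = Var(Theta(X))
mse = (errors ** 2).mean()            # estimate of E[e^2]

print(f"bias        = {bias:.5f}")
print(f"variance    = {var:.5f}")
print(f"bias^2 + var= {bias**2 + var:.5f}")
print(f"MSE         = {mse:.5f}")    # matches bias^2 + var up to Monte Carlo noise
```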

Now consider the Bayesian paradigm, where $\theta$ is a random parameter with prior $p_\theta$. Given an observation $X \sim p_{X \mid \theta}(x \mid \theta)$, we estimate $\theta$ as $\Theta(X)$. Are there similar definitions of bias and variance in the Bayesian paradigm? I'm guessing that the answer is no, since the definitions above would not be helpful in evaluating $\Theta$. More precisely, let the error be defined as $$e = \Theta(X) - \theta$$ In this case, the randomness of $e$ is due to both $\Theta(X)$ and $\theta$, and so the moments of $e$, such as its mean (the bias in the frequentist case) and its variance (the variance of $\Theta$ in the frequentist case), would not be useful to us.

I'm aware that loss functions are preferred for evaluating Bayesian estimators. However, I'm not sure why these would be preferred over definitions of bias and variance analogous to the ones above.
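To make the loss-function viewpoint concrete, here is a minimal sketch (Python with numpy/scipy; the beta-binomial model and hyperparameter values are illustrative assumptions, not taken from the question) showing that the posterior mean is the point estimate that minimizes the posterior expected squared-error loss, one standard Bayesian criterion:

```python
import numpy as np
from scipy.stats import beta

# Illustrative beta-binomial model: with a Beta(a0, b0) prior and k
# successes in n Bernoulli trials, the posterior is Beta(a0+k, b0+n-k).
# The posterior mean minimizes E[(theta - a)^2 | y] over point estimates a.
a0, b0 = 2.0, 2.0        # prior hyperparameters (assumed for illustration)
n, k = 20, 6             # observed data (assumed for illustration)

post = beta(a0 + k, b0 + n - k)
rng = np.random.default_rng(0)
theta_draws = post.rvs(size=100_000, random_state=rng)  # posterior draws

def posterior_expected_sq_loss(a):
    # Monte Carlo estimate of the posterior expected squared-error loss
    return np.mean((theta_draws - a) ** 2)

actions = np.linspace(0.01, 0.99, 99)
losses = [posterior_expected_sq_loss(a) for a in actions]

print("action minimizing posterior expected loss:", actions[int(np.argmin(losses))])
print("posterior mean:", post.mean())   # the two agree up to grid/Monte Carlo error
```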

Best Answer

$\theta$ may be random to you (or me) as we do not know it and hence we put a prior on it.

That does not preclude that there is some true value $\theta_0$ out there. (To use the perennially poor but well-known statistician's example: consider a coin that is or is not fair due to its physical properties, but whose owner does not know which is the case.)

If, then, some estimator is used over repeated samples, there is nothing wrong with computing its expected value over those samples (and similar comments apply to its variance) and seeing whether that expectation coincides with the true value. Spoiler: mostly, Bayesian estimators are not unbiased.

For instance, the expected value of the posterior mean for binomial data with true success probability $\theta_0$ and a beta prior with hyperparameters $(\alpha_0,\beta_0)$ can be written as $$ E_{Y|\theta_0}\left[E(\theta|y)\right]=w\frac{\alpha _{0}}{\alpha_{0}+\beta_0}+E_{Y|\theta_0}\left[(1-w)\frac{k}{n}\right]=w\frac{\alpha _{0}}{\alpha_{0}+\beta_0}+(1-w)\theta_0\neq\theta_0 $$ with $k$ the number of successes in $n$ trials and $$ w=\frac{\alpha _{0}+\beta_0}{\alpha_{0}+\beta_0+n} $$ The inequality holds unless the prior mean $\alpha_0/(\alpha_0+\beta_0)$ happens to equal $\theta_0$.
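As a sanity check of this formula, here is a minimal simulation sketch (Python with numpy; the hyperparameters, $\theta_0$, and sample sizes are illustrative assumptions) comparing the expectation of the posterior mean over repeated samples with the analytic expression above:

```python
import numpy as np

# Illustrative check of the bias formula: repeated binomial samples with
# true success probability theta0, posterior mean under a Beta(alpha0, beta0)
# prior, and its average over repeated samples.
rng = np.random.default_rng(1)

alpha0, beta0 = 2.0, 5.0     # prior hyperparameters (assumed for illustration)
theta0 = 0.7                 # true success probability (assumed)
n = 30                       # trials per sample
reps = 200_000

k = rng.binomial(n, theta0, size=reps)
post_mean = (alpha0 + k) / (alpha0 + beta0 + n)   # E(theta | y) for each sample

w = (alpha0 + beta0) / (alpha0 + beta0 + n)
analytic = w * alpha0 / (alpha0 + beta0) + (1 - w) * theta0

print("simulated E[posterior mean] =", post_mean.mean())
print("analytic  E[posterior mean] =", analytic)
print("true theta0                 =", theta0)   # differs: the estimator is biased
```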

It is argued by Bayesians that we may not care since we want a good rule for the sample at hand (say, the above posterior mean).