Solved – What would a robust Bayesian model for estimating the scale of a roughly normal distribution be

Tags: bayesian, estimation, r, robust, standard-deviation

There exist a number of robust estimators of scale. A notable example is the median absolute deviation (MAD), which for normally distributed data relates to the standard deviation as $\sigma = \mathrm{MAD}\cdot 1.4826$. In a Bayesian framework there are a number of ways to robustly estimate the location of a roughly normal distribution (say, a normal contaminated by outliers); for example, one could assume the data follow a t distribution or a Laplace distribution. Now my question:
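As a quick standalone check of that relation (in plain Python with only the standard library, so it runs without R/JAGS; the constant 1.4826 is approximately $1/\Phi^{-1}(3/4)$):

```python
import random
import statistics

# For normal data, the MAD rescaled by 1.4826 is a consistent
# estimate of the standard deviation.
random.seed(42)
y = [random.gauss(0.0, 1.0) for _ in range(100_000)]

med = statistics.median(y)
mad = statistics.median(abs(v - med) for v in y)
sd_from_mad = 1.4826 * mad       # rescaled MAD
sd = statistics.stdev(y)         # ordinary sample SD
print(round(sd_from_mad, 2), round(sd, 2))  # both close to 1.0
```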

What would a Bayesian model for measuring the scale of a roughly normal distribution in a robust way look like, robust in the same sense as the MAD or similar robust estimators?

As is the case with the MAD, it would be neat if the Bayesian model could recover the SD of a normal distribution in the case where the data actually are normally distributed.

edit 1:

A typical example of a model that is robust against contamination/outliers, when assuming the data $y_i$ are roughly normal, is to use a t distribution:

$$y_i \sim \mathrm{t}(m, s,\nu)$$

Where $m$ is the mean, $s$ is the scale, and $\nu$ is the degrees of freedom. With suitable priors on $m$, $s$ and $\nu$, $m$ will be an estimate of the mean of $y_i$ that is robust against outliers. However, $s$ will not be a consistent estimate of the SD of $y_i$, since $s$ depends on $\nu$. For example, if $\nu$ were fixed to 4.0 and the model above were fitted to a huge number of samples from a $\mathrm{Norm}(\mu=0,\sigma=1)$ distribution, then $s$ would be around 0.82. What I'm looking for is a model that is robust, like the t model, but for the SD instead of (or in addition to) the mean.
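The 0.82 figure can be checked numerically without any MCMC. The sketch below (plain Python, standard library only, so it runs without R/JAGS) fits the scale of a zero-location t with $\nu$ fixed at 4 to standard-normal data by maximum likelihood:

```python
import math
import random

NU = 4
random.seed(1)
y = [random.gauss(0.0, 1.0) for _ in range(20_000)]

def neg_log_lik(s):
    # negative log-likelihood of a zero-location t with scale s and NU d.o.f.,
    # dropping additive constants that do not depend on s
    return sum(math.log(s) + (NU + 1) / 2 * math.log1p((v / s) ** 2 / NU)
               for v in y)

# golden-section search for the minimizing scale on [0.5, 2]
lo, hi = 0.5, 2.0
phi = (math.sqrt(5) - 1) / 2
for _ in range(40):
    a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
    if neg_log_lik(a) < neg_log_lik(b):
        hi = b
    else:
        lo = a
s_hat = (lo + hi) / 2
print(round(s_hat, 2))  # about 0.82, not 1
```

The fitted scale lands near 0.82 rather than 1, which is exactly why $s$ is not directly an estimate of the SD.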

edit 2:

Here follows a code example in R and JAGS showing that the t model mentioned above is more robust with respect to the mean.

library(rjags)

# generating some contaminated data
y <- c(rnorm(100, mean = 10, sd = 10),
       rnorm(10,  mean = 100, sd = 100))

#### A "standard" normal model ####
model_string <- "model{
  for(i in 1:length(y)) {
    y[i] ~ dnorm(mu, inv_sigma2)
  }

  mu ~ dnorm(0, 0.00001)
  inv_sigma2 ~ dgamma(0.0001, 0.0001)
  sigma <- 1 / sqrt(inv_sigma2)
}"

model <- jags.model(textConnection(model_string), list(y = y))
mcmc_samples <- coda.samples(model, "mu", n.iter=10000)
summary(mcmc_samples)

### The quantiles of the posterior of mu
##  2.5%   25%   50%   75% 97.5% 
##   9.8  14.3  16.8  19.2  24.1 

#### A (more) robust t-model ####
library(rjags)
model_string <- "model{
  for(i in 1:length(y)) {
    y[i] ~ dt(mu, inv_s2, nu)
  }

  mu ~ dnorm(0, 0.00001)
  inv_s2 ~ dgamma(0.0001,0.0001)
  s <- 1 / sqrt(inv_s2)
  nu ~ dexp(1/30) 
}"

model <- jags.model(textConnection(model_string), list(y = y))
mcmc_samples <- coda.samples(model, "mu", n.iter=10000)
summary(mcmc_samples)

### The quantiles of the posterior of mu
## 2.5%   25%   50%   75% 97.5% 
##8.03  9.35  9.99 10.71 12.14 

Best Answer

Bayesian inference in a t noise model with an appropriate prior will give a robust estimate of location and scale. The precise conditions that the likelihood and prior need to satisfy are given in the paper Bayesian robustness modelling of location and scale parameters by Andrade and O'Hagan (2011). The estimates are robust in the sense that a single observation cannot make the estimates arbitrarily large, as demonstrated in Figure 2 of the paper.
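That robustness property is easy to see even outside the Bayesian setting, by comparing maximum-likelihood estimates. The sketch below (plain Python, standard library only; scale fixed at 1 and $\nu=4$ purely for simplicity) plants one gross outlier in otherwise standard-normal data: the sample mean is dragged far from the bulk of the data, while the ML location of the t model barely moves.

```python
import math
import random

def t_neg_log_lik(mu, y, nu):
    # negative log-likelihood of a unit-scale t location model,
    # additive constants dropped
    return sum((nu + 1) / 2 * math.log1p((v - mu) ** 2 / nu) for v in y)

random.seed(7)
y = [random.gauss(0.0, 1.0) for _ in range(100)] + [1000.0]  # one gross outlier

mean_y = sum(y) / len(y)  # pulled to roughly 10 by the single outlier

# simple grid search for the t location estimate on [-10, 20], step 0.01
mu_t = min((i / 100 for i in range(-1000, 2001)),
           key=lambda m: t_neg_log_lik(m, y, 4))

print(round(mean_y, 1), round(mu_t, 2))  # mu_t stays near 0
```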

When the data are normally distributed, the SD of the fitted t distribution (for fixed $\nu$) does not match the SD of the generating distribution. But this is easy to fix. Let $\sigma$ be the standard deviation of the generating distribution and let $s$ be the standard deviation of the fitted t distribution (i.e. the fitted scale parameter times $\sqrt{\nu/(\nu-2)}$). If the data are scaled by 2, then from the form of the likelihood we know that $s$ must scale by 2 as well. This implies that $s = \sigma f(\nu)$ for some fixed function $f$. This function can be computed numerically by simulation from a standard normal. Here is the code to do this:

library(stats4)  # for mle(); the stats package is attached by default
y <- rnorm(100000, mean = 0, sd = 1)
nu <- 4
# negative log-likelihood of a zero-location t with scale s and nu d.o.f.
nLL <- function(s) -sum(stats::dt(y / s, nu, log = TRUE) - log(s))
fit <- mle(nLL, start = list(s = 1), method = "Brent", lower = 0.5, upper = 2)
# the variance of a standard t is nu / (nu - 2), so this converts the
# fitted scale into the SD of the fitted t
print(coef(fit) * sqrt(nu / (nu - 2)))

For example, at $\nu=4$ I get $f(\nu)\approx 1.18$. The desired estimator is then $\hat{\sigma} = s/f(\nu)$.
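The correction can be checked end to end. The sketch below (plain Python, standard library only; the helper `fitted_t_sd` is my own name, the answer's R code does the same job) computes $f(\nu)$ once from standard-normal draws, then applies $\hat{\sigma} = s/f(\nu)$ to data with true SD 2:

```python
import math
import random

NU = 4
PHI = (math.sqrt(5) - 1) / 2

def fitted_t_sd(y, lo, hi):
    # ML scale of a zero-location t(NU), found by golden-section search,
    # converted to the SD of the fitted t via Var = NU / (NU - 2)
    def nll(s):  # additive constants dropped
        return sum(math.log(s) + (NU + 1) / 2 * math.log1p((v / s) ** 2 / NU)
                   for v in y)
    for _ in range(40):
        a, b = hi - PHI * (hi - lo), lo + PHI * (hi - lo)
        if nll(a) < nll(b):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2 * math.sqrt(NU / (NU - 2))

random.seed(3)
# f(nu): SD of the t fitted to standard-normal draws
f_nu = fitted_t_sd([random.gauss(0, 1) for _ in range(20_000)], 0.5, 2.0)
y = [random.gauss(0, 2) for _ in range(20_000)]  # true sigma = 2
sigma_hat = fitted_t_sd(y, 0.5, 4.0) / f_nu
print(round(sigma_hat, 2))  # recovers roughly 2
```

Since the same $f(\nu)$ enters both fits, the Monte Carlo error largely cancels and the corrected estimate lands close to the true SD.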