Solved – Simple example that shows the advantages of Bayesian Model Averaging (BMA)

bayesian, data visualization

I'm incorporating a Bayesian Model Averaging (BMA) approach in my research and will soon give a presentation about my work to my colleagues. However, BMA isn't really that well known in my field, so after presenting the theory and before actually applying it to my problem, I want to show a simple yet instructive example of why BMA works.

I was thinking about a simple example with two models one can choose from, where the true data generating model (DGM) lies somewhere in between and the evidence doesn't really favor either of them. If you pick one model and continue from there, you ignore model uncertainty and make an error, whereas BMA, even though the true model is not in the model set, at least gives the correct posterior density of the parameter of interest. For instance, suppose there are two weather forecasters (A and B) and one wants to predict the weather as well as possible. In classical statistics you would first try to identify the better of the two forecasters, but what if the truth lies somewhere in between, i.e. sometimes A is right and sometimes B? I couldn't quite formalize this; the closest I've come is the rough sketch below.
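Here I treat the two forecasters as point models for the probability of rain, with the truth in between (all the numbers are made up, and I'm not at all sure this is the best way to set it up):

# Two 'forecaster' models for the probability of rain:
# model A says 0.3, model B says 0.7, but the true probability is 0.5
set.seed(42)
n.days <- 20
rainy <- rbinom(1, n.days, 0.5)    # rainy days generated by the 'true' 0.5

# Marginal likelihoods: the models have no free parameters,
# so these are just binomial likelihoods at the fixed forecasts
ml <- c(A = dbinom(rainy, n.days, 0.3),
        B = dbinom(rainy, n.days, 0.7))

# Posterior model probabilities under equal prior model probabilities
postprob <- ml / sum(ml)
postprob

# Committing to A or B means forecasting 0.3 or 0.7; the BMA forecast
# mixes the two and typically lands much closer to the true 0.5
sum(postprob * c(0.3, 0.7))

Something along those lines, but I'm very open to other ideas. I hope this question is specific enough!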

In the literature I have read so far, I haven't found any nice examples:

  • Kruschke (2011), while a great introduction to Bayesian statistics, doesn't really focus on BMA, and the coin-toss example in chapter 4, nice as it is for getting started, doesn't really convince a fellow researcher to use BMA. ("Why again do I have three models, one saying the coin is fair and two saying it's biased in either direction?")
  • All the other sources I've read (Koop 2003; Koop, Poirier and Tobias 2007; Hoeting et al. 1999; and tons of others) are great references, but I haven't found a simple toy example in any of them.

But maybe I just missed a good source here.

So does anyone have a good example he or she uses to introduce BMA? Ideally one that even shows the likelihoods and posteriors, because I think that would be quite instructive.

Best Answer

I did something similar recently. It was not so much an attempt to convince others as a small project that let me get a little taste of BMA. I generated a dataset with a binary response, three independent variables that had an effect on the response and seven that did not. I then compared the BMA results to the frequentist estimates from logistic regression. At least in this case, the BMA approach appears to work quite well. If you want to make it more accessible, you can name the variables something concrete instead of calling them the generic $X$ and $y$.

The R code I used for this is presented below. Hope it can inspire you!

# The sample size
n <- 100

# The 'true' coefficient vector
Beta <- cbind(c(-1.5, 0.45, -3))

# Generate the explanatory variables which have an effect on the outcome
set.seed(1)
X <- cbind(rnorm(n, 0, 1), rnorm(n, 4, 2), rnorm(n, 0.5, 1))

# Convert this into probabilities
prob <- 1/(1+exp(-X %*% Beta))

# Generate some uniform numbers. If the elements are smaller than the corresponding elements in the prob vector, then return 1.
set.seed(2)
runis <- runif(n, 0, 1)
y <- ifelse(runis < prob, 1, 0)

# Add the nonsense variables
X <- cbind(X, rpois(n, 3))        # Redundant variable 1 (x4)
X <- cbind(X, rexp(n, 10))        # Redundant variable 2 (x5)
X <- cbind(X, rbeta(n, 3, 10))    # Redundant variable 3 (x6)
X <- cbind(X, rbinom(n, 10, 0.5)) # Redundant variable 4 (x7)
X <- cbind(X, rpois(n, 40))       # Redundant variable 5 (x8)
X <- cbind(X, rgamma(n, 10, 20))  # Redundant variable 6 (x9)
X <- cbind(X, runif(n, 0, 1))     # Redundant variable 7 (x10)


# The BMA
library(BMA)
model <- bic.glm(X, y,  glm.family="binomial", factor.type=FALSE, thresProbne0 = 5, strict = FALSE)

# The frequentist model
model2 <- glm(y~X, family = "binomial")

old.par <- par(no.readonly = TRUE)  # save only the settable graphics parameters
par(mar=c(3,2,3,1.5))
plot(model, mfrow=c(2,5))
par(old.par)

summary(model)
summary(model2)
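
If it helps, the two fits can also be lined up directly. A small sketch, assuming the objects fitted above and that I remember the BMA package's return values correctly (bic.glm stores the posterior inclusion probabilities, in percent, in probne0 and the model-averaged coefficient means in postmean):

# Put the BMA output next to the frequentist fit
comparison <- data.frame(
  variable  = paste0("x", 1:10),
  post.incl = model$probne0,                       # P(coefficient != 0), in percent
  post.mean = model$postmean[-1],                  # BMA posterior means, intercept dropped
  p.value   = summary(model2)$coefficients[-1, 4]  # frequentist p-values, intercept dropped
)
comparison

The redundant variables should show up with low inclusion probabilities, which is arguably a more direct statement of "this variable probably doesn't matter" than a large p-value.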