Solved – Credible interval for Bayesian posterior of variance and mean, and posterior predictive of normal

bayesiancredible-intervalnormal distribution

I am trying to do a "round trip" to verify that my code works with regards to recovering parameters of normal distributions.

I'm using this document regarding the normal-gamma distribution:
http://www.cs.ubc.ca/~murphyk/Teaching/CS340-Fall07/reading/NG.pdf

The basic flow is to generate 100,000 mean and variance parameters from the normal-gamma with prior parameters given. Then for each mean-variance parameter pair, randomly generate 100 samples. Next, using the prior parameters and the data, develop 95% credible intervals for each of the 100,000 mean-variance pairs. And finally create 95% credible intervals for the posterior predictive distribution and compare these to a randomly generated observation.

What's interesting is that the mean and variance CIs never seem to contain the actual parameter 95% of the time, but the posteriror predictive interval seems fine. Is this expected? Here is the R code:

rm(list = ls(all = TRUE))

set.seed(115)
no_of_examples = 100000 
examples = seq(1:no_of_examples)

#priors
m_0 = 0
k_0 = 6
a_0 = 10
b_0 = 840

means = rnorm(no_of_examples, m_0, sqrt(k_0))
vars = 1/rgamma(no_of_examples, a_0, b_0)

samples = 100 
data = matrix(NA, no_of_examples, samples)

for (example in examples){
    data[example, ] = rnorm(samples,means[example], sqrt(vars[example]))
}       

# now recover parameters

m_n = rep(NA, no_of_examples)
k_n = rep(NA, no_of_examples)
a_n = rep(NA, no_of_examples)
b_n = rep(NA, no_of_examples)

for (example in examples){
    n = samples
    x_bar = mean(data[example,])
    sample_mean = rep(mean(data[example,]),samples)

    m_n[example] = (k_0 * m_0 + n * x_bar)/(k_0 + n)
    k_n[example] = (k_0 + n)
    a_n[example] = a_0 + n/2
    b_n[example] = b_0 + 1/2*(sum((data[example,]-sample_mean)^2))+(k_0 * n * (sample_mean[1] - m_0)^2)/(2 * (k_0 + n))
}

# check posterior of means    
conf = matrix(NA,no_of_examples,2)
in_range = rep(NA,no_of_examples)

for (example in examples){
    conf[example,] = qt(c(0.025,0.975),2*a_n[example]) * sqrt(b_n[example]/(a_n[example]*k_n[example])) + m_n[example]
}

in_range = ifelse(means > conf[,1],ifelse(means < conf[,2],1,0), 0)

print("% of means within 95% CI:")
print(sum(in_range)/length(in_range))

# check posterior of variances
ci_v = matrix(NA, no_of_examples, 2)

for (example in examples){
    ci_v[example,] = (1/qgamma(c(0.975,0.025), a_n[example], b_n[example])) 
}

in_range_v = ifelse(vars > ci_v[,1],ifelse(vars < ci_v[,2],1,0), 0)

print("% of variance within 95% CI:")
print(sum(in_range_v)/length(in_range_v))


# check posterior predictive distribution

y_p = rep(NA, no_of_examples)
ci_p = matrix(NA, no_of_examples, 2)

for(example in examples){
    # generate an observation
    y_p[example] = rnorm(1, means[example], sqrt(vars[example]))
    # find 95% CI for posterior predictive
    ci_p[example,] = qt(c(0.025,0.975),2*a_n[example]) * sqrt((b_n[example]*(k_n[example] + 1))/(a_n[example]*k_n[example])) + m_n[example]
}

in_range_p = ifelse(y_p > ci_p[,1],ifelse(y_p < ci_p[,2],1,0), 0)
print("% of observations within posterior predictive CI:")
print(sum(in_range_p)/length(in_range_p))

My results are:

[1] "% of means within 95% CI:"
[1] 0.94122
[1] "% of variance within 95% CI:"
[1] 0.93993
[1] "% of observations within posterior predictive CI:"
[1] 0.95123

Is there an error in my code? Or am I completely misunderstanding credible intervals?

Update: I edited a small error in the b_n update, and got a little bit closer on the variance CI, but it still seems to be consistently below 95% (when I change the seed). The percentage of means and variances falling within the 95% credible interval both seem to be right around 94%.

Best Answer

Fixed it. Edited code now works properly, and my new results are:

[1] "% of means within 95% CI:"
[1] 0.95325
[1] "% of variance within 95% CI:"
[1] 0.94997
[1] "% of observations within posterior predictive CI:"
[1] 0.94891

Hopefully this code will help someone else.

Related Question