Solved – Calculating prediction variance from a JAGS model of a Bernoulli outcome in R

Tags: bayesian, bernoulli-distribution, jags, r

I have a model of a Bernoulli random process that I fit using JAGS via the rjags package in R. Here are some example data, along with code to fit the model in JAGS via rjags:

#generate some binary 0/1 data
set.seed(666)
x1 = rnorm(1000)           # some continuous variables 
x2 = rnorm(1000)
z = 1 + 2*x1 + 3*x2        # linear combination with a bias
pr = 1/(1+exp(-z))         # pass through an inv-logit function
y = rbinom(1000,1,pr)      # bernoulli response variable
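As a quick sanity check on the simulated data (an addition to the original post; `check_glm` is a hypothetical helper), a plain frequentist logistic regression should recover coefficients near the simulated truth of (1, 2, 3):

```r
# hypothetical helper: frequentist logistic fit as a cross-check on the simulation
check_glm <- function(y, x1, x2) {
  coef(glm(y ~ x1 + x2, family = binomial(link = "logit")))
}
# check_glm(y, x1, x2) on the data above should give estimates near c(1, 2, 3)
```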

#fit a JAGS model. 
require(rjags)

jags.model = "
model {
for (i in 1:N){
y[i] ~ dbern(p[i])
p[i] <- 1 / (1 + exp(-z[i]))
z[i] <- a + b*x1[i] + c*x2[i]
}
a ~ dnorm(0, .0001)
b ~ dnorm(0, .0001)
c ~ dnorm(0, .0001)
}
"
#setup data as list
data = list(y=y, x1=x1, x2=x2, N = length(y))

#run JAGS model
j.model <- jags.model(file = textConnection(jags.model),
                      data=data,
                      n.chains=3)

#sample from the posterior
jags.out   <- coda.samples (model = j.model,
                            variable.names = c('a','b','c'),
                            n.iter = 1000)

To check that the model fits, here is some code to plot predicted vs. observed values of z (the linear predictor, i.e. the log-odds of the outcome being 1).

#grab posterior mean coefficients; compare fitted vs. observed z to check the fit
cf <- summary(jags.out)$statistics[,1]
fitted <- cf[1] + cf[2]*x1 + cf[3]*x2
plot(z ~ fitted, pch=16, cex = 0.2, ylab='observed',xlab='fitted')
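The same check can be done on the probability scale using `plogis()`, R's built-in inverse logit (`fitted_prob` is a hypothetical helper, not from the original post):

```r
# hypothetical helper: fitted probabilities from the posterior-mean coefficients
fitted_prob <- function(cf, x1, x2) {
  plogis(cf[1] + cf[2]*x1 + cf[3]*x2)   # plogis(z) = 1/(1 + exp(-z))
}
# e.g. plot(pr ~ fitted_prob(cf, x1, x2)) should hug the 1:1 line
```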

Now that the model fits, I would like to plot the relationship between one of the predictor variables, say x1, and the probability of the 0/1 outcome (via the prediction for z), along with the uncertainty in that prediction. Getting the mean prediction across the range of x1 values is fairly straightforward using the posterior mean coefficients:

#now plot the relationship between x1 and z, holding x2 at its mean
x1.range <- seq(min(x1),max(x1), by = (max(x1)-min(x1))/100)
z.fit <- cf[1] + x1.range*cf[2] + mean(x2) * cf[3]

#plot relationship
plot(z.fit~ x1.range, cex=0)
lines(smooth.spline(z.fit~x1.range), lwd=2, col='purple')


My question: How do I estimate the uncertainty associated with this relationship? An estimate of the standard deviation or the variance would be fine; anything that would allow me to shade a credible interval on this figure, given the model I have fit.

Best Answer

I think the best way to generate uncertainty values around each prediction is to send the desired predictions through the original JAGS model. That way, the model samples from the joint posterior distribution of all parameters and generates a full distribution around each prediction, reflecting the uncertainty in all parameters at each value of the predictor of interest. Aspects of this distribution can then be used for plotting. To implement this in R, first fix x2 at its mean and generate the range of x1 values at which you want to make predictions. Here I predict over the entire observed range of x1, divided into 101 evenly spaced values.

x1.range <- seq(min(x1),max(x1), by = (max(x1)-min(x1))/100)
x2.m     <- mean(x2)

I then modify the JAGS model to make predictions.

jags.model = "
model {
#model code
for (i in 1:N){
y[i] ~ dbern(p[i])
p[i] <- 1 / (1 + exp(-z[i]))
z[i] <- a + b*x1[i] + c*x2[i]
}

#predictions over x1, holding x2 constant at its mean value
for (i in 1:N.sim){
  z.x1[i] <- a + b*x1.range[i] + c*x2.m
}

a ~ dnorm(0, .0001)
b ~ dnorm(0, .0001)
c ~ dnorm(0, .0001)
}
"

Make sure to add the new data to your data list:

#setup data as list
data = list(y=y, x1=x1, x2=x2,
            x1.range = x1.range, x2.m = x2.m,
            N = length(y), N.sim = length(x1.range))

Compile the model, and then use coda.samples() to draw from the posterior distribution of the predictions:

#compile the model
require(rjags)
j.model <- jags.model(file = textConnection(jags.model),
                      data=data,
                      n.chains=3)

#sample from the prediction posterior
x1.jags.out <- coda.samples(model=j.model, variable.names = c('z.x1'),n.iter=1000)

This generates posterior samples of z at each simulated combination of x1 and x2, taking the full joint uncertainty in all parameter values into account. Finally, for plotting, grab the posterior mean and standard deviation (or whatever summary of spread you prefer) from the coda output, and plot:

#extract posterior means and standard deviations of the predictions
pred.stats <- summary(x1.jags.out)$statistics
x1.pred    <- pred.stats[,'Mean']
x1.pred.sd <- pred.stats[,'SD']

#generate plot, plotting mean and +/- 1 standard deviation
plot(x1.pred ~ x1.range, cex=0)
lines(smooth.spline(x1.pred ~ x1.range), lwd=2, col='purple')
lines(smooth.spline((x1.pred+x1.pred.sd) ~ x1.range), lty=2, col='purple')
lines(smooth.spline((x1.pred-x1.pred.sd) ~ x1.range), lty=2, col='purple')

#shade error region
polygon(c(x1.range,rev(x1.range))
        ,c(x1.pred+x1.pred.sd,rev(x1.pred - x1.pred.sd))
        ,col=adjustcolor('purple',0.3)
        ,lty=0
)
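As an aside (not in the original answer), quantile-based credible bands are often preferred to +/- 1 SD, since the posterior of z.x1 need not be symmetric. They can be computed from the raw draws; `cred_band` below is a hypothetical helper, relying on the fact that `as.matrix()` on a coda mcmc.list stacks all chains into one matrix of draws:

```r
# hypothetical helper: column-wise credible bounds from MCMC draws
cred_band <- function(samples, probs = c(0.025, 0.975)) {
  m <- as.matrix(samples)               # rows = draws, cols = z.x1[i]
  apply(m, 2, quantile, probs = probs)  # 2 x N.sim matrix: lower and upper bounds
}
# e.g. band <- cred_band(x1.jags.out); band[1,] and band[2,] replace the +/- sd lines
```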

If specific probabilities are desired, these z values need to be inverse-logit transformed.
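A sketch of that transformation (`to_prob` is a hypothetical helper; note that the bounds are transformed, not the standard deviation itself, because the inverse logit is nonlinear):

```r
# hypothetical helper: map a z-scale mean +/- sd ribbon to the probability scale
to_prob <- function(z.mean, z.sd) {
  list(p     = plogis(z.mean),          # predicted probability
       upper = plogis(z.mean + z.sd),   # transform the bounds...
       lower = plogis(z.mean - z.sd))   # ...not the sd itself
}
# e.g. pb <- to_prob(x1.pred, x1.pred.sd); then shade between pb$lower and pb$upper
```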
