Solved – How to compute the posterior predictive distribution of a logit model

Tags: bayesian, logit, posterior, stan

So I used Stan to take samples from a logit model. I want to compute the posterior predictive distribution of this model, but I am having trouble figuring out the logit link function and how it factors into my code.

As I normally would with a regular OLS model, I use rnorm to simulate draws from the posterior predictive distribution, with the samples times the coefficients as the mean and the posterior standard deviation of my intercepts as the standard deviation (I use a hierarchical model).

Now, when do I concern myself with the link function? What's odd is that, at this point, my PPD and my actual data on the binary variable are nearly identical. Here's my code.

Xb <- ppd %*% t(posterior_betas)  # n x m: one column of linear predictors per posterior draw
n <- nrow(joinerdf_just_hostile)
m <- nrow(posterior_betas)
y.rep_dur <- matrix(NA, nrow = n, ncol = m)
for (i in 1:m) {
  # one replicated dataset per posterior draw, using that draw's
  # linear predictor and posterior sigma
  y.rep_dur[, i] <- rnorm(n, Xb[, i], posterior_sigma[i])
}

sum(y.rep_dur>1)/sum(y.rep_dur<=1) #my ppd

sum(joinerdf_just_hostile$joiner==1)/(sum(joinerdf_just_hostile<1)) #my actual data

The result from my PPD is 0.01296358, and the result from my actual data is 0.01273911. Almost identical!

Is this just by chance or am I doing this right?

Best Answer

The posterior predictive distribution of a logit model involves four conceptual steps:

  1. Draw from the posterior distribution of the parameters
  2. Create linear predictors using those parameter draws
  3. Transform those linear predictors into probabilities via the inverse link function
  4. Draw from a Bernoulli (or Binomial with size = 1) distribution with those probabilities

It seems that you have done steps 1 and 2. In the logit case, rather than using rnorm(n, mean = Xb, sd = sigma), you would do rbinom(n, size = 1, prob = plogis(Xb)) and at that point you would be done. The last couple of lines in your code involving sum do not make sense (to me at least).
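The four steps above can be sketched in a small self-contained R example. The object names here (X, posterior_betas, y_rep) are hypothetical stand-ins: in practice, posterior_betas would be the m x k matrix of draws extracted from your Stan fit, and X your n x k design matrix.

```r
# Hypothetical sketch of posterior predictive draws for a logit model.
# Stand-in posterior draws take the place of a real Stan fit.
set.seed(1)
n <- 100   # observations
k <- 3     # coefficients (including intercept)
m <- 1000  # posterior draws

# Step 1: draws from the posterior of the parameters (simulated here)
posterior_betas <- matrix(rnorm(m * k), nrow = m, ncol = k)

# Design matrix: intercept column plus two covariates
X <- cbind(1, matrix(rnorm(n * (k - 1)), nrow = n))

# Step 2: linear predictors, an n x m matrix (one column per draw)
eta <- X %*% t(posterior_betas)

# Step 3: inverse link -- plogis() is the inverse logit
p <- plogis(eta)

# Step 4: Bernoulli draws with those probabilities; rbinom recycles
# the probability matrix column-major, so reshaping back to n x m
# keeps each column as one replicated dataset
y_rep <- matrix(rbinom(n * m, size = 1, prob = p), nrow = n, ncol = m)

# Each column of y_rep is one posterior predictive replication;
# its column means give the draw-by-draw proportion of 1s
prop_ones <- colMeans(y_rep)
```

Comparing the distribution of prop_ones to the observed proportion of 1s in your data is a standard posterior predictive check; no rnorm and no sigma are involved, because the Bernoulli outcome has no separate noise parameter.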
