Solved – Choosing informative priors for Bayesian ordered logistic regression

Tags: bayesian, ordered-logit, ordinal-data, prior

What are some guidelines for choosing weakly informative priors in a Bayesian ordinal regression? Consider the following model from the Stan manual (version 2.17.0, section 9.8, page 138):

data {
  int<lower=2> K;
  int<lower=0> N;
  int<lower=1> D;
  int<lower=1,upper=K> y[N];
  row_vector[D] x[N];
}
parameters {
  vector[D] beta;
  ordered[K-1] c;
}
model {
  for (n in 1:N)
    y[n] ~ ordered_logistic(x[n] * beta, c);
}

We specify the likelihood as ordered_logistic, but implicit improper flat priors are left on both beta and the cutpoints c.

I can reason about how to specify more informative priors on beta, because each element is simply how much the log odds change for each unit increase in the corresponding predictor in x. These could, for instance, simply be beta ~ normal(0, 3) or something similar, depending on how the predictors are scaled.
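(As an aside, one way to judge whether a scale like normal(0, 3) is reasonable is to translate it into odds ratios; this is my own illustration, not part of the question. A coefficient one prior standard deviation from zero multiplies the odds by exp(sd):)

```python
import math

def odds_ratio(beta):
    """Odds multiplier implied by a log-odds coefficient."""
    return math.exp(beta)

# One prior SD under beta ~ normal(0, 3): the odds change by a factor of ~20,
# which is already quite permissive for standardized predictors.
print(odds_ratio(3.0))   # ~20.09

# A tighter normal(0, 1) prior keeps one-SD effects under a factor of ~2.72.
print(odds_ratio(1.0))   # ~2.72
```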

However, how does one specify priors on the cutoff points c? They have to be ordered, but I am not sure how to specify that. Also, I am not sure how to think about them being distributed. Anybody know of guides for informative priors for ordinal regression?

The Stan community very briefly touches on it on their GitHub, but it isn't a fully-realized or explained section.

Best Answer

The use of an informative prior implies, of course, that you have information from which to be informed. For a similar problem, where I had binary outcome data, I happened to have twenty years of success/failure data leading up to the time period of my predictor variables. As the rate appeared to be relatively stable, I used the center of location of those two decades of data, but radically increased its variance.

My failure rate, up to when predictor variables became available, was about 1 in 1000. Of course, I had hundreds of thousands of observations so I could make a very good estimate of the center with a very tight variance, but that could have overwhelmed the predictors if they were far away from the group mean.

So, I placed a Beta(1, 999) distribution as my informative prior density for the failure rate. You are doing something slightly different, but if you do have outside data on the incidence of the different values of the dependent variable, or if you want to use an empirical prior, then you could estimate the rates and add significantly to the variance. You cannot use a beta distribution directly, however, because logistic regression works on the log-odds scale rather than the probability scale.

You will also need to rescale the probabilities into log-odds and reverse-engineer the prior, so that, as in my binary example, $$\log(.001)-\log(.999)=\log(p)-\log(1-p)=m\bar{x}+b\approx-6.9068,$$ if you treat the various predictors as independent. Now there is a center of location. $b$ should probably get some large diffuse prior such as $\mathcal{N}(0,1000^2)$, and $m$ needs to be adjusted for $\bar{x}$. You want the prior on $m$ to be informative enough to discount values you would consider surprising, but diffuse enough that the prior does not control the outcome.
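(One pitfall worth flagging: the base of the logarithm matters. Base-10 logs put a 1-in-1000 rate at about $-2.9996$, but logistic-regression coefficients live on the natural-log scale, where the same rate sits near $-6.91$. A quick check of the conversion:)

```python
import math

def logit(p):
    """Map a probability to the natural-log-odds scale of logistic regression."""
    return math.log(p) - math.log(1 - p)

# Center of location for a 1-in-1000 rate, in natural logs
print(logit(0.001))   # ~ -6.907
```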

Just a note, for my own project, I did not use logistic regression. There were enough violations of assumptions that I chose to use a math trick and solve it a different way than logistic regression. My own problem had a convenient natural structure that let me sneak around your headache.

Avoid improper priors: because they do not integrate to unity, they can yield an improper posterior.

As to the cut-off points: you have an ordering of categories, where $p_1$ percent of observations fall in category 1, $p_2$ percent in category 2, and so forth. Your cut-offs should be centered where the cumulative log-odds actually sit in aggregate. It is the same idea as above, but with more parameters. The cut-off $c_2$, for example, sits at the boundary between category 2 and category 3, at the point $\mathrm{logit}(p_1+p_2)$.
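(A sketch of that recipe, with made-up category proportions: center the prior for cut-point $c_k$ at the logit of the cumulative proportion $p_1+\dots+p_k$. Because cumulative proportions increase, the centers come out automatically ordered, which is exactly what Stan's ordered type requires.)

```python
import math
from itertools import accumulate

def logit(p):
    """Probability to natural log-odds."""
    return math.log(p) - math.log(1 - p)

# Hypothetical historical proportions for K = 4 ordered categories
p = [0.1, 0.2, 0.3, 0.4]

# Prior centers for the K-1 cut-points: logit of each cumulative proportion
# (drop the final cumulative value of 1.0, which has no finite logit)
cum = list(accumulate(p))[:-1]           # [0.1, 0.3, 0.6]
centers = [logit(q) for q in cum]

print(centers)   # ~ [-2.20, -0.85, 0.41], already in increasing order
```

In the Stan model, these values could then serve as the means of, say, independent normal priors on the elements of c; the ordered type keeps the draws sorted.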
