Bayesian GLM – Understanding $p$-Values in Bayesian GLM

Tags: bayesian, p-value, r

I am trying to run a Bayesian logit on the data here. I am using bayesglm() in the arm package in R. The coding is straightforward enough:

df = read.csv("http://dl.dropbox.com/u/1791181/bayesglm.csv", header=TRUE)
library(arm)
model = bayesglm(PASS ~ SEX + HIGH, family=binomial(link="logit"), data=df)

summary(model) gives the following output:

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.10381    0.10240   1.014    0.311    
SEXMale      0.02408    0.09363   0.257    0.797    
HIGH        -0.27503    0.03562  -7.721 1.15e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2658.2  on 1999  degrees of freedom
Residual deviance: 2594.3  on 2000  degrees of freedom
AIC: 2600.3

Please walk me through this. I understand that this code uses a very weak prior (since I am not specifying the prior means) so the output is going to be practically the same if I used glm() instead of bayesglm(). But the output should still be in the Bayesian spirit, right? What are the $p$-values and $z$-values here? Aren't these frequentist inference tools? Are they interpreted differently here?

Best Answer

Great question! Although there are Bayesian p-values, and one of the authors of the arm package is an advocate of them, what you are seeing in your output is not a Bayesian p-value. Check the class of model:

class(model)
"bayesglm" "glm"      "lm"

and you can see that class bayesglm inherits from glm. Furthermore, examination of the arm package shows no specific summary method for a bayesglm object. So when you do

summary(model)

you are actually doing

summary.glm(model)

and getting a frequentist interpretation of the results. If you want a more Bayesian perspective, the function in arm is display().
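The dispatch behaviour described above can be reproduced in base R without arm. This is only a sketch under assumptions: the data are simulated, and the class name myglm is a hypothetical stand-in for bayesglm, prepended by hand to an ordinary glm fit.

```r
# Simulated data; variable names and coefficients are illustrative only.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.3 * x))
fit <- glm(y ~ x, family = binomial(link = "logit"))

# Prepend a hypothetical subclass, mimicking c("bayesglm", "glm", "lm").
class(fit) <- c("myglm", class(fit))

# No summary.myglm method exists, so summary() falls through to summary.glm().
no_own_method <- is.null(getS3method("summary", "myglm", optional = TRUE))
same_table <- identical(summary(fit)$coefficients,
                        summary.glm(fit)$coefficients)
```

Both no_own_method and same_table should come out TRUE, which is the same situation as with a bayesglm object: the z values and p-values in the table are computed by the frequentist summary.glm().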

Related Solutions

Solved – Help me understand Bayesian prior and posterior distributions

Let me first explain what a conjugate prior is. I will then explain the Bayesian analyses using your specific example. Bayesian statistics involve the following steps:

1. Define the prior distribution that incorporates your subjective beliefs about a parameter (in your example the parameter of interest is the proportion of left-handers). The prior can be "uninformative" or "informative" (but there is no prior that has no information, see the discussion here).
2. Gather data.
3. Update your prior distribution with the data using Bayes' theorem to obtain a posterior distribution. The posterior distribution is a probability distribution that represents your updated beliefs about the parameter after having seen the data.
4. Analyze the posterior distribution and summarize it (mean, median, sd, quantiles, ...).

The basis of all Bayesian statistics is Bayes' theorem, which is

$$ \mathrm{posterior} \propto \mathrm{prior} \times \mathrm{likelihood} $$

In your case, the likelihood is binomial. If the prior and the posterior distribution are in the same family, the prior and posterior are called conjugate distributions. The beta distribution is a conjugate prior because the posterior is also a beta distribution. We say that the beta distribution is the conjugate family for the binomial likelihood. Conjugate analyses are convenient but rarely occur in real-world problems. In most cases, the posterior distribution has to be found numerically via MCMC (using Stan, WinBUGS, OpenBUGS, JAGS, PyMC or some other program).
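To see conjugacy at work, here is a minimal R sketch that multiplies a beta prior by a binomial likelihood on a grid, normalizes the product, and compares it with the closed-form beta posterior. The counts (2 left-handers out of 18) and the uniform prior are the ones worked through later in this answer; the grid step h is an arbitrary choice.

```r
z <- 2; N <- 18        # left-handers and sample size
a <- 1; b <- 1         # Beta(1, 1) prior, i.e. uniform
h <- 0.001             # grid step (arbitrary)
grid <- seq(h / 2, 1 - h / 2, by = h)   # interval midpoints

# posterior is proportional to prior times likelihood
unnorm <- dbeta(grid, a, b) * dbinom(z, size = N, prob = grid)
post_grid <- unnorm / sum(unnorm * h)   # normalize so it integrates to 1

# Closed-form conjugate result for comparison
post_exact <- dbeta(grid, z + a, N - z + b)
max_abs_diff <- max(abs(post_grid - post_exact))
```

The difference max_abs_diff should be negligible, which is exactly what conjugacy promises: no numerical work is needed for this model. For non-conjugate models, the grid (or MCMC) is what you fall back on.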

If the prior probability distribution does not integrate to 1, it is called an improper prior; if it does integrate to 1, it is called a proper prior. In most cases, an improper prior does not pose a major problem for Bayesian analyses. The posterior distribution must be proper though, i.e. the posterior must integrate to 1.

These rules of thumb follow directly from the nature of the Bayesian analysis procedure:

If the prior is uninformative, the posterior is very much determined by the data (the posterior is data-driven)
If the prior is informative, the posterior is a compromise between the prior and the data
The more informative the prior, the more data you need to "change" your beliefs, so to speak, because the posterior is very much driven by the prior information
If you have a lot of data, the data will dominate the posterior distribution (they will overwhelm the prior)
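For the beta-binomial case these rules of thumb can be made concrete, because the posterior mean is a weighted average of the prior mean and the data mean. A small sketch, assuming the counts (2 out of 18) and the prior mean 0.125 used further down in this answer; the set of pseudo-sample sizes is an arbitrary illustration.

```r
z <- 2; N <- 18                 # observed left-handers and sample size
m <- 0.125                      # prior mean
n_eq <- c(1, 10, 100, 1000)     # prior pseudo-sample sizes (illustrative)

alpha <- m * n_eq
beta  <- (1 - m) * n_eq

# Posterior mean from the conjugate update
post_mean <- (z + alpha) / (N + alpha + beta)

# The same quantity as a weighted average of prior mean and data mean
w_prior <- n_eq / (n_eq + N)
post_mean_weighted <- w_prior * m + (1 - w_prior) * (z / N)

result <- data.frame(n_eq, w_prior, post_mean)
```

The weight on the prior, w_prior, grows with n_eq and shrinks as N grows, which is the "data overwhelm the prior" rule in one line.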

An excellent overview of some possible "informative" and "uninformative" priors for the beta distribution can be found in this post.

Say your prior beta is $\mathrm{Beta}(\pi_{LH}| \alpha, \beta)$ where $\pi_{LH}$ is the proportion of left-handers. To specify the prior parameters $\alpha$ and $\beta$, it is useful to know the mean and variance of the beta distribution (for example, if you want your prior to have a certain mean and variance). The mean is $\bar{\pi}_{LH}=\alpha/(\alpha + \beta)$. Thus, whenever $\alpha =\beta$, the mean is $0.5$. The variance of the beta distribution is $\frac{\alpha\beta}{(\alpha + \beta)^{2}(\alpha + \beta + 1)}$. Now, the convenient thing is that you can think of $\alpha$ and $\beta$ as previously observed (pseudo-)data, namely $\alpha$ left-handers and $\beta$ right-handers out of a (pseudo-)sample of size $n_{eq}=\alpha + \beta$. The $\mathrm{Beta}(\pi_{LH} |\alpha=1, \beta=1)$ distribution is the uniform (all values of $\pi_{LH}$ are equally probable) and is the equivalent of having observed two people out of which one is left-handed and one is right-handed.
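The mean and variance formulas are easy to wrap in two helper functions. The names beta_mean and beta_var are made up for this sketch; the numerical integration is only there as a cross-check of the closed forms.

```r
# Closed-form mean and variance of a Beta(a, b) distribution
beta_mean <- function(a, b) a / (a + b)
beta_var  <- function(a, b) a * b / ((a + b)^2 * (a + b + 1))

# Uniform prior Beta(1, 1): one pseudo left-hander, one pseudo right-hander
unif_mean <- beta_mean(1, 1)
unif_var  <- beta_var(1, 1)

# Cross-check against numerical integration of the density
num_mean <- integrate(function(p) p * dbeta(p, 1, 1), 0, 1)$value
num_var  <- integrate(function(p) (p - num_mean)^2 * dbeta(p, 1, 1), 0, 1)$value
```

For the uniform prior this gives a mean of 0.5 and a variance of 1/12, as expected for a uniform distribution on the unit interval.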

The posterior beta distribution is simply $\mathrm{Beta}(z + \alpha, N - z +\beta)$ where $N$ is the size of the sample and $z$ is the number of left-handers in the sample. The posterior mean of $\pi_{LH}$ is therefore $(z + \alpha)/(N + \alpha + \beta)$. So to find the parameters of the posterior beta distribution, we simply add $z$ left-handers to $\alpha$ and $N-z$ right-handers to $\beta$. The posterior variance is $\frac{(z+\alpha)(N-z+\beta)}{(N+\alpha+\beta)^{2}(N + \alpha + \beta + 1)}$. Note that a highly informative prior also leads to a smaller variance of the posterior distribution (the graphs below illustrate the point nicely).
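The update rule is short enough to write as a function. This is a sketch only; beta_posterior is a hypothetical helper name, and the call uses the counts and the uniform prior that this answer works with.

```r
# Conjugate update: Beta(a, b) prior plus z successes in N trials
beta_posterior <- function(z, N, a, b) {
  a_post <- z + a          # add observed left-handers to alpha
  b_post <- N - z + b      # add observed right-handers to beta
  list(
    alpha = a_post,
    beta  = b_post,
    mean  = a_post / (a_post + b_post),
    var   = a_post * b_post /
      ((a_post + b_post)^2 * (a_post + b_post + 1))
  )
}

post <- beta_posterior(z = 2, N = 18, a = 1, b = 1)
```

The returned list holds the posterior parameters together with the posterior mean and variance from the formulas above, so different priors can be compared by changing only a and b.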

In your case, $z=2$ and $N=18$ and your prior is the uniform which is uninformative, so $\alpha = \beta = 1$. Your posterior distribution is therefore $\mathrm{Beta}(3, 17)$. The posterior mean is $\bar{\pi}_{LH}=3/(3+17)=0.15$. Here is a graph that shows the prior, the likelihood of the data and the posterior:

The prior, the likelihood of the data and the posterior distribution with a uniform prior

You see that because your prior distribution is uninformative, your posterior distribution is entirely driven by the data. Also plotted is the highest density interval (HDI) for the posterior distribution. Imagine that you put your posterior distribution in a 2D basin and start to fill in water until 95% of the distribution is above the waterline. The points where the waterline intersects with the posterior distribution constitute the 95% HDI. Every point inside the HDI has a higher probability density than any point outside it. Also, the HDI always includes the peak of the posterior distribution (i.e. the mode). The HDI is different from an equal-tailed 95% credible interval, where 2.5% from each tail of the posterior is excluded (see here).
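For a unimodal posterior such as $\mathrm{Beta}(3, 17)$, the HDI is the narrowest interval that contains 95% of the probability mass, so one way to compute it is to minimize the interval width over the probability left in the lower tail. The following is a sketch under that unimodality assumption; hdi_beta is a made-up helper name, not a function from any package.

```r
# Narrowest interval containing `mass` of a unimodal Beta(a, b) distribution
hdi_beta <- function(a, b, mass = 0.95) {
  # Width of the interval that leaves `lower_tail` probability below it
  width <- function(lower_tail) {
    qbeta(lower_tail + mass, a, b) - qbeta(lower_tail, a, b)
  }
  lower_tail <- optimize(width, interval = c(0, 1 - mass), tol = 1e-8)$minimum
  c(lower = qbeta(lower_tail, a, b),
    upper = qbeta(lower_tail + mass, a, b))
}

hdi <- hdi_beta(3, 17)                       # highest density interval
eti <- qbeta(c(0.025, 0.975), 3, 17)         # equal-tailed interval
```

Because the posterior is right-skewed, the HDI sits further to the left than the equal-tailed interval and is never wider than it.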

For your second task, you're asked to take into account the information that 5-20% of the population are left-handers. There are several ways of doing that. The easiest way is to say that the prior beta distribution should have a mean of $0.125$, which is the mean of $0.05$ and $0.2$. But how to choose $\alpha$ and $\beta$ of the prior beta distribution? First, you want your mean of the prior distribution to be $0.125$ out of a pseudo-sample of equivalent sample size $n_{eq}$. More generally, if you want your prior to have a mean $m$ with a pseudo-sample size $n_{eq}$, the corresponding $\alpha$ and $\beta$ values are: $\alpha = mn_{eq}$ and $\beta = (1-m)n_{eq}$. All you are left to do now is to choose the pseudo-sample size $n_{eq}$, which determines how confident you are about your prior information. Let's say you are very sure about your prior information and set $n_{eq}=1000$. The parameters of your prior distribution are therefore $\alpha = 0.125\cdot 1000 = 125$ and $\beta = (1 - 0.125)\cdot 1000 = 875$. The posterior distribution is $\mathrm{Beta}(127, 891)$ with a mean of about $0.125$, which is practically the same as the prior mean of $0.125$. The prior information is dominating the posterior (see the following graph):

The prior, the likelihood of the data and the posterior distribution with strong informative prior
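The arithmetic for the strongly informative prior can be checked in a few lines of R. The helper name prior_from_mean is invented for this sketch.

```r
# Beta parameters from a prior mean m and a pseudo-sample size n_eq
prior_from_mean <- function(m, n_eq) {
  c(alpha = m * n_eq, beta = (1 - m) * n_eq)
}

prior <- prior_from_mean(m = 0.125, n_eq = 1000)

z <- 2; N <- 18   # observed left-handers and sample size
posterior <- c(alpha = z + prior[["alpha"]],
               beta  = N - z + prior[["beta"]])
post_mean <- posterior[["alpha"]] / sum(posterior)
```

Calling prior_from_mean() with a smaller n_eq gives a weaker prior with the same mean, so the same code can be reused to see how quickly the data take over.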

If you are less sure about the prior information, you could set the $n_{eq}$ of your pseudo-sample to, say, $10$, which yields $\alpha=1.25$ and $\beta=8.75$ for your prior beta distribution. The posterior distribution is $\mathrm{Beta}(3.25, 24.75)$ with a mean of about $0.116$. The posterior mean is now near the mean of your data ($0.111$) because the data overwhelm the prior. Here is the graph showing the situation:

The prior, the likelihood of the data and the posterior distribution with beta prior corresponding to a pseudo-sample size of 10

A more advanced method of incorporating the prior information would be to say that the $0.025$ quantile of your prior beta distribution should be about $0.05$ and the $0.975$ quantile should be about $0.2$. This is equivalent to saying that you are 95% sure that the proportion of left-handers in the population lies between 5% and 20%. The function beta.select in the R package LearnBayes calculates the $\alpha$ and $\beta$ values of a beta distribution corresponding to such quantiles. The code is

library(LearnBayes)

quantile1=list(p=.025, x=0.05)     # the 2.5% quantile should be 0.05
quantile2=list(p=.975, x=0.2)      # the 97.5% quantile should be 0.2
beta.select(quantile1, quantile2)

[1]  7.61 59.13

It seems that a beta distribution with parameters $\alpha = 7.61$ and $\beta=59.13$ has the desired properties. The prior mean is $7.61/(7.61 + 59.13)\approx 0.114$ which is near the mean of your data ($0.111$). Again, this prior distribution incorporates the information of a pseudo-sample of an equivalent sample size of about $n_{eq}\approx 7.61+59.13 \approx 66.74$. The posterior distribution is $\mathrm{Beta}(9.61, 75.13)$ with a mean of $0.113$ which is comparable with the mean of the previous analysis using a highly informative $\mathrm{Beta}(125, 875)$ prior. Here is the corresponding graph:

The prior, the likelihood of the data and the posterior distribution with prior that has 0.025 and 0.975 quantiles of 0.05 and 0.2
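The quantile-matched prior can be sanity-checked with base R alone. This sketch takes the two parameter values reported by beta.select above as given and does not call LearnBayes itself.

```r
a <- 7.61; b <- 59.13     # prior parameters as reported by beta.select

# The 2.5% and 97.5% prior quantiles should be close to 0.05 and 0.2
prior_q <- qbeta(c(0.025, 0.975), a, b)

n_eq <- a + b             # equivalent pseudo-sample size of the prior

z <- 2; N <- 18           # observed left-handers and sample size
post_mean <- (z + a) / (N + a + b)
post_q <- qbeta(c(0.025, 0.975), z + a, N - z + b)   # equal-tailed interval
```

Since beta.select rounds its result to two decimals, the prior quantiles will match the targets only approximately.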

See also this reference for a short but imho good overview of Bayesian reasoning and simple analysis. A longer introduction for conjugate analyses, especially for binomial data, can be found here. A general introduction into Bayesian thinking can be found here. More slides concerning aspects of Bayesian statistics are here.

Bayesian p-Values – Understanding and Application

If I understand it correctly, then a Bayesian p-value is the comparison of some metric calculated from your observed data with the same metric calculated from your simulated data (generated with parameters drawn from the posterior distribution).

In Gelman's words: "From a Bayesian context, a posterior p-value is the probability, given the data, that a future observation is more extreme (as measured by some test variable) than the data."

For example, the number of zeros generated from a Poisson-based model could be such a metric or test statistic, and you could calculate how many of your simulated datasets have a larger fraction of zeros than you actually observe in your real data. The closer this value is to 0.5, the better the values calculated from your simulated data are distributed around the real observation.
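Here is a minimal sketch of that zero-fraction check in R. Everything specific in it is an assumption made for illustration: the data are simulated with extra zeros on purpose, the model is a plain Poisson with a conjugate Gamma(0.01, 0.01) prior on the rate, and the number of simulations is arbitrary.

```r
set.seed(42)

# Hypothetical observed counts: 30 structural zeros plus 70 Poisson(3) draws
y <- c(rep(0, 30), rpois(70, lambda = 3))
n <- length(y)

# Gamma prior on the Poisson rate (assumed values); the posterior is also Gamma
a0 <- 0.01; b0 <- 0.01
n_sims <- 2000
lambda_draws <- rgamma(n_sims, shape = a0 + sum(y), rate = b0 + n)

# For each posterior draw, simulate a replicated dataset and record
# its fraction of zeros (the test statistic)
frac_zero_rep <- vapply(lambda_draws,
                        function(l) mean(rpois(n, l) == 0),
                        numeric(1))
frac_zero_obs <- mean(y == 0)

# Posterior predictive p-value: share of replicates at least as extreme
p_value <- mean(frac_zero_rep >= frac_zero_obs)
```

Because the simulated data contain far more zeros than a Poisson model can produce, p_value ends up close to 0, which flags the misfit. With data that really came from a Poisson distribution, the value would be expected to lie somewhere in the middle of the unit interval.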
