Specifically, are there any binomial regression models that use a kernel with heavier tails and higher kurtosis than the standard kernels (logistic/probit/cloglog)?
As a function of the linear predictor $\textbf{x}'\boldsymbol{\hat{\beta}}$, the logistic distribution:
- Underestimates the probability of my data lying in the tails of the distribution
- Underestimates the kurtosis, i.e. the clustering of data in the middle of the distribution.
This can be seen from a diagnostic plot of my fit:
- The red line is the logistic CDF, representing a perfect fit
- The black line represents the fitted probabilities from my dataset (calculated by binning observations into 0.1 intervals of $\textbf{x}'\mathbf{\hat{\beta}}$, where $\mathbf{\hat{\beta}}$ is obtained from my fit)
- The grey bars in the background show the number of observations on which the empirical probabilities are based
- The grey areas are where the tail 10% of the data lie (5% each side).
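For concreteness, the binned calibration curve described above can be computed along these lines. This is a hedged sketch on simulated data; the variable names and the stand-in linear predictor are illustrative, not my actual fit:

```r
# Sketch: empirical event rates in 0.1-wide bins of the linear predictor.
set.seed(42)
n   <- 2000
x   <- rnorm(n)
eta <- 0.8 * x                       # stand-in for x' beta-hat from a fit
y   <- rbinom(n, 1, plogis(eta))     # simulated binary outcomes

bins     <- cut(eta, breaks = seq(-4, 4, by = 0.1))
emp_prob <- tapply(y, bins, mean)    # the black line: observed proportion per bin
counts   <- table(bins)              # the grey bars: observations per bin
```

Plotting `emp_prob` against the bin midpoints, with `plogis` overlaid, reproduces the kind of diagnostic shown above.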
Ideally, any solution would use R.
Edit
Why am I talking about CDFs? Our GLM equation is:
$$\mathbb{P}(Y = 1) = \mathbb{E}[Y] = g^{-1}(\textbf{x}'\mathbf{\beta})$$
Where $g$ is the link function.
Further, if $g^{-1}$ is a valid cumulative distribution function (i.e. monotonically increasing from 0 to 1, as is indeed the case for probit, logit, and cloglog), then consider a latent (not directly observed) continuous random variable $Y^{*}$ whose CDF is given by $g^{-1}$. Then by definition
$$\mathbb{P}(Y^{*} \leq \textbf{x}'\mathbf{\beta}) = g^{-1}(\textbf{x}'\mathbf{\beta})$$
Equating the two expressions above, we see that the probability of $Y=1$ is exactly the CDF of $Y^{*}$:
$$\mathbb{P}(Y = 1) = \mathbb{P}(Y^{*} \leq \textbf{x}'\mathbf{\beta})$$
Hence I talk interchangeably about the expected response $\mathbb{E}[Y]$ and CDF of $Y^{*}$ over linear-predictor ($\textbf{x}'\mathbf{\hat{\beta}}$) space.
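This equivalence is easy to check numerically in R: the inverse logit link and the logistic CDF are the same function.

```r
# The inverse logit link g^{-1}(eta) and the logistic CDF coincide.
eta    <- seq(-3, 3, by = 0.5)      # a grid of linear-predictor values
p_link <- 1 / (1 + exp(-eta))       # g^{-1}(eta) for the logit link
p_cdf  <- plogis(eta)               # P(Y* <= eta) for a logistic latent Y*
stopifnot(isTRUE(all.equal(p_link, p_cdf)))
```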
Best Answer
You can generalise the logistic regression model so that the latent distribution is something other than logistic. Using the t distribution lets you capture relationships where the data are contaminated, meaning observations from the "wrong" class appear unexpectedly far away from the decision boundary -- this would be the binary equivalent of fat tails in regression with a continuous response. In other words, you model
$$ \begin{align} y_i & = \begin{cases} 1 & \text{if } z_i > 0 \\ 0 & \text{if } z_i < 0 \end{cases}\\ z_i & = X_i\beta + \epsilon_i \end{align}$$
where the latent errors $\epsilon$ are distributed as $$\epsilon_i \sim t_\nu \left(0, \frac{\nu-2}{\nu} \right) $$
with $\nu > 2$ estimated from the data. The scale $\frac{\nu-2}{\nu}$ is chosen so that the latent error has unit variance (a standard $t_\nu$ variable has variance $\frac{\nu}{\nu-2}$), which keeps the coefficients on a scale comparable to probit.
This is also called *robit* regression. In particular, see pp. 124-125 of Gelman & Hill's *Data Analysis Using Regression and Multilevel/Hierarchical Models* (what a mouthful); the above is eqn 6.15 in the book. You can fit it in R using the `stan` package (which is also by Andrew Gelman and his collaborators and goes with the book), and maybe others. You could also write code to maximise the log-likelihood directly using `optim` or `nlminb`; I haven't tried it, but it shouldn't be hard.
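To make that last suggestion concrete, here is a hedged sketch of direct maximum likelihood with `optim`, profiling over a small grid of $\nu$ rather than optimising it jointly. All names and the simulated data are illustrative:

```r
# Robit negative log-likelihood: latent errors are t_nu scaled to unit variance,
# so P(y = 1) = P(eps < x'beta) = pt(x'beta / s, df = nu) with s = sqrt((nu-2)/nu).
robit_nll <- function(beta, X, y, nu) {
  s <- sqrt((nu - 2) / nu)
  p <- pt(drop(X %*% beta) / s, df = nu)
  p <- pmin(pmax(p, 1e-12), 1 - 1e-12)       # guard against log(0)
  -sum(dbinom(y, 1, p, log = TRUE))
}

set.seed(1)
n <- 500
x <- rnorm(n)
X <- cbind(1, x)                             # design matrix with intercept
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))    # simulated binary outcomes

# Profile over a small grid of nu > 2 and keep the best fit.
nus  <- c(3, 4, 6, 8, 12, 20)
fits <- lapply(nus, function(nu)
  optim(rep(0, ncol(X)), robit_nll, X = X, y = y, nu = nu, method = "BFGS"))
best     <- which.min(sapply(fits, "[[", "value"))
beta_hat <- fits[[best]]$par
nu_hat   <- nus[best]
```

Estimating $\nu$ could equally be done by adding a transformed parameter such as $\log(\nu - 2)$ to the vector passed to `optim`; the grid keeps the sketch simple and the likelihood well behaved.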