Generalized Linear Model – Why Beta/Dirichlet Regression Are Not Considered GLMs

beta-regression, dirichlet-regression, generalized linear model

The premise is this quote from the vignette of the R package betareg [1].

Furthermore, the model shares some properties (such as linear
predictor, link function, dispersion parameter) with generalized
linear models (GLMs; McCullagh and Nelder 1989), but it is not a
special case of this framework (not even for fixed dispersion)

This answer also alludes to the same point:

[…] This is a type of regression model that is appropriate when the
response variable is distributed as Beta. You can think of it as
analogous to a generalized linear model. It's exactly what you are
looking for […] (emphasis mine)

The question title says it all: why are Beta and Dirichlet regression not considered generalized linear models (or are they)?

As far as I know, the Generalized Linear Model defines models built on the expectation of their dependent variables conditional on the independent ones.

In the notation below, $f$ is the link function applied to the expectation, $g$ is a probability distribution, $Y$ is the outcome, $X$ are the predictors, $\beta$ are the linear parameters, and $\sigma^2$ is the variance.

$$f\left(\mathbb E\left(Y\mid X\right)\right) \sim g(\beta X, I\sigma^2)$$

Different GLMs impose (or relax) the relationship between the mean and the variance, but $g$ must be a probability distribution in the exponential family, a desirable property which should improve robustness of the estimation if I recall correctly. The Beta and Dirichlet distributions are part of the exponential family, though, so I'm out of ideas.

[1] Cribari-Neto, F., & Zeileis, A. (2009). Beta regression in R.

Best Answer

Check the original reference:

Ferrari, S., & Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799-815.

As the authors note, the parameters of the reparametrized beta distribution are correlated:

Note that the parameters $\beta$ and $\phi$ are not orthogonal, in contrast to what is verified in the class of generalized linear regression models (McCullagh and Nelder, 1989).

So while the model looks like a GLM and quacks like a GLM, it does not perfectly fit the framework.
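One way to see why even the fixed-dispersion case falls outside the GLM class is sketched here, assuming the mean-precision parametrization of Ferrari and Cribari-Neto (mean $\mu$, precision $\phi$); the symbols $\theta$, $a$, $b$, $c$ and the trigamma function $\psi'$ are notation introduced for this sketch. A GLM requires the response density to have the exponential dispersion form

$$f(y;\theta,\phi)=\exp\left\{\frac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi)\right\},$$

in which $y$ itself multiplies the natural parameter. The beta log-density is

$$\log f(y;\mu,\phi)=\mu\phi\,\log\frac{y}{1-y}+\phi\log(1-y)-\log y-\log(1-y)+\log\Gamma(\phi)-\log\Gamma(\mu\phi)-\log\Gamma\left((1-\mu)\phi\right).$$

For fixed $\phi$ this is an exponential family, but the statistic paired with the parameter is $\log\{y/(1-y)\}$ rather than $y$, so it does not match the form above. The non-orthogonality mentioned in the quote shows up in the cross term of the Fisher information for a single observation:

$$-\mathbb E\left[\frac{\partial^2 \log f}{\partial\mu\,\partial\phi}\right]=\phi\left\{\mu\,\psi'(\mu\phi)-(1-\mu)\,\psi'\left((1-\mu)\phi\right)\right\},$$

which vanishes at $\mu=1/2$ but not in general, whereas the corresponding term between the mean and dispersion parameters is zero in a GLM.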

Related Solutions

Solved – Is the understanding of Generalized Linear Models correct

I would say that your understanding still needs some work, because your description is very vague, which suggests that you are unclear about exactly what to say about GLMs. First, your statement "for machine learning problems we can base our models on different distributions" is somewhat ambiguous. For linear regression models this makes more sense; you'll see later in the course that there are many nonlinear methods that don't follow a pre-defined probability distribution.

Remember that in regression problems, you're modeling the mean of the response variable as a function of a linear combination of the predictors. When we perform ordinary least squares, we are constrained by several assumptions, such as that the response variable is normally distributed around the mean and that the variance of the response does not depend on the predictors. This doesn't always hold in reality, so GLMs allow us to relax some of these assumptions by specifying the response variable distribution, a link function, etc. In other words, GLMs are a generalization and extension of least squares; they're still linear regression problems, which are only one part of the overall machine learning theme.
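As a minimal sketch of that contrast, the following base-R snippet fits ordinary least squares and a Poisson GLM to the same data. The data are simulated and the coefficients 0.5 and 0.8 are hypothetical values chosen only for illustration.

```r
set.seed(1)  # reproducible simulated data (hypothetical, for illustration only)
n <- 200
x <- runif(n, 0, 2)
# Counts whose mean depends on x through a log link
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))

# Ordinary least squares: assumes a normal response with constant variance
fit_ols <- lm(y ~ x)

# Poisson GLM: same linear predictor, but a Poisson response and a log link,
# so fitted means stay positive and the variance grows with the mean
fit_glm <- glm(y ~ x, family = poisson(link = "log"))

coef(fit_glm)           # estimates on the log scale
range(fitted(fit_glm))  # fitted means are strictly positive
```

Both fits use the same linear predictor in `x`; only the assumed response distribution and the link differ.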

Andrew Ng's lecture notes may not be the best introductory source when it comes to this topic. I recommend reading up on some additional sources if you wish to get to know GLMs a bit better, like this chapter from Applied Regression Analysis and GLMs.

Beta Regression – Why Use the Logit Link in Beta Regression

Justification of the link function: A link function $g: (0,1) \rightarrow \mathbb{R}$ ensures that all fitted values $\hat \mu = g^{-1}(x^\top \hat \beta)$ are always in $(0, 1)$. This may not matter much in some applications, e.g., because the predictions are only evaluated in-sample or are not too close to 0 or 1. But it may matter in others, and you typically do not know in advance whether it does. Typical problems I have seen include: evaluating predictions at new $x$ values that are (slightly) outside the range of the original learning sample, or finding suitable starting values. For the latter consider:

library("betareg")
data("GasolineYield", package = "betareg")
# Try an identity link instead of the default logit link
betareg(yield ~ batch + temp, data = GasolineYield, link = make.link("identity"))
## Error in optim(par = start, fn = loglikfun, gr = if (temporary_control$use_gradient) gradfun else NULL,  : 
##   initial value in 'vmmin' is not finite

But, of course, one can simply try both options and see whether problems with the identity link occur and/or whether it improves the fit of the model.
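The range argument can be illustrated with a small base-R sketch. The coefficients `b0` and `b1` and the grid `x_new` are hypothetical values chosen for illustration; they are not estimates from GasolineYield.

```r
# Hypothetical coefficients, chosen only for illustration (not estimated from data)
b0 <- 0.1
b1 <- 0.08
x_new <- c(0, 5, 10, 15)  # the last value plays the role of an out-of-sample point

eta <- b0 + b1 * x_new    # linear predictor

mu_identity <- eta        # identity link: mu = eta, unrestricted
mu_logit <- plogis(eta)   # logit link: mu = 1 / (1 + exp(-eta))

# Compare the implied means side by side
data.frame(x_new, mu_identity, mu_logit)
```

With the identity link the implied mean leaves $(0, 1)$ once the linear predictor exceeds 1, while the inverse logit maps any real value back into the unit interval.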

Interpretation of the parameters: I agree that interpreting parameters in models with link functions is more difficult than in models with an identity link and practitioners often get it wrong. However, I have also often seen misinterpretations of the parameters in linear probability models (binary regressions with identity link, typically by least squares). The assumption that marginal effects are constant cannot hold if predictions get close enough to 0 or 1 and one would need to be really careful. E.g., for an observation with $\hat \mu = 0.01$ an increase in $x$ cannot lead to a decrease of $\hat \mu$ of, say, $0.02$. But this is often treated very sloppily in those scenarios. Hence, I would argue that for a limited response model the parameters from any link function need to be interpreted carefully and might need some practice. My usual advice is therefore (as shown in the other discussion you linked in your question) to look at the effects for regressor configurations of interest. These are easier to interpret and often (but not always) rather similar (from a practical perspective) for different link functions.
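A short base-R sketch of why effects should be inspected at regressor configurations of interest: under a logit link the marginal effect of $x$ on $\mu$ is $\beta\,\mu(1-\mu)$, so it shrinks as $\mu$ approaches 0 or 1. The coefficients and the `x` values below are hypothetical and serve only as an illustration.

```r
# Hypothetical coefficients and regressor values, for illustration only
b0 <- -3
b1 <- 0.5
x <- c(0, 6, 12)

eta <- b0 + b1 * x   # linear predictor at each configuration
mu <- plogis(eta)    # implied mean under a logit link

# Marginal effect d mu / d x = b1 * mu * (1 - mu): not constant in x
marginal_effect <- b1 * mu * (1 - mu)

# Change in mu for a one-unit increase in x at each configuration
delta <- plogis(b0 + b1 * (x + 1)) - mu

data.frame(x, mu, marginal_effect, delta)
```

The same coefficient `b1` therefore corresponds to quite different changes in the mean depending on where the observation sits, which is why a single "effect of $x$" is hard to read off the coefficient alone.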
