When I first learned about Generalized Linear Models, I thought the assumption that the dependent variable follows some distribution from the exponential family was made to simplify calculations. However, I have now read about Vector GLMs (VGLMs). VGLMs do not require the dependent variable to follow a distribution from the exponential family; instead, they allow a much broader set of distributions.
So my question is: WHY do we actually need the distribution assumption in GLMs?
My thoughts so far: GLMs model the mean of the assumed exponential-family distribution and thus have only one linear predictor (this predictor may be vector-valued in case of a vector-valued distribution mean). The variance of the distribution depends on the mean through some function, and the first two moments specify the distribution uniquely within the exponential family. Thus, it is enough to specify the link function to uniquely specify the distribution. VGLMs, on the other hand, allow more than one linear predictor, one for each parameter. It is therefore possible to specify the distribution by first assuming a distribution for the dependent variable and then estimating its parameters. Consider for instance the negative binomial distribution $NB(r,\mu)$. The two parameters are $r$ (the number of trials) and the mean $\mu$ (note that in this parametrization $p=\frac{\mu}{\mu+r}$). Can someone verify these thoughts or give another explanation?
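As a quick numerical check of this parametrization (a sketch assuming SciPy's `nbinom`, whose "success probability" parameter equals $r/(\mu+r) = 1-p$ under the convention above):

```python
import numpy as np
from scipy.stats import nbinom

r, mu = 5.0, 3.0
p = mu / (mu + r)        # the question's p = mu/(mu+r)

# scipy parametrizes NB(n, p_success) with mean n*(1-p_success)/p_success,
# so the success probability here is r/(mu+r) = 1 - p
dist = nbinom(r, 1 - p)

print(dist.mean())       # equals mu = 3.0
print(dist.var())        # equals mu + mu**2/r = 4.8
```

The variance $\mu + \mu^2/r$ exceeds the mean, which is why this parametrization is popular for overdispersed counts.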
Best Answer
When I discovered GLMs I also wondered why they were always based on the exponential family. I have never found a fully clear answer to that question. But...
Let $h$ denote the inverse of the link function and $\beta$ the parameter vector.
Yes. I used it with stochastic gradient descent (SGD), and the SGD update rule (the gradient) is especially simple in the canonical GLM case. See http://proceedings.mlr.press/v32/toulis14.pdf, Proposition 3.1 and Section 3.1. It all works in a way similar to least squares (minimize the average of $(Y-h(\beta X))^2$) but even simpler, and the interpretation of the update rule is straightforward: for a sample $(x,y)$, the correction applied to $\beta$ is just the error $y-h(\beta x)$ times $x$ (times the learning rate).
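A minimal sketch of this canonical-link SGD update, for logistic regression on synthetic data (the data, learning rate, and sample size are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def h(t):
    # inverse of the canonical (logit) link: the sigmoid
    return 1.0 / (1.0 + np.exp(-t))

# synthetic logistic data (assumed for illustration)
n, d = 5000, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = rng.binomial(1, h(X @ beta_true))

# canonical-link SGD: beta += gamma * (y - h(beta.x)) * x for each sample
beta = np.zeros(d)
gamma = 0.05
for x_i, y_i in zip(X, y):
    beta += gamma * (y_i - h(beta @ x_i)) * x_i

print(beta)  # roughly recovers beta_true
```

Note the update involves only the raw error and $x$; no derivative of $h$ appears, which is the simplification the canonical link buys.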
Without the exponential family and canonical link, the error would be multiplied by something dependent on $x$ (and perhaps $y$). That would be a sort of refinement of the basic idea: varying the intensity of the correction, giving different weights to the samples. With least squares, for instance, you have to multiply the error by $h'(\beta x)$. Some practical tests of mine on a large dataset showed this performed worse (for reasons I cannot explain).
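To make the $h'(\beta x)$ factor concrete, here is a sketch (with an arbitrary sample and sigmoid $h$, both assumptions for illustration) verifying by numerical differentiation that the per-sample squared-loss gradient carries that extra factor:

```python
import numpy as np

def h(t):                      # sigmoid, as inverse link
    return 1.0 / (1.0 + np.exp(-t))

def h_prime(t):
    s = h(t)
    return s * (1.0 - s)

x = np.array([0.5, -1.0, 2.0])
y = 1.0
beta = np.array([0.2, 0.1, -0.3])

# per-sample squared loss L(beta) = (y - h(beta.x))^2
def loss(b):
    return (y - h(b @ x)) ** 2

# analytic gradient: -2 * (y - h(beta.x)) * h'(beta.x) * x
# (the error is weighted by h'(beta.x), unlike the canonical GLM update)
grad_analytic = -2 * (y - h(beta @ x)) * h_prime(beta @ x) * x

# central-difference check
eps = 1e-6
grad_num = np.array([(loss(beta + eps * e) - loss(beta - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(np.max(np.abs(grad_analytic - grad_num)))  # ~ 0: the factor is real
```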
Yes again.
Also, the pre-existing logistic and Poisson regressions fit into the canonical GLM framework. That is probably one more (historical) explanation for using the exponential family with the canonical link.
Maybe "why assume the exponential family in GLMs?" is similar to "why assume normal noise in linear regression?". For good theoretical properties and simple calculations... But does it always matter so much in practice? Real data rarely have normal noise, yet linear regression often still works very well.
What was fundamentally useful (for me) about GLMs is the difference from transformed linear regression: a GLM models $E(Y\mid x)=h(\beta x)$, whereas regressing the transformed response $g(Y)$ on $x$ models $E(g(Y)\mid x)=\beta x$, where $g=h^{-1}$ is the link. Since $g(E(Y))\neq E(g(Y))$ in general, this changes everything: the two approaches target different quantities and give different predictions.
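A minimal simulation of the gap (assuming a lognormal response, chosen only to make Jensen's inequality visible): back-transforming the mean of $\log Y$, as transformed regression does, systematically undershoots the mean of $Y$, which is what a log-link GLM targets.

```python
import numpy as np

rng = np.random.default_rng(1)
# lognormal Y: log Y ~ N(0, 1)
y = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)

# transformed regression fits on the log scale, then back-transforms:
backtransformed = np.exp(np.mean(np.log(y)))   # estimates exp(E[log Y]) = 1
# a GLM with log link targets E[Y] directly:
direct_mean = np.mean(y)                       # estimates E[Y] = exp(0.5)

print(backtransformed, direct_mean)  # the first is noticeably smaller
```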
I'm not familiar with VGLM so I can't answer about it.