In general, the existence of an interaction means that the effect of one variable depends on the value of the other variable with which it interacts. If there is no interaction, the effect of one variable is the same regardless of the value of the other.
This is easiest to understand in the case of linear regression. Imagine we are looking at the adult height (say at 25) of a child based on the adult height of the father. We further include sex as an additional predictor variable, because men and women differ considerably in adult height. Let's imagine that there is no interaction between these two variables (which may be true, at least to a first approximation). We could then plot our model simply as two lines on a scatterplot of the data. We may want to use different colors or symbols / line styles for men vs. women, but at any rate we would see a football-ish (or rugby-ball-ish, depending on where you live) shaped cloud of points with two parallel lines going through it. The important part is that the lines are parallel; if someone asked you what the effect would be of the father being 1 inch (1 cm) taller, you would respond with $\beta_{\text{height}}$. If they further asked you what the effect would be if the child were male or female, you would respond, 'that doesn't matter, you would expect them to be $\beta_{\text{height}}$ taller as an adult either way'. That is because the lines are parallel (with the same slope, $\beta_{\text{height}}$) / there is no interaction.
Now imagine the effect of anxiety on test-taking performance in two populations: emotionally stable vs. emotionally unstable people. Let's imagine that there is an interaction such that emotionally unstable people are more strongly affected by anxiety. Then, if we plotted the model similarly, we would see two lines that are not parallel. One line (representing emotionally stable individuals) might slope downward gradually, while the other line (representing unstable individuals) might drop much more quickly. If we had used reference cell coding, with the stable individuals as the reference category, the fitted regression model might be:
$$
\text{test performance}=\beta_0 + \beta_1\text{anxiety} + \beta_2\text{unstable} + \beta_3\,\text{anxiety}\times\text{unstable}
$$
In such a case, the slope of the first line would be $\beta_1$ (since $\text{unstable}$ would equal 0), but the slope of the second line would be $\beta_1+\beta_3$. If someone asked you how much test-taking performance would be impaired if anxiety went up by one unit, you would have to say, 'that depends: emotionally stable individuals would score $\beta_1$ points lower, but emotionally unstable individuals would drop by $\beta_1+\beta_3$ points'.
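To make this concrete, here is a minimal sketch in Python (simulated data, hypothetical variable names and coefficient values) that fits the model above and recovers the two slopes; setting $\beta_3=0$ would give the parallel lines from the height example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
anxiety = rng.uniform(0, 10, n)
unstable = rng.integers(0, 2, n)  # 0 = stable, 1 = unstable
# true coefficients (made up): beta0=80, beta1=-1, beta2=-2, beta3=-2
perf = 80 - 1.0*anxiety - 2.0*unstable - 2.0*anxiety*unstable + rng.normal(0, 3, n)
df = pd.DataFrame({"perf": perf, "anxiety": anxiety, "unstable": unstable})

# "anxiety * unstable" expands to main effects plus the interaction term
fit = smf.ols("perf ~ anxiety * unstable", data=df).fit()
b = fit.params
print("slope, stable:  ", b["anxiety"])                          # estimates beta_1
print("slope, unstable:", b["anxiety"] + b["anxiety:unstable"])  # beta_1 + beta_3
```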
This is the essence of what an interaction is. In addition, these examples illustrate the necessity of interpreting simple effects when interactions exist, and the value of using plots of your model to facilitate understanding.
With a generalized linear model, the situation is essentially the same, but you may have to take into account the additional complexity of the link function (a non-linear transformation), depending on which scale you want to use for your interpretation. Consider the case of logistic regression: there are (at least) three scales available. The betas exist on the logit (log odds) scale, whereas $\pi$ (the probability of 'success') exists only in the interval $(0,1)$ and behaves quite differently; the odds lie in between. So you need to choose which of these you want to use to interpret your model. For example, with respect to the log odds, the model is linear, and everything can be understood just as above.
If you are using the odds, you can get odds ratios by exponentiating your betas. For example, if there is no interaction, the odds ratio associated with a one-unit increase in $X_1$ is $\exp(\beta_1)$. If there were an interaction with a dichotomous variable, this would still be the odds ratio for the reference category (like the emotionally stable students above), but the contrasting category would be associated with an odds ratio of $\exp(\beta_1)\times\exp(\beta_3)$.
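As a hedged sketch (simulated data, hypothetical names mirroring the example above), the two odds ratios fall out of a fitted logistic model like this:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1000
anxiety = rng.uniform(0, 10, n)
unstable = rng.integers(0, 2, n)
eta = 3 - 0.4*anxiety - 0.5*unstable - 0.4*anxiety*unstable  # true log odds (made up)
passed = rng.binomial(1, 1/(1 + np.exp(-eta)))
df = pd.DataFrame({"passed": passed, "anxiety": anxiety, "unstable": unstable})

fit = smf.logit("passed ~ anxiety * unstable", data=df).fit(disp=0)
b = fit.params
print(np.exp(b["anxiety"]))                                   # OR, reference category
print(np.exp(b["anxiety"]) * np.exp(b["anxiety:unstable"]))   # OR, contrasting category
```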
Unfortunately, neither of those is very intuitively accessible for people, and the non-linear transformation (the link function) makes life more complicated. It is important to recognize that this isn't specific to interactions: the change in the probability of 'success' associated with increasing $X$ by one unit is never the same as that associated with (say) decreasing $X$ by one unit (except in the special case where $x_i$ is associated with $\pi=.5$). In other words, the change in probability associated with a one-unit change in $X$ depends on where you are starting from (in this sense, you could perhaps metaphorically say that $X$ interacts with itself). The best way to determine the change in probability associated with moving from one level of $X$ to another is to plug in both levels, solve the regression equation for $\hat\pi$ at each, and subtract. The same thing is true if you have more than one variable but no 'interaction' with the variable in question. This isn't anything special; it's just that 'where you are starting from' depends on the other variables as well. Again, the best way to determine the change in probability is to solve for $\hat\pi$ at both places and subtract.
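For instance, a small sketch with made-up coefficients shows that the same one-unit change in $X$ produces different changes in $\hat\pi$ depending on the starting point:

```python
import numpy as np

def pi_hat(x, b0=-3.0, b1=0.8):
    """Predicted probability from a logistic model with hypothetical coefficients."""
    return 1 / (1 + np.exp(-(b0 + b1 * x)))

# the same one-unit change in X yields different changes in probability
print(pi_hat(2) - pi_hat(1))   # change moving from X=1 to X=2
print(pi_hat(5) - pi_hat(4))   # change moving from X=4 to X=5: different!
```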
Interactions in a GLiM should be treated similarly. It is best not to try to interpret the interaction coefficient itself, but only simple effects (that is, the effect of $X_1$ on $Y$ holding $X_2$ constant). In addition, it's best to overlay plots of the predicted values (say, when $X_2=0$ vs. when $X_2=1$) on a scatterplot of your data. Now, for a logistic regression, it is often difficult to get a decent plot of your data, as the points are all 0's and 1's, so you might just choose to leave them out. Nonetheless, a plot of the two curves will typically be the best thing to use. After you have the plot, a qualitative (verbal) description is often easy (e.g., 'probabilities don't start moving away from 0 until higher levels of $X_1$, and even then, rise more slowly').
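Here is one way such a plot might be made, a sketch using made-up coefficients (substitute your own fitted values); the 0/1 data points are omitted, as suggested above:

```python
import numpy as np
import matplotlib.pyplot as plt

b0, b1, b2, b3 = -4.0, 0.9, 1.0, 0.8  # hypothetical logit-scale coefficients
x1 = np.linspace(0, 10, 200)
for x2, label in [(0, "$X_2=0$"), (1, "$X_2=1$")]:
    eta = b0 + b1*x1 + b2*x2 + b3*x1*x2      # linear predictor (log odds)
    plt.plot(x1, 1/(1 + np.exp(-eta)), label=label)
plt.xlabel("$X_1$"); plt.ylabel(r"$\hat\pi$"); plt.legend(); plt.show()
```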
Your situation is perhaps a little more complicated than this, because you have two continuous variables rather than a continuous and a dichotomous one. However, this isn't a problem. Typically in this situation, people will be thinking primarily in terms of one of the predictor variables; you can then plot the relationship between that variable and $Y$ at several levels of the other predictor. If there are theoretically meaningful levels, you could use those; if not, you could use the mean and $\pm 1$ SD. If you don't have a preference for one of the variables, you could flip a coin, or plot it both ways and see which is easier to work with.
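A sketch of the same idea for a continuous moderator, plotting the focal curve at the mean of $X_2$ and at $\pm 1$ SD (coefficients and moments are all hypothetical, to be replaced by values from your fitted model and data):

```python
import numpy as np
import matplotlib.pyplot as plt

b0, b1, b2, b3 = -4.0, 0.6, 0.5, 0.15  # hypothetical logit-scale coefficients
x2_mean, x2_sd = 2.0, 1.0              # taken from your data in practice
x1 = np.linspace(0, 10, 200)
for x2 in (x2_mean - x2_sd, x2_mean, x2_mean + x2_sd):
    eta = b0 + b1*x1 + b2*x2 + b3*x1*x2
    plt.plot(x1, 1/(1 + np.exp(-eta)), label=f"$X_2$ = {x2:.1f}")
plt.xlabel("$X_1$"); plt.ylabel(r"$\hat\pi$"); plt.legend(); plt.show()
```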
I don't know if / how SPSS will let you make those plots, but if you aren't able to find a way, they should be easy to make manually in Excel.
With GLMs, it's generally best not to think of them as "conditional mean + error"-like models, but as "conditional distribution" models.
In the case of the Gamma model, note that the variance is proportional to the square of the mean. If you really want to write an error term and you have a log link, you can write it either as an additive error model on the log scale (with constant variance) or as a multiplicative error model on the original scale (but with non-constant variance). I wouldn't write it as an additive model on the original scale.
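To make this concrete, with a log link $\mathbb{E}[Y_i\mid x_i]=e^{x_i^T\beta}$, the same gamma model can be written either way:
$$
Y_i = e^{x_i^T\beta}\,\varepsilon_i, \qquad \varepsilon_i \sim \text{Gamma}(\alpha,\,1/\alpha),
\qquad\text{or equivalently}\qquad
\log Y_i = x_i^T\beta + \log\varepsilon_i,
$$
where $\varepsilon_i$ has mean 1 and constant variance $1/\alpha$, so $\operatorname{Var}(Y_i)\propto(\mathbb{E}[Y_i])^2$ on the original scale, while $\log\varepsilon_i$ has constant variance on the log scale.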
The log of a gamma random variable isn't at all bad to deal with, so the additive error version is kind of convenient, if you want to work with an error model.
Beware, however -- the additive error version results in a term with a non-zero mean (it's also left-skewed, but that's less of a big deal). It's easy enough to compute an adjustment for that non-zero mean, though, so that you can correct for bias on the log scale (or you could even fit a least-squares model to the logs and compute an adjustment for the original scale).
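For example, a small sketch of that adjustment (the shape parameter is hypothetical): if $\varepsilon\sim\text{Gamma}(\alpha,1/\alpha)$ so that $\mathbb{E}[\varepsilon]=1$, then $\mathbb{E}[\log\varepsilon]=\psi(\alpha)-\log\alpha$, which is the quantity you would subtract off on the log scale:

```python
import numpy as np
from scipy.special import digamma

a = 2.5                              # hypothetical gamma shape parameter
adjustment = digamma(a) - np.log(a)  # E[log eps]; negative, not zero
print(adjustment)

# quick Monte Carlo check of the analytic value
rng = np.random.default_rng(0)
eps = rng.gamma(shape=a, scale=1/a, size=1_000_000)
print(np.log(eps).mean())            # close to the analytic adjustment
```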
It seems that quasi-likelihood must be applied in order to fit lognormal models without a log transform.
In fact, if you want ML estimation, taking logs is basically the most sensible way to estimate the parameters of that log-normal model.
To try to do that on the original scale is just making your life hard.
See here, starting at "The MLE is also invariant with respect to certain transformations of the data." down to "For example, the MLE parameters of the log-normal distribution are the same as those of the normal distribution fitted to the logarithm of the data."
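A quick numerical sketch of that invariance (simulated data; fixing the location at zero gives the usual two-parameter lognormal):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=1.0, sigma=0.5, size=10_000)

# ML fit of the lognormal on the original scale
shape, loc, scale = stats.lognorm.fit(y, floc=0)
print(np.log(scale), shape)        # MLEs of mu and sigma

# ML fit of the normal to the logged data
mu, sigma = stats.norm.fit(np.log(y))
print(mu, sigma)                   # identical, up to optimizer tolerance
```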
Note that lognormal and gamma models aren't the only ones suitable for data that are linear in the logs with constant variance on the log scale; they just happen to both be quite convenient.
A GLM isn't a semi-parametric model, but the output from typical use of GLMs can be justified with only semi-parametric assumptions.
If one only assumes that the observations $Y_1, Y_2, \ldots, Y_n$ are independent and that $$ g(\mathbb{E}[\,Y_i \mid X_i=x_i\,]) = x_i^T\beta, $$ then, under mild regularity conditions, solving the equations $$ \sum_i\frac{\partial g^{-1}(x_i^T\beta)}{\partial \beta}\,w\!\left(g^{-1}(x_i^T\beta)\right)\left(Y_i - g^{-1}(x_i^T\beta)\right) = \mathbf{0} $$ provides consistent estimates of the parameter $\beta$. The weighting term $w$ is arbitrary, but it determines the efficiency of this approach; the best option is to use weights inversely proportional to the variance of $Y_i$, if you know that variance.
How does this connect to GLMs? Well, the estimating equation above is just the score equation (i.e. the one that determines the MLE) under the assumption of a GLM. A particularly simple case of this arises when we use the "canonical" link function, chosen so that part of the derivative term cancels with the inverse-variance weights, and we get $$ \sum_i x_i(Y_i - g^{-1}(x_i^T\beta)) = \mathbf{0}, $$ which should look familiar to anyone who has studied linear, logistic, or Poisson regression.
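A small numerical sketch of this (simulated data, arbitrary made-up coefficients): after fitting a Poisson GLM, whose log link is canonical, the score sum above is zero to numerical precision.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))       # design matrix with intercept
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3, -0.2])))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
score = X.T @ (y - fit.mu)                         # sum_i x_i (y_i - mu_hat_i)
print(score)                                       # numerically ~ [0, 0, 0]
```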
In general, we can view the point estimates from GLMs either as MLEs under a particular fully parametric model for $Y$, or as consistent and efficient estimates resulting from assumptions on only the first and second moments of $Y$, i.e. a semi-parametric model.
Similar arguments apply to the confidence intervals these methods provide; see e.g. McCullagh and Nelder's book for the details.