Solved – How to choose the link function when performing a logistic regression

link-function, logistic, regression, sas

I am running a logistic model. In SAS Enterprise Miner, I noticed there is a link function setting with three possible options: logit, probit and cll (complementary log-log).

Can you please shed light on the following questions:

  1. Can we use any of these link functions to carry out a logistic regression?
  2. Are there situations where one would be better than others?
  3. Is it intuitively possible to get some insight about which kind of function could be useful in which situation? (By just looking at the formula, complementary log-log function might be good for normalization of data when data does not depart (too much) from a normal distribution.)

Any additional pointers would be greatly appreciated.

Best Answer

I don't know SAS, so I'll just answer the statistics side of the question. For the software side, you may ask at the sister site, Stack Overflow.

  1. If the link function is different (logit, probit or cloglog), then you will get different results. For a logistic regression, use the logit link.

  2. About the real differences between these link functions.

Logistic and probit are pretty much the same. To see why, remember that in linear regression the link function is the identity. In logistic regression the link is the logit, whose inverse is the logistic distribution function, and in probit regression the link is the inverse of the standard normal distribution function. Formally, if your dependent variable is binary, you can think of it as following a Bernoulli distribution with a given probability of success:

$Y \sim \mathrm{Bernoulli}(p_{i})$

$p_{i} = f(\mu)$

$\mu = XB$

Here, the probability $p_{i}$ is a function of the predictors, just like the mean in linear regression. The real difference is the function $f$, which strictly speaking is the inverse of the link function: the link maps $p_{i}$ to the linear predictor $\mu$, and $f$ maps it back. In linear regression, $f$ is just the identity, i.e., $f(\mu) = \mu$, so you can plug in the linear predictor directly. In logistic regression, $f$ is the cumulative logistic distribution function, $f(\mu) = 1/(1+\exp(-\mu))$. In probit regression, $f$ is the cumulative standard normal distribution function, $f(\mu) = \Phi(\mu)$. And in cloglog regression, $f(\mu) = 1 - \exp(-\exp(\mu))$, which is the inverse of the complementary log-log link $\log(-\log(1 - p_{i}))$.
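To make the three choices of $f$ concrete, here is a minimal sketch in plain Python (standard library only). The function names and the grid of $\mu$ values are my own, picked just for illustration; this is not what SAS runs internally.

```python
import math

def inv_logit(mu):
    """Cumulative logistic distribution: inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-mu))

def inv_probit(mu):
    """Cumulative standard normal distribution: inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(mu / math.sqrt(2.0)))

def inv_cloglog(mu):
    """Inverse of the complementary log-log link log(-log(1 - p))."""
    return 1.0 - math.exp(-math.exp(mu))

# Hypothetical linear-predictor values, chosen only for illustration.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

print("   mu   logit  probit cloglog")
for mu in grid:
    print(f"{mu:5.1f}  {inv_logit(mu):.4f}  {inv_probit(mu):.4f}  {inv_cloglog(mu):.4f}")
```

The logit and probit curves are symmetric around $\mu = 0$, where both give $p = 0.5$. The cloglog curve is not symmetric: at $\mu = 0$ it gives $1 - e^{-1}$ rather than $0.5$.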

I have never used the cloglog, so I'll abstain from commenting on it here.

You can see that the normal and logistic distributions are very similar in this blog post by John Cook, of Endeavour: http://www.johndcook.com/blog/2010/05/18/normal-approximation-to-logistic/
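If you want to check the similarity yourself, a rough sketch like the following compares the two distribution functions on a grid. The scaling constant 1.702 is a commonly quoted choice for matching a normal to the standard logistic; it is my assumption here, not a value taken from this answer, and the grid is arbitrary.

```python
import math

def logistic_cdf(x):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x, sigma=1.0):
    """Distribution function of a normal with mean 0 and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

# Assumed scaling constant (not from the answer); other values near it
# give a similar picture.
SCALE = 1.702

# Arbitrary evaluation grid from -8 to 8 in steps of 0.01.
xs = [i / 100.0 for i in range(-800, 801)]

# Largest absolute gap between the two curves on this grid.
max_gap = max(abs(logistic_cdf(x) - normal_cdf(x, SCALE)) for x in xs)
print(f"largest gap on the grid: {max_gap:.4f}")
```

The gap stays small over the whole grid, which is why logit and probit fits usually lead to very similar predicted probabilities even though the coefficients sit on different scales.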

In general I use the logistic because the coefficients are easier to interpret than in a probit regression. In some specific contexts I use probit (ideal point estimation, or when I have to code my own Gibbs sampler), but I guess those are not relevant to you. So, my advice is: whenever in doubt between probit and logistic, use logistic!
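One way to read the "easier to interpret" point: in a logistic model, the exponential of a coefficient is an odds ratio that stays the same wherever you start on the predictor, and probit coefficients have no equally simple reading. A small sketch of the logistic side, with coefficient values `b0` and `b1` that are invented purely for illustration:

```python
import math

# Hypothetical fitted coefficients, made up for this example.
b0, b1 = -1.0, 0.8

def logistic_prob(x):
    """Success probability under a logistic model with one predictor."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Odds ratio for a one-unit increase in x, at several starting points.
starts = (-2.0, 0.0, 3.0)
odds_ratios = [odds(logistic_prob(x + 1.0)) / odds(logistic_prob(x)) for x in starts]

print("odds ratios:", [round(r, 6) for r in odds_ratios])
print("exp(b1):    ", round(math.exp(b1), 6))
```

Every entry in `odds_ratios` matches `exp(b1)` up to floating-point error, regardless of the starting value of `x`.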