Solved – Link functions for Binomial Regression

Tags: aic, binomial distribution, generalized linear model, link-function, r

I have a dataset of presence (1) and absence (0) data that consists mainly of 0's (~80% of the 5200 observations). While constructing my binomial logistic model I am using Zuur et al. (2009) as a guide. The book gives only a short description of the different link choices for a binomial model, and its examples use the standard logit link throughout. But it also states that if you have more 0's than 1's, the cloglog link is an option as well.
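For reference, the two links differ in how they map the linear predictor to a probability. The following is a minimal sketch in plain Python (the function names `inv_logit` and `inv_cloglog` are made up for illustration); in R the same choices are `binomial(link = "logit")` and `binomial(link = "cloglog")`.

```python
import math

def inv_logit(eta):
    """Inverse logit link: p = 1 / (1 + exp(-eta)). Symmetric around p = 0.5."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta)). Asymmetric."""
    return 1.0 - math.exp(-math.exp(eta))

# Compare the two response curves over a few values of the linear predictor.
for eta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"eta={eta:+.1f}  logit p={inv_logit(eta):.3f}  cloglog p={inv_cloglog(eta):.3f}")
```

The logit curve satisfies p(eta) + p(-eta) = 1, while the cloglog curve approaches 0 slowly and 1 quickly, which is why it is sometimes suggested when one outcome is much rarer than the other.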

How can I find out which model is better (just by comparing the AIC?), and is there a good description of the selection process for these link functions? Or maybe somebody here can give some advice.
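To make the AIC comparison concrete: AIC is 2k - 2 log L, so when both models have the same number of parameters k, the comparison comes down to the log-likelihood alone. Below is a rough sketch, assuming you already have each model's fitted probabilities; the helper names and the toy numbers are invented for illustration and are not results from the dataset above.

```python
import math

def bernoulli_log_lik(y, p):
    """Log-likelihood of 0/1 outcomes y under fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_lik

# Toy outcomes and made-up fitted probabilities from two hypothetical models.
y = [0, 0, 0, 0, 1]
p_model_a = [0.1, 0.2, 0.1, 0.3, 0.7]
p_model_b = [0.2, 0.2, 0.2, 0.2, 0.2]

aic_a = aic(bernoulli_log_lik(y, p_model_a), n_params=2)
aic_b = aic(bernoulli_log_lik(y, p_model_b), n_params=2)
print(f"AIC model A: {aic_a:.2f}, AIC model B: {aic_b:.2f}")
```

In R, `AIC(fit_logit, fit_cloglog)` on two fitted `glm` objects reports the same quantity directly (the object names here are placeholders).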

Best Answer

I'm not sure about the selection process, but one way to evaluate the two links is to partition your data into training and test subsets. Luckily you can do this here, because both models use the same predictors and data and differ only in the link. Randomly select, say, 80% of the data, train the two models on it, and then compare how accurately they predict the held-out test subset.
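The random 80/20 partition could look roughly like this. It is a sketch using only the Python standard library; `train_test_split` is a hypothetical helper written here, and the integer row indices stand in for the real observations.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test) parts."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)          # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the 5200 observations: just their row indices.
rows = list(range(5200))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Both models should be fitted on the same `train` rows and scored on the same `test` rows, so that any difference in performance comes from the link and not from the split.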

In doing so, the prediction function will give you probabilities. You can convert these to zeros and ones using a threshold (e.g. with a threshold of 0.7, a predicted probability greater than 70% is classed as present (1) and anything lower as absent (0)). The higher the threshold, the more confident the model has to be before it predicts a presence. Then you compare what each model predicted with what was actually observed in the test data and get a percent accuracy. Keep in mind that with ~80% absences, a model that always predicts 0 already scores about 80%, so that is the baseline to beat.
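The thresholding and accuracy step can be sketched as follows. The outcomes and probabilities are toy values made up for illustration, and `classify` and `accuracy` are hypothetical helper names.

```python
def classify(probs, threshold=0.5):
    """Turn predicted probabilities into 0/1 calls: 1 if p > threshold."""
    return [1 if p > threshold else 0 for p in probs]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed outcomes."""
    return sum(t == q for t, q in zip(y_true, y_pred)) / len(y_true)

# Toy test set with 80% absences, plus made-up predicted probabilities.
y_test = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
probs = [0.1, 0.2, 0.05, 0.6, 0.8, 0.3, 0.1, 0.2, 0.4, 0.15]

acc_50 = accuracy(y_test, classify(probs, threshold=0.5))
acc_70 = accuracy(y_test, classify(probs, threshold=0.7))
baseline = accuracy(y_test, [0] * len(y_test))  # always predict absence

print(f"threshold 0.5: {acc_50:.2f}, threshold 0.7: {acc_70:.2f}, baseline: {baseline:.2f}")
```

Running the same comparison for the logit and the cloglog model on the same test rows gives one accuracy per model at each threshold; comparing both against the always-absent baseline shows whether either is doing better than trivial prediction.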
