Solved – Should predictions with negative binomial regression only produce integers?

Tags: correlation, count-data, negative-binomial-distribution, r, regression

I have a dataset consisting of about 600 observations, each with around 100 attributes. I want to predict one of these attributes. Since that attribute can only take non-negative integer values, I looked for ways to predict count data and found that there are various options, such as Poisson regression or negative binomial regression.

For my first try I used negative binomial regression in R:

#Load MASS, which provides glm.nb
library(MASS)

#First load the data into a dataset
dataset <- test_observations[, c(5:8, 54)]

#Create the model
fm_nbin <- glm.nb(NumberOfIncidents ~ ., data = dataset[10:600, ])

I then wanted to see what the predicted values look like:

#Create data to test prediction
newdata <- dataset[1:10, ]

#Do the prediction
predict(fm_nbin, newdata, type="response")

Now the problem is the output looks like this:

     1         2         3         4         5         6         7         8         9        10 
0.2247337 0.2642789 0.2205408 0.2161833 0.1794224 0.2081522 0.2412996 0.2074992 0.2213011 0.2100026 

I expected the predicted values to be integers, since that is the whole purpose of using a negative binomial regression. What am I missing here?

Furthermore, I would like to evaluate my predictions in terms of mean squared error and mean absolute error, as well as a correlation coefficient. However, I couldn't find a way to get these easily, without doing all the calculations manually. Is there any built-in function for this?

Best Answer

Don't forget that GLMs model the conditional mean E[y|X]. Therefore, the predict function for glm.nb with type="response" is giving you E[y|X], not a simulated count. The standard example of a non-integer expected value is rolling a die: every outcome is an integer, but the expected value (3.5) is not.
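To make the die analogy concrete, here is a minimal sketch in base R. The values of mu and theta are assumptions picked only to resemble the size of the predictions in the question; they are not estimates from the question's model.

```r
# Fair six-sided die: every outcome is an integer, the expectation is not.
faces <- 1:6
expected_roll <- sum(faces * (1 / 6))

# Same idea for a negative binomial count.
# mu and theta are assumed values for illustration only.
set.seed(1)
mu <- 0.22      # assumed conditional mean, similar in size to the output above
theta <- 1.5    # assumed dispersion parameter
draws <- rnbinom(10000, size = theta, mu = mu)

all_integer <- all(draws == round(draws))  # every simulated count is an integer
sample_mean <- mean(draws)                 # close to mu, and not an integer

print(expected_roll)
print(all_integer)
print(sample_mean)
```

If integer-valued predictions are really needed, one option is to draw counts from the fitted distribution in this way, or to report the most probable count, but the value returned by predict is the mean of that distribution.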

As for the mean squared error, check out Hans Roggeman's answer here. It helped me understand model comparison better.
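For the error measures asked about in the question, each is a single line in base R, so a dedicated function is arguably not needed. The sketch below uses made-up observed and predicted vectors (they are hypothetical, not the question's data); in practice observed would be newdata$NumberOfIncidents and predicted the output of predict.

```r
# Hypothetical observed counts and predicted means, for illustration only.
observed  <- c(0, 1, 0, 0, 2, 0, 1, 0, 0, 0)
predicted <- c(0.22, 0.26, 0.22, 0.22, 0.18, 0.21, 0.24, 0.21, 0.22, 0.21)

# Mean squared error: average of the squared residuals.
mse <- mean((observed - predicted)^2)

# Mean absolute error: average of the absolute residuals.
mae <- mean(abs(observed - predicted))

# Pearson correlation between observed and predicted values.
pearson <- cor(observed, predicted)

print(c(mse = mse, mae = mae, pearson = pearson))
```

Note that these should be computed on observations that were not used to fit the model, otherwise the errors will look better than they really are.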