Predict function and categorical variables in R

Tags: categorical-data, predictive-models, r, zero-inflation

This is more of a general question about how the predict function treats categorical variables and how to interpret the output from predict.

I have a zeroinfl model to predict the number of animals encountered:

b9 <- zeroinfl(Count ~ as.factor(Area) + as.factor(Season) | 1, dist="negbin", data=total)

where Count is the number of animals and the explanatory variables are Area and Season, both of which are coded as factors in the model. Area has three levels and Season has four levels. Coding the two as factors lets R create dummy variables for each variable for use in the model.
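A minimal base-R sketch of that dummy coding, assuming R's default treatment contrasts and a small made-up data frame (`toy` is hypothetical, not the asker's `total`). `model.matrix()` expands each factor into one indicator column per non-reference level, so Area (3 levels) contributes 2 columns and Season (4 levels) contributes 3, plus the intercept.

```r
# Hypothetical toy data using the same factor levels as in the question
toy <- data.frame(
  Area   = c("625", "631", "Bay", "625", "631", "Bay"),
  Season = c(1, 2, 3, 4, 1, 2)
)

# Same right-hand side as the count part of the zeroinfl formula
mm <- model.matrix(~ as.factor(Area) + as.factor(Season), data = toy)

# 1 intercept + (3 - 1) Area dummies + (4 - 1) Season dummies = 6 columns;
# the first level of each factor ("625" and Season 1) is the reference level
print(colnames(mm))
print(mm)
```

The reference levels get no column of their own; their effect is absorbed into the intercept.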

When I use the predict function to predict the number of animals for a larger data set, I want to make sure I understand what is happening.

My newdata for predict is: newdata <- data.frame(Season, Area).
Both variables are coded as factors, and the data frame is in long format. Each row corresponds to a trip taken over the course of 8 years, and every combination of Season and Area is represented. There are 113,804 rows in the newdata data frame.

str(newdata)
'data.frame':   113804 obs. of  2 variables:
$ Season: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
$ Area  : Factor w/ 3 levels "625","631","Bay": 3 3 3 3 3 3 3 3 3 3 ...

Example:

   Season Area
1       1  Bay
2       1  625
3       1  631
4       2  Bay
5       2  625
6       2  631
7       3  Bay
8       3  625
9       3  631
10      4  Bay
11      4  625
12      4  631
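If only one prediction per Season-by-Area combination is needed, the 12-row grid above can be built with `expand.grid()`. This is a sketch that assumes the training data use exactly the levels shown by `str(newdata)`; the names `season_levels`, `area_levels` and `grid` are hypothetical. Setting the levels explicitly guards against a level being dropped or reordered, since `predict()` stops with an error if `newdata` contains a factor level that was not seen when the model was fitted.

```r
# Factor levels assumed to match those in the training data ('total')
season_levels <- c("1", "2", "3", "4")
area_levels   <- c("625", "631", "Bay")

# One row per Season x Area combination (4 x 3 = 12 rows).
# expand.grid varies its first argument fastest, so Area cycles within Season;
# the values are listed as Bay, 625, 631 to reproduce the row order shown above
grid <- expand.grid(
  Area   = factor(c("Bay", "625", "631"), levels = area_levels),
  Season = factor(season_levels, levels = season_levels)
)
grid <- grid[, c("Season", "Area")]  # same column order as in the question

str(grid)
print(grid)
```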
  1. Do I need to create dummy variables for every level of the two variables as input to predict, or does predict behave like zeroinfl, where R does this automatically if the variables are coded as factors?

  2. predict returns values from 0.0461 to 0.6015. If I am trying to predict the number of animals, how do I interpret this? Since no predicted value is greater than 1 and I need whole numbers, I rounded the predictions so that any value less than 0.5 became 0 and any value greater than 0.5 became 1. Does this seem correct?
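A small experiment that bears on both questions and needs no extra packages: a sketch using a Poisson `glm()` as a stand-in for `zeroinfl()`, with simulated data (the rates and the names `sim`, `fit`, `nd`, `mu` are hypothetical). `predict()` re-evaluates the same `as.factor()` terms on `newdata` and reuses the factor levels stored with the fit, so no manual dummy variables are needed. To my understanding, pscl's predict method for zeroinfl fits rebuilds the model frame the same way. With `type = "response"` the output is an expected count (a mean), which is normally not a whole number; the sum of the unrounded values is the expected total over those rows, which per-row rounding to 0 or 1 does not preserve.

```r
set.seed(1)

# Hypothetical training data: Area and Season stored as plain codes
sim <- data.frame(
  Area   = sample(c("625", "631", "Bay"), 500, replace = TRUE),
  Season = sample(1:4, 500, replace = TRUE)
)
# Low expected counts, so most observations are 0 or 1
rate <- c("625" = 0.1, "631" = 0.3, "Bay" = 0.5)[sim$Area] * (1 + 0.1 * sim$Season)
sim$Count <- rpois(nrow(sim), rate)

# Poisson GLM as a stand-in for zeroinfl(); factors are created inside the formula
fit <- glm(Count ~ as.factor(Area) + as.factor(Season), family = poisson, data = sim)

# newdata only needs columns with the original names; no dummy variables
nd <- data.frame(Season = c(1, 2, 4), Area = c("Bay", "625", "631"))
mu <- predict(fit, newdata = nd, type = "response")

print(round(mu, 4))  # expected counts, not whole numbers
print(sum(mu))       # expected total number of animals over these rows
```

Row 1 of `nd` (Season 1, Area Bay) uses the reference Season, so its prediction is just `exp(intercept + Bay coefficient)`, the same value the dummy-coded design matrix would give.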

Best Answer

?predict.zeroinfl

type = c("response", "prob", "count", "zero")

You need to add type = "count" as an argument to your predict() call.
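As a hedged summary of the `type` options documented for pscl's predict method: "response" is the mean of the full zero-inflated model, "count" is the mean of the count (here negative binomial) component alone, "zero" is the probability of the zero-inflation component, and "prob" gives predicted probabilities over a range of count values. The sketch below checks the relationship between the first two in base R, using made-up parameter values (`p`, `mu`, `theta` are hypothetical, not estimates from `b9`): the response mean equals (1 - p) * mu. Both are expected numbers of animals rather than integer counts.

```r
# Hypothetical parameter values for one row of newdata
p     <- 0.3   # probability of the zero-inflation ("structural zero") component
mu    <- 0.5   # mean of the negative binomial count component
theta <- 1.2   # negative binomial size (dispersion) parameter

y <- 0:200     # enough support that the remaining tail mass is negligible

# Zero-inflated NB probabilities: extra mass p at zero, (1 - p) times NB everywhere
prob_zinb <- (1 - p) * dnbinom(y, size = theta, mu = mu)
prob_zinb[1] <- prob_zinb[1] + p

mean_count    <- sum(y * dnbinom(y, size = theta, mu = mu))  # analogue of type = "count"
mean_response <- sum(y * prob_zinb)                          # analogue of type = "response"

print(c(count = mean_count, response = mean_response, closed_form = (1 - p) * mu))
```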