Solved – How to interpret the output of a multinomial classification model in R package gbm

boosting, categorical data, multinomial-distribution, prediction, r

After fitting a gradient boosted model to n data points with R package gbm, using multinomial regression where the response variable (a factor, as required by the gbm function) has k levels, I see that the predictions are output as a vector of length n*k. The predicted responses come from:

probs.var.multinom <- predict.gbm(gbm.model.multinom, test.data, best.iter.gbm, 
                                  type="response")

Note that this is different from the output of a logistic (distribution = "bernoulli") model, where the results are a vector the same length as the number of cases.

How should this be interpreted? Specifically, how can I link the response vector back to the input data set to evaluate the classification?

Best Answer

The predictions are returned not as a vector but as a three-dimensional array: n cases by k classes by the number of n.trees values requested. With a single value such as best.iter.gbm, pred=as.matrix(probs.var.multinom[,,1]) gives you an n-by-k matrix containing the probabilistic forecasts of the gbm model. Each row is the forecast for one case in test.data, in the same order, and each column holds the predicted probability of one class of the response variable. If you need a single "best-guess" prediction, take the most likely class in each row, compare it with the true labels, and use the misclassification rate for each class to evaluate the model. If you want to make use of the probabilities associated with the forecasts, use a metric such as the Ranked Probability Skill Score (RPSS).
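As a minimal sketch of the steps above, the following uses a small made-up array in place of the real predict.gbm output, so it runs without a fitted model. The names probs and truth, the class labels and the numbers are all hypothetical. It also assumes the column names of the array carry the class levels, which is worth checking with dimnames() on your own output.

```r
# Stand-in for the n x k x 1 array of class probabilities (illustrative
# values only); here n = 3 cases and k = 3 classes.
probs <- array(
  c(0.7, 0.1, 0.2,   # P(class "a") for cases 1..3
    0.2, 0.6, 0.3,   # P(class "b") for cases 1..3
    0.1, 0.3, 0.5),  # P(class "c") for cases 1..3
  dim = c(3, 3, 1),
  dimnames = list(NULL, c("a", "b", "c"), NULL)
)

# Drop the third dimension to get an n x k matrix: one row per case.
pred <- probs[, , 1]

# Most likely class per row (ties go to the first column).
pred_class <- colnames(pred)[max.col(pred, ties.method = "first")]

# Hypothetical true labels, in the same row order as the test data.
truth <- c("a", "b", "b")

# Confusion matrix and overall misclassification rate.
conf_mat <- table(observed = truth, predicted = pred_class)
error_rate <- mean(pred_class != truth)
```

With real output, pred would come from probs.var.multinom[,,1] and truth from the response column of test.data.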

Related Solutions

R gbm – How to Use R gbm with Distribution=”adaboost”?

The adaboost method gives its raw predictions on a logit-like scale (half the log-odds). You can convert them to a 0-1 output:

gbm_predicted <- plogis(2 * gbm_predicted)

Note the 2* inside plogis.
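To make the role of the factor of 2 concrete, here is a short sketch with made-up raw scores (the vector f is hypothetical, not output from a real model). It only checks that plogis(2 * f) matches the explicit formula 1 / (1 + exp(-2 * f)).

```r
# Hypothetical raw adaboost scores on the half-log-odds scale.
f <- c(-1.5, 0, 0.8)

# Convert to probabilities; the factor of 2 turns half-log-odds into log-odds.
p <- plogis(2 * f)

# Same conversion written out explicitly.
p_manual <- 1 / (1 + exp(-2 * f))
stopifnot(isTRUE(all.equal(p, p_manual)))
```

A raw score of 0 maps to a probability of 0.5, and every result lies strictly between 0 and 1.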

Solved – JAGS Multinomial mixture model with missing data

You can't use missing data with the multinomial distribution in JAGS.

For some relevant coding workarounds, see these slides from the Patuxent folks:

http://www.mbr-pwrc.usgs.gov/workshops/unmarked/Slides/Slides_Multimix.pdf
