Solved – Understanding the output of a C5.0 classification model using the caret package

Tags: caret, classification, feature selection, machine learning, r

I fit a C5.0 classification model to a 4-class problem ($N_{train} = 165$, $P = 11$) with the caret R package, using the code below. The model was tuned over the winnowing option, which is a kind of feature selection approach. Here is an excerpt on winnowing from caret's companion book, in my opinion a must-have for discovering the hidden gems coded in the package:
Kuhn M, Johnson K. Applied Predictive Modeling. 1st ed. New York: Springer; 2013.

C5.0 also has an option to winnow or remove predictors: an initial algorithm uncovers which predictors have a relationship with the outcome, and the final model is created from only the important predictors. To do this, the training set is randomly split in half and a tree is created for the purpose of evaluating the utility of the predictors (call this the “winnowing tree”). Two procedures characterize the importance of each predictor to the model:

1. Predictors are considered unimportant if they are not in any split in the winnowing tree.
2. The half of the training set samples not included to create the winnowing tree are used to estimate the error rate of the tree. The error rate is also estimated without each predictor and compared to the error rate when all the predictors are used. If the error rate improves without the predictor, it is deemed to be irrelevant and is provisionally removed.
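(As an aside, the winnowing pre-pass can also be switched on directly in the C50 package, outside of caret, and the text output should then report any winnowed attributes. A minimal sketch, using iris as a stand-in for my data:

library(C50)

## Single C5.0 tree with the winnowing pre-pass enabled;
## C5.0Control(winnow = TRUE) is the switch described in the quote above.
fit <- C5.0(x = iris[, 1:4], y = iris$Species,
            control = C5.0Control(winnow = TRUE))
summary(fit)  # header lists attributes removed by winnowing

My actual tuning code:)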

library(caret)
library(C50)

c50Grid <- expand.grid(.trials = c(1:9, (1:10)*10),
                       .model = c("tree", "rules"),
                       .winnow = c(TRUE, FALSE))
c50Grid

## ctrl is a trainControl() object defined earlier,
## e.g. ctrl <- trainControl(method = "repeatedcv")
set.seed(1) # important to have reproducible results
c5Fitvac <- train(Class ~ .,
                  data = training,
                  method = "C5.0",
                  tuneGrid = c50Grid,
                  trControl = ctrl,
                  metric = "Accuracy", # the default, so not strictly needed
                  importance = TRUE,   # not needed
                  preProc = c("center", "scale"))
> c5Fitvac$finalModel$tuneValue
   trials model winnow
16     70  tree  FALSE  

CV tuning output:

[Plot of the cross-validation accuracy profiles across the tuning grid (trials, model type, winnowing)]
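(The tuning plot can be regenerated at any point from the train object itself; both plot() and ggplot() have methods for caret train objects:

plot(c5Fitvac)    # lattice accuracy profiles over the tuning grid
library(ggplot2)
ggplot(c5Fitvac)  # same information, ggplot2 version
)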

Excerpt from the C5.0 tree output:

> c5Fitvac$finalModel$tree
[1] "id=\"See5/C5.0 2.07 GPL Edition 2014-01-22\"\nentries=\"70\"\ntype=\"2\" class=\"Q\" freq=\"9,16,60,80\" att=\"IL17A\" forks=\"3\" cut=\"0.92485309\"\ntype=\"0\" class=\"Q\"\ntype=\"2\" class=\"Q\" freq=\"0,4,59,80\" att=\"IL23R\" forks=\"3\" cut=\"0.26331303\"\ntype=\"0\" class=\"Q\"\ntype=\"2\" class=\"Q\" freq=\"0,4,19,80\" att=\"IL12RB2\" forks=\"3\" cut=\"0.41611555\"\ntype=\"0\" class=\"Q\"\ntype=\"2\" class=\"Q\" freq=\"0,4,9,80\" att=\"IL23R\" forks=\   

And the predictors used in the final model (note that predictors() lists the variables actually used, not their importance):

> predictors(c5Fitvac)
 [1] "IL23R"   "IL12RB2" "IL8"     "IL23A"   "IL6ST"   "IL12A"   "IL12RB1"
 [8] "IL27RA"  "IL12B"   "IL17A"   "EBI3"

Questions:

  1. Why are the accuracy levels for no winnowing in the plot about twice those for winnowing? Can you please help me interpret this output when it says winnow = FALSE?
  2. How can I visualize the tree output instead of the garbled text that appeared in my case? Is there any way to see an actual tree instead of crowded symbols?

Best Answer

Thanks for the plug =]

1) The winnowing process is erroneously removing predictors that could improve the accuracy of the model. Within the cross-validation loop, the winnowing process thinks that it is improving the accuracy, but that does not hold up once other samples are used to evaluate performance. Sometimes it helps and other times it doesn't.
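One quick way to see this in the numbers is to average the resampled accuracy per winnowing setting straight from the train object (a sketch, assuming the c5Fitvac object from the question; caret stores the resampled metrics in the results data frame):

## mean resampled Accuracy for each winnow setting,
## averaged over trials and model type
aggregate(Accuracy ~ winnow, data = c5Fitvac$results, FUN = mean)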

2) There is no graph of the tree yet (but it is on my list). Try using the summary function:

> set.seed(1)
> mod <- train(Species ~ ., data = iris, method = "C5.0")
> ## This data set liked rules over trees but it works the same for trees
> summary(mod$finalModel)

Call:
<snip>
-----  Trial 0:  -----

Rules:

Rule 0/1: (50, lift 2.9)
        Petal.Length <= 1.9
        ->  class setosa  [0.981]

Rule 0/2: (48/1, lift 2.9)
    Petal.Length > 1.9
    Petal.Length <= 4.9
    Petal.Width <= 1.7
    ->  class versicolor  [0.960]
<snip>
Evaluation on training data (150 cases):

Trial           Rules     
-----     ----------------
      No           Errors

   0         4    4( 2.7%)
   1         5    8( 5.3%)
   2         3    6( 4.0%)
   3         6   12( 8.0%)
   4         4    5( 3.3%)
   5         7    3( 2.0%)
   6         3    8( 5.3%)
   7         8   15(10.0%)
   8         4    3( 2.0%)
   9         5    5( 3.3%)
boost             0( 0.0%)   <<


   (a)   (b)   (c)    <-classified as
  ----  ----  ----
    50                (a): class setosa
          50          (b): class versicolor
                50    (c): class virginica


Attribute usage:

100.00% Petal.Length
 66.67% Petal.Width
 54.00% Sepal.Width
 46.67% Sepal.Length


Time: 0.0 secs
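As an aside, the "Attribute usage" table above is what drives variable importance for C5.0; C5imp() in the C50 package returns it as a data frame (and caret's varImp() uses it under the hood):

> C5imp(mod$finalModel, metric = "usage")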

HTH,

Max