Solved – Rpart using Caret changes names of Factors

caret, r, rpart

If I have a factor, say sexe, with two levels, MALE and FEMELLE, then using rpart alone I get splits that say, for example, sexe = MALE, followed by a yes/no branch. However, using rpart through caret, I get a weird renaming of variables, with splits on sexeMALE >= .5 instead.

This also causes a problem with the predict function, as my variable is no longer called sexe but sexeMALE. Is there a way around this? Also, since it's a factor variable, what does >= .5 mean in this case?

Thanks

Best Answer

You probably used the formula method with train, which converts factors to dummy variables. Most functions in R that have a formula method do the same. rpart, randomForest, naiveBayes, and a few others do not, since they can model the categories directly without a numeric encoding of the data.

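The naming that you see is what model.matrix generates: the factor name followed by the level, here sexeMALE. That dummy variable is either 0 or 1, so a split on sexeMALE >= .5 simply means that sexe is MALE.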
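
To see where the name comes from, here is a minimal sketch using base R only. The toy data frame and the `age` column are made up for illustration; only the factor name `sexe` and its two levels come from the question.

```r
# Hypothetical toy data: `age` and the values are invented for illustration.
df <- data.frame(
  sexe = factor(c("MALE", "FEMELLE", "MALE", "FEMELLE")),
  age  = c(23, 35, 41, 29)
)

# model.matrix expands each factor into 0/1 dummy columns.
# The first level (alphabetically FEMELLE) becomes the reference level,
# so only a column for MALE is created.
mm <- model.matrix(~ ., data = df)

colnames(mm)        # factor name and level are pasted together
mm[, "sexeMALE"]    # 1 where sexe is MALE, 0 where it is FEMELLE
```

Because the dummy column only takes the values 0 and 1, any threshold between them (such as 0.5) separates the two levels.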

If you want to keep the factors as factors, use the non-formula method, passing the predictors as a data frame and the outcome as a separate vector, e.g.

train(x, y)
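
A fuller sketch of that call might look like the following. It assumes the caret and rpart packages are installed; the simulated data, the `age` and `outcome` columns, and the 5-fold cross-validation setting are all assumptions made for illustration, not part of the original question.

```r
library(caret)
library(rpart)

# Hypothetical simulated data; only `sexe` and its levels come from the question.
set.seed(1)
n <- 200
dat <- data.frame(
  sexe = factor(sample(c("MALE", "FEMELLE"), n, replace = TRUE)),
  age  = round(runif(n, 18, 70))
)
dat$outcome <- factor(ifelse(dat$sexe == "MALE" & dat$age > 40, "yes", "no"))

# Non-formula interface: predictors as a data frame, outcome as a vector.
# The factor column is handed to rpart unchanged, without dummy coding.
x <- dat[, c("sexe", "age")]
y <- dat$outcome

fit <- train(
  x = x, y = y,
  method = "rpart",
  trControl = trainControl(method = "cv", number = 5)
)

# The fitted tree refers to the original variable name `sexe`.
print(fit$finalModel)

# New data for predict() uses the original column names and factor levels.
new_obs <- data.frame(
  sexe = factor("MALE", levels = levels(dat$sexe)),
  age  = 50
)
pred <- predict(fit, newdata = new_obs)
```

With this interface the columns in newdata keep the same names as in the training data frame, so predict no longer expects a sexeMALE column.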

Max