If I have a factor, e.g. sexe with two levels MALE and FEMELLE, then using rpart alone I get splits such as sexe = MALE followed by a yes/no branch. However, using rpart with caret I get a weird renaming of the variables: the split is on sexeMALE >= .5 instead.
This also causes a problem with the predict function, as my variable is no longer called sexe but sexeMALE. Is there a way around this? Also, since it is a factor variable, what does >= .5 mean in this case?
Thanks
Best Answer
You probably used the formula method with train, which converts the factors to dummy variables. Most functions in R that use the formula method do the same. rpart, randomForest, naiveBayes and a few others do not, since they are able to model the categories without needing numeric encodings of the data.

The naming that you see is what model.matrix generates: the variable name followed by the level name, so sexe with level MALE becomes sexeMALE.

If you want to keep the factors as factors, use the non-formula method, i.e. pass the predictors and the outcome to train separately through its x and y arguments instead of a formula.
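As a rough illustration of both points, here is a minimal sketch on made-up data (the data frame d, its columns age and y, and all values are assumptions for the example, not taken from the question). The first part shows how model.matrix produces the sexeMALE column: it is a 0/1 indicator, so a split on sexeMALE >= .5 simply separates MALE (1) from FEMELLE (0). The second part calls train with x and y, which should leave sexe as a factor; it only runs if caret and rpart are installed.

```r
# Toy data; names and values are hypothetical, for illustration only.
set.seed(1)
d <- data.frame(
  sexe = factor(rep(c("MALE", "FEMELLE"), each = 20)),
  age  = round(runif(40, 18, 70)),
  y    = factor(rep(c("yes", "no", "no", "yes"), times = c(15, 5, 15, 5)))
)

# What the formula interface does with a factor: model.matrix() expands it
# into a 0/1 indicator column named <variable><level>.
mm <- model.matrix(y ~ sexe + age, data = d)
colnames(mm)
# "(Intercept)" "sexeMALE" "age"

# The indicator is 1 for MALE and 0 for FEMELLE, so "sexeMALE >= 0.5"
# is the same as "sexe == MALE".
table(d$sexe, mm[, "sexeMALE"])

# Non-formula interface: predictors and outcome are passed separately,
# so the factor reaches rpart unchanged.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("rpart", quietly = TRUE)) {
  library(caret)
  fit <- train(x = d[, c("sexe", "age")], y = d$y, method = "rpart")
  print(fit$finalModel)

  # New data keeps the original column name sexe.
  predict(fit, newdata = d[1:3, c("sexe", "age")])
}
```

With the x/y call, the data passed to predict needs the same columns as x (here sexe and age), with sexe still stored as a factor.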
Max