See the first example given in the help for step.plr:
library(stepPlr)   # provides step.plr
n <- 100
p <- 3
z <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
x <- data.frame(x1=factor(z[ ,1]),x2=factor(z[ ,2]),x3=factor(z[ ,3]))
y <- sample(c(0,1),n,replace=TRUE)
fit <- step.plr(x,y)
# 'level' is automatically generated. Check 'fit$level'.
Does that answer your question?
Using caret with the default grid to optimize the tuning parameters, then calling predict, gives essentially the same results as running gbm directly:
R2.caret-R2.gbm=0.0009125435
rmse.caret-rmse.gbm=-0.001680319
library(caret)
library(gbm)
library(Metrics)   # provides rmse(actual, predicted)
data(iris)
# Using caret with the default grid to optimize tune parameters automatically
# GBM Tuning parameters:
# n.trees (# Boosting Iterations)
# interaction.depth (Max Tree Depth)
# shrinkage (Shrinkage)
# n.minobsinnode (Min. Terminal Node Size)
metric <- "RMSE"
trainControl <- trainControl(method="cv", number=10)
set.seed(99)
gbm.caret <- train(Sepal.Length ~ .
, data=iris
, distribution="gaussian"
, method="gbm"
, trControl=trainControl
, verbose=FALSE
#, tuneGrid=caretGrid
, metric=metric
, bag.fraction=0.75
)
print(gbm.caret)
caret.predict <- predict(gbm.caret, newdata=iris, type="raw")
rmse.caret <- rmse(iris$Sepal.Length, caret.predict)
print(rmse.caret)
R2.caret <- cor(gbm.caret$finalModel$fit, iris$Sepal.Length)^2
print(R2.caret)
#using gbm without caret with the same parameters
set.seed(99)
gbm.gbm <- gbm(Sepal.Length ~ .
, data=iris
, distribution="gaussian"
, n.trees=150
, interaction.depth=3
, n.minobsinnode=10
, shrinkage=0.1
, bag.fraction=0.75
, cv.folds=10
, verbose=FALSE
)
best.iter <- gbm.perf(gbm.gbm, method="cv")
print(best.iter)
train.predict <- predict(gbm.gbm, newdata=iris, n.trees=150)
rmse.gbm <- rmse(iris$Sepal.Length, train.predict)
print(rmse.gbm)
R2.gbm <- cor(gbm.gbm$fit, iris$Sepal.Length)^2
print(R2.gbm)
print(R2.caret-R2.gbm)
print(rmse.caret-rmse.gbm)
Best Answer
My experience across a bunch of data sets (some of which are documented in section 14.7 of APM) is that it doesn't change performance in any one direction (i.e. in some cases it is better, in others worse). I have yet to see a huge difference.
However, most tree-based models have an algorithm that, when given a categorical predictor, finds the optimal binary split. A lot of these look at different configurations of how to split the category (e.g. 2 levels on one side, 3 on the other). If you have dummy variables, it only considers one value of that predictor at a time. Even though it has more predictors to sift through, I find that using dummy variables makes the training time shorter and the trees slightly deeper.
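To see the difference concretely, here is a minimal sketch (my own toy example, not from the data sets above) of the two representations of the same categorical predictor, using model.matrix from base R for the dummy expansion:

```r
# A factor predictor with three levels.
df <- data.frame(color = factor(c("red", "green", "blue", "red")))

# A tree that handles factors natively sees ONE predictor and can search
# groupings of levels, e.g. {red} vs {green, blue}.
nlevels(df$color)  # 3 levels in a single column

# Dummy encoding turns it into separate 0/1 columns; a tree then
# considers each column (i.e. one level) at a time when splitting.
dummies <- model.matrix(~ color - 1, data = df)
print(dummies)     # 3 indicator columns: colorblue, colorgreen, colorred
```

With the dummy columns there are more predictors to scan, but each candidate split is simpler (one level in or out), which is consistent with the shorter training times and slightly deeper trees described above.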
Max