Solved – Predicted values from gbm.fit and gbm differ

boosting, machine-learning, predictive-models, r

My intuition is that the fitted values and predicted values of a gbm object should be identical. But in this example with just one tree, the values are different:

require(MASS); require(gbm)

b <- c(0,0,.8,0,0)
x <- mvrnorm(100,mu=rep(0,5),diag(5))
colnames(x) <- paste0("x",1:5)
y <- x %*% b + rnorm(100)

gbm.fit.out <- gbm.fit(y=y,x=x,shrinkage=.1,
    n.trees=1,distribution="gaussian",verbose=F)

d <- data.frame(y=y,x=x)
gbm.out <- gbm(y~.,data=d,shrinkage=.1,n.trees=1,distribution="gaussian",train.fraction=1)

p1 <- predict(gbm.fit.out,n.trees=1)
p2 <- predict(gbm.out,n.trees=1)
p1-p2

Why are they different? Does it even matter?

Best Answer

This seems to be peculiar to gbm.fit. Using gbm (and being sure to turn off both bagging and the train/test split, i.e. bag.fraction=1 and train.fraction=1) produces fitted values that match the predictions exactly.

require(MASS); require(gbm)
b <- c(0,0,.8,0,0)
x <- mvrnorm(100,mu=rep(0,5),diag(5))
colnames(x) <- paste0("x",1:5)
y <- x %*% b + rnorm(100)

out <- gbm(y~x1+x2+x3+x4+x5,data=data.frame(y,x),
 shrinkage=1,n.trees=1,
 distribution="gaussian",
 verbose=F,bag.fraction=1,train.fraction=1)

f <- out$fit                  # fitted values stored on the gbm object
p <- predict(out,n.trees=1)   # predictions on the training data
all(f-p == 0)                 # TRUE: fit and predictions agree
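
For comparison, here is a minimal sketch of the same check applied to the matrix interface (the object names fit.out, f.fit and p.fit are mine, and it reuses x and y from above), again with bagging turned off via bag.fraction=1, so you can see directly whether gbm.fit reproduces the discrepancy:

fit.out <- gbm.fit(x=x,y=y,
 shrinkage=1,n.trees=1,
 distribution="gaussian",
 verbose=F,bag.fraction=1)

f.fit <- fit.out$fit                          # fitted values stored on the gbm.fit object
p.fit <- predict(fit.out,newdata=x,n.trees=1) # predictions on the training matrix
all(f.fit-p.fit == 0)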