Solved – Predicted values from gbm.fit and gbm differ

boosting, machine-learning, predictive-models, r

My intuition is that the fitted values and predicted values of a gbm object should be identical. But in this example with just one tree, the values are different:

require(MASS); require(gbm)

b <- c(0,0,.8,0,0)
x <- mvrnorm(100,mu=rep(0,5),diag(5))
colnames(x) <- paste0("x",1:5)
y <- x %*% b + rnorm(100)

gbm.fit.out <- gbm.fit(y=y,x=x,shrinkage=.1,
    n.trees=1,distribution="gaussian",verbose=F)

d <- data.frame(y=y,x=x)
gbm.out <- gbm(y~.,data=d,shrinkage=.1,n.trees=1,distribution="gaussian",train.fraction=1)

p1 <- predict(gbm.fit.out,n.trees=1)
p2 <- predict(gbm.out,n.trees=1)
p1-p2

Why are they different? Does it even matter?

Best Answer

This seems to be peculiar to gbm.fit. Using gbm (and being sure to turn off both bagging and the train/test split, i.e. bag.fraction=1 and train.fraction=1) produces fitted values that match the predictions exactly.

require(MASS); require(gbm)
b <- c(0,0,.8,0,0)
x <- mvrnorm(100,mu=rep(0,5),diag(5))
colnames(x) <- paste0("x",1:5)
y <- x %*% b + rnorm(100)

out <- gbm(y~x1+x2+x3+x4+x5,data=data.frame(y,x),
 shrinkage=1,n.trees=1,
 distribution="gaussian",
 verbose=F,bag.fraction=1,train.fraction=1)

f <- out$fit                  # fitted values stored on the gbm object
p <- predict(out,n.trees=1)   # predictions on the training data
all(f-p == 0)                 # TRUE: fit and predictions agree
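
For comparison, here is a minimal sketch of the same check applied to the matrix interface (the object names fit.out, f.fit and p.fit are mine, and it reuses x and y from above), again with bagging turned off via bag.fraction=1, so you can see directly whether gbm.fit reproduces the discrepancy:

fit.out <- gbm.fit(x=x,y=y,
 shrinkage=1,n.trees=1,
 distribution="gaussian",
 verbose=F,bag.fraction=1)

f.fit <- fit.out$fit                          # fitted values stored on the gbm.fit object
p.fit <- predict(fit.out,newdata=x,n.trees=1) # predictions on the training matrix
all(f.fit-p.fit == 0)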