Solved – Model performance in quantile modelling

I am using quantile regression (for example via gbm or quantreg in R) – not focusing on the median but instead an upper quantile (e.g. 75th). Coming from a predictive modeling background, I want to measure how well the model fits on a test set and be able to describe this to a business user. My question is how? In a typical setting with a continuous target I could do the following:

- Calculate the overall RMSE.
- Decile the data set by the predicted value and compare the average actual to the average predicted in each decile.
- And so on.
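
For reference, the two checks above could be sketched in base R roughly as follows. The data are simulated, and the names `actual`, `predicted`, `decile` and `calib` are illustrative only; they do not come from any model in this question.

```r
set.seed(1)
# Simulated stand-ins for test-set outcomes and mean predictions (assumption)
n <- 1000
predicted <- rnorm(n, mean = 10, sd = 2)
actual <- predicted + rnorm(n, sd = 1)

# Overall RMSE
rmse <- sqrt(mean((actual - predicted)^2))

# Decile the data by predicted value, then compare the average actual
# to the average predicted within each decile
decile <- cut(predicted,
              breaks = quantile(predicted, probs = seq(0, 1, by = 0.1)),
              include.lowest = TRUE, labels = FALSE)
calib <- data.frame(
  decile = 1:10,
  avg_predicted = as.numeric(tapply(predicted, decile, mean)),
  avg_actual = as.numeric(tapply(actual, decile, mean))
)
print(rmse)
print(calib)
```

Both checks rely on the observed value being a direct target for the prediction, which is exactly what is missing when the prediction is a quantile.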

What can be done in this case, where there really is no actual value (I don't think, at least) to compare the prediction to?

Here is some example code:

install.packages("quantreg")
library(quantreg)

install.packages("gbm")
library(gbm)

data("barro")

set.seed(123) # make the train/validation split reproducible
trainIndx<-sample(1:nrow(barro),size=round(nrow(barro)*0.7),replace=FALSE)
train<-barro[trainIndx,]
valid<-barro[-trainIndx,]

modGBM<-gbm(y.net~., # formula
            data=train, # dataset
            distribution=list(name="quantile",alpha=0.75), # see the help for other choices
            n.trees=5000, # number of trees
            shrinkage=0.005, # shrinkage or learning rate,
            # 0.001 to 0.1 usually work
            interaction.depth=5, # 1: additive model, 2: two-way interactions, etc.
            bag.fraction = 0.5, # subsampling fraction, 0.5 is probably best
            train.fraction = 0.5, # fraction of data for training,
            # first train.fraction*N used for training
            n.minobsinnode = 10, # minimum total weight needed in each node
            cv.folds = 5, # do 5-fold cross-validation
            keep.data=TRUE, # keep a copy of the dataset with the object
            verbose=TRUE) # print out progress

best.iter<-gbm.perf(modGBM,method="cv")

pred<-predict(modGBM,newdata=valid,n.trees=best.iter)

Now what? We never observe the 75th percentile of the conditional distribution, so there is nothing to compare `pred` against directly.

Add:

I have thought of several methods and would like to know whether they are correct and whether there are better ones. I would also like to know how to interpret the first:

Calculate the average value of the loss function:
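```
qregLoss <- function(actual, estimate, quantile) {
  # Mean quantile (check) loss: cases above the estimate are weighted by
  # `quantile`, cases below it by `1 - quantile`
  resid <- actual - estimate
  mean(resid * (quantile - (resid < 0)))
}
```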
This is the loss function for quantile regression – but how do we interpret the value?
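
One heuristic way to give the number a scale, offered here as an assumption rather than an established answer, is to compare it with the loss of a naive baseline that predicts the unconditional training quantile for every case. The sketch below uses simulated data; `model_pred`, `baseline` and `skill` are hypothetical names, and the "model" is simply the known conditional quantile of the simulation.

```r
set.seed(42)
# Mean quantile (check) loss, same definition as qregLoss above
qregLoss <- function(actual, estimate, quantile) {
  resid <- actual - estimate
  mean(resid * (quantile - (resid < 0)))
}

# Simulated data (assumption): both the level and the spread of y grow with x
n <- 2000
x <- runif(n)
y <- 1 + 3 * x + rnorm(n, sd = 0.5 + x)
train <- 1:1000
test <- 1001:2000

tau <- 0.75
# Stand-in "model": the true conditional 75th percentile of the simulation
model_pred <- 1 + 3 * x[test] + qnorm(tau) * (0.5 + x[test])
# Naive baseline: unconditional 75th percentile of the training outcomes
baseline <- rep(unname(quantile(y[train], tau)), length(test))

loss_model <- qregLoss(y[test], model_pred, tau)
loss_base <- qregLoss(y[test], baseline, tau)
# Relative improvement over the baseline; values above 0 favour the model
skill <- 1 - loss_model / loss_base
print(c(model = loss_model, baseline = loss_base, skill = skill))
```

The loss is in the units of the target, so on its own it is mainly useful for ranking models on the same test set; the ratio against a baseline is one way to make it easier to describe.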
Should we expect that, if we are estimating the 75th percentile, the predicted value on a test set is greater than the actual value about 75% of the time?
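
A rough sketch of how such a coverage check might look in base R follows. The data are simulated and `pred` is a stand-in for test-set predictions of the 75th percentile, not the output of the gbm model above; `coverage` and `coverage_by_bin` are illustrative names.

```r
set.seed(7)
# Simulated test set (assumption): y is standard normal around a known centre
n <- 5000
mu <- runif(n, 0, 10)
actual <- mu + rnorm(n)
tau <- 0.75
pred <- mu + qnorm(tau)  # stand-in for predictions of the 75th percentile

# Overall share of cases where the actual value is at or below the prediction
coverage <- mean(actual <= pred)

# The same share within bins of the predicted value, to spot ranges
# where the predicted quantile runs too high or too low
bin <- cut(pred, breaks = quantile(pred, probs = seq(0, 1, by = 0.2)),
           include.lowest = TRUE, labels = FALSE)
coverage_by_bin <- tapply(actual <= pred, bin, mean)
print(coverage)
print(coverage_by_bin)
```

If the predictions behave like a 75th percentile, both the overall share and the per-bin shares should sit near 0.75, up to sampling noise.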

Are there other methods, formal or heuristic, to describe how well the model predicts new cases?
