Confidence Versus Prediction Intervals using Quantile Regression / Quantile Loss Function

confidence interval, machine learning, quantile regression

If you fit quantile regressions for the 5th and 95th percentiles, the result is often described as an estimate of a 90% prediction interval. This seems most prevalent in machine learning, where random forests have been adapted to predict quantiles from each leaf node, or where a GBM is fit with a quantile loss function.

Is this best characterized as a confidence or prediction interval and why?

Best Answer

Definitely a prediction interval, see for example here.

Quantile regression for the $5^\textrm{th}$ and $95^\textrm{th}$ percentiles attempts to find bounds $y_0({\bf x})$ and $y_1({\bf x})$ on the response variable $y$, given predictor variables ${\bf x}$, such that $$ \begin{aligned} \mathbb{P}\left(Y\le y_0({\bf X})\right)&=0.05 \\ \mathbb{P}\left(Y\le y_1({\bf X})\right)&=0.95 \end{aligned} $$ so (for a continuous response) $$ \mathbb{P}\left(\,y_0({\bf X})\le Y\le y_1({\bf X})\,\right)\ =\ 0.90 $$ which is by definition a $90\%$ prediction interval.
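
For reference, the "quantile loss" mentioned in the question is usually the pinball (check) loss. The symbols $\tau$, $\rho_\tau$ and $q$ below are notation introduced here for illustration, not taken from the question. For a quantile level $\tau\in(0,1)$, $$ \rho_\tau(u)\ =\ u\left(\tau-\mathbb{1}\{u<0\}\right)\ =\ \begin{cases}\tau\,u, & u\ge 0\\ (\tau-1)\,u, & u<0\end{cases} $$ and the value $q$ that minimizes the expected loss $\mathbb{E}\left[\rho_\tau(Y-q)\mid {\bf X}={\bf x}\right]$ is a conditional $\tau$-quantile of $Y$ given ${\bf X}={\bf x}$. Taking $\tau=0.05$ and $\tau=0.95$ gives the bounds $y_0({\bf x})$ and $y_1({\bf x})$: the loss targets quantiles of the response itself, not a parameter such as the mean.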

A $90\%$ prediction interval should contain (as-yet-unseen) new data $90\%$ of the time. In contrast, a $90\%$ confidence interval for some parameter (e.g. the mean) should contain the true mean unless we were unlucky to the tune of 1-in-10 in the data used to construct the interval.
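
The difference can be checked with a small simulation. The following is a minimal sketch under simplifying assumptions, not a full quantile regression: there are no predictors, so each "fitted" bound is a single constant chosen by brute force to minimize the pinball loss on the training sample. The data (normal with mean 10 and standard deviation 2), the sample sizes, and the helper names `pinball_loss`, `fit_constant_quantile` and `coverage` are all made up for illustration.

```python
import random
import statistics


def pinball_loss(y, q, tau):
    """Quantile (pinball) loss for predicting q when the outcome is y."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def fit_constant_quantile(sample, tau):
    """Brute force: return the training value with the smallest total pinball loss."""
    return min(sample, key=lambda q: sum(pinball_loss(y, q, tau) for y in sample))


def coverage(points, lo, hi):
    """Fraction of points that fall inside [lo, hi]."""
    return sum(lo <= y <= hi for y in points) / len(points)


random.seed(0)
train = [random.gauss(10.0, 2.0) for _ in range(1000)]   # data used for fitting
new = [random.gauss(10.0, 2.0) for _ in range(20000)]    # as-yet-unseen data

# Bounds obtained by minimizing the quantile loss at tau = 0.05 and tau = 0.95.
y0 = fit_constant_quantile(train, 0.05)
y1 = fit_constant_quantile(train, 0.95)

# Normal-approximation 90% confidence interval for the mean.
mean = statistics.fmean(train)
half_width = 1.645 * statistics.stdev(train) / len(train) ** 0.5

pi_cov = coverage(new, y0, y1)
ci_cov = coverage(new, mean - half_width, mean + half_width)

print(f"quantile-loss interval: [{y0:.2f}, {y1:.2f}], covers {pi_cov:.1%} of new data")
print(f"90% CI for the mean:    [{mean - half_width:.2f}, {mean + half_width:.2f}], "
      f"covers {ci_cov:.1%} of new data")
```

The interval built from the two quantile-loss fits should cover roughly 90% of the new observations, while the confidence interval for the mean is much narrower and covers only a small fraction of them. That is the sense in which the quantile-regression bounds behave as a prediction interval.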
