Solved – Scoring quantile regressor

loss-functions, quantile-regression, scoring-rules

Let's suppose that there is a real random variable $Y$ generated by some random process that depends somehow on a vector $\vec x.$

I've built a model that, for a given $\vec x$, predicts the $\tau$-quantile $Q_{\tau}$ of the conditional distribution $P(y \mid \vec x)$, and I'd like to score my model's predictions using an available set of $n$ data points $(\vec x_i, y_i)$, where each $y_i$ is a single sample drawn from the (unknown) distribution $P(y\mid \vec x_i).$

As stated in Roger Koenker's paper, the $\tau$-quantile of a random variable $Y$ is the minimizer of the (conditional) expectation of the loss function $$\rho_{\tau}(y, \hat y) = (y - \hat y)(\tau - I[y < \hat y])$$ with respect to $\hat y$.

So I guess that it's possible to define a loss function for scoring my model's estimates as $$l = \frac 1 n \sum\limits_{i=1}^{n} \rho_\tau(y_i, \hat Q_{\tau,i})$$ where $\hat Q_{\tau,i}$ is the $\tau$-quantile predicted by my model for $\vec x_i.$ This metric lets me compare the estimates of different models with each other, but I can't see how to use it to measure how much worse the model's estimates are than the best possible ones.
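For concreteness, the averaged loss $l$ can be computed as in the following minimal Python sketch. The function names `pinball_loss` and `mean_pinball_loss` and the toy numbers are made up for illustration; only the formula for $\rho_\tau$ comes from the text above.

```python
def pinball_loss(y, y_hat, tau):
    """rho_tau(y, y_hat) = (y - y_hat) * (tau - 1[y < y_hat])."""
    indicator = 1.0 if y < y_hat else 0.0
    return (y - y_hat) * (tau - indicator)


def mean_pinball_loss(ys, y_hats, tau):
    """Average rho_tau over paired observations and quantile predictions."""
    if len(ys) != len(y_hats):
        raise ValueError("ys and y_hats must have the same length")
    return sum(pinball_loss(y, q, tau) for y, q in zip(ys, y_hats)) / len(ys)


# Toy data, invented purely for illustration.
ys = [1.0, 2.0, 3.0, 4.0]
model_a = [2.0, 2.0, 2.0, 2.0]  # hypothetical median predictions of model A
model_b = [0.0, 0.0, 0.0, 0.0]  # hypothetical median predictions of model B

loss_a = mean_pinball_loss(ys, model_a, tau=0.5)
loss_b = mean_pinball_loss(ys, model_b, tau=0.5)
print(loss_a, loss_b)  # the lower value indicates the better median forecast
```

As noted, this only ranks models against each other; by itself it does not say how far either is from the best achievable loss.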

Is there any way to say how good my model is using this or any other metric and the given data?

Best Answer

The first thing we expect from a quantile forecast is that it respects the prespecified quantile, i.e., that it provides quantile predictions that are larger than a proportion $\tau$ of your realizations. You can check this by looking at your $n$ quantile predictions $\hat{y}_i$ and assessing whether

$$\hat{\tau} := \frac{1}{n} \#\{i\colon y_i<\hat{y}_i\} \approx \tau.$$

You can also do inferential statistics here. Under the null hypothesis that the true probability of $y_i<\hat{y}_i$ equals $\tau$, whether or not a given future realization fulfills $y_i<\hat{y}_i$ is Bernoulli distributed with probability $\tau$. Given $n$ independent future realizations, the total number $\#\{i\colon y_i<\hat{y}_i\}$ of successes will, under the null hypothesis, be binomially distributed with parameters $(n,\tau)$, so you can calculate an interval into which you expect (say) 95% of the numbers of successes to fall. If your actual number of successes is outside this interval, you can reject the null hypothesis at $\alpha=0.05$. Similarly, two different quantile forecasting algorithms may yield different $\hat{\tau}_1$ and $\hat{\tau}_2$, and you can assess whether these are significantly different.
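The coverage check and the binomial test could be sketched as follows, using only the Python standard library. The names `binom_pmf` and `coverage_test` are hypothetical, $0<\tau<1$ is assumed, and the two-sided p-value uses one common convention (summing the probabilities of all outcomes no more likely than the observed one); other conventions exist.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p) with 0 < p < 1, computed on the log scale."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)


def coverage_test(ys, y_hats, tau):
    """Return (tau_hat, p_value) for H0: P(y < y_hat) = tau (exact binomial test)."""
    n = len(ys)
    hits = sum(1 for y, q in zip(ys, y_hats) if y < q)
    pmf = [binom_pmf(k, n, tau) for k in range(n + 1)]
    # Two-sided p-value: total probability of outcomes that are no more
    # likely than the observed count (small tolerance for rounding).
    threshold = pmf[hits] * (1.0 + 1e-9)
    p_value = min(1.0, sum(p for p in pmf if p <= threshold))
    return hits / n, p_value


# Toy data, invented for illustration: 20 observations 1, 2, ..., 20.
ys = [float(i) for i in range(1, 21)]
tau_hat_ok, p_ok = coverage_test(ys, [10.5] * 20, tau=0.5)     # 10 of 20 below
tau_hat_bad, p_bad = coverage_test(ys, [100.0] * 20, tau=0.5)  # all 20 below
print(tau_hat_ok, p_ok)
print(tau_hat_bad, p_bad)
```

A small p-value is evidence that the forecasts do not cover the nominal proportion $\tau$; a large one only means this particular check found no problem.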


However, this is certainly not the end of the story. A little more thinking gives us a somewhat more stringent criterion for a model to be good: the best possible model for $\tau$-quantile predictions will provide quantile predictions that are larger than a proportion $\tau$ of your realizations - and do this independently of the predictors $x$.

To see why the last clause is important, assume that $x$ is a one-dimensional predictor which takes the values $x=(1,0,1,0,1,0,\dots)$. Assume further that the true future distribution is a mixture of two uniforms and depends on $x$ as follows:

$$ y \sim \begin{cases} U[0,1], & x=0 \\ U[1,2], & x=1 \end{cases} $$

Now, the unconditional distribution of $y$ is $U[0,2]$, since half of our $x$ are 0 and half are 1. Thus, if we want a median forecast ($\tau=0.5$), we could simply forecast $\hat{y}=1$. Then half of our observations would be below this prediction (namely, the ones where $x=0$), and half would be above it (those with $x=1$). We thus would have a wonderful median forecast that covers exactly the prespecified proportion of realizations.

Nevertheless, we would certainly not say that this median forecast is good, since its performance still depends heavily on $x$. The best median forecast would of course take the dependence on $x$ into account:

$$ \hat{y} := \begin{cases} 0.5, & x=0 \\ 1.5, & x=1 \end{cases} $$

Thus, another test you should do is to take the indicator variable of successes $I_{\{y_i<\hat{y}_i\}}$ and check whether this is independent of $x_i$. You can do a logistic regression of $I_{\{y_i<\hat{y}_i\}}$ against $x_i$ and check the significance of this model, or you could use any machine learning algorithm, like feeding $I_{\{y_i<\hat{y}_i\}}$ and $x_i$ into a Random Forest. Any predictive power of $x_i$ against $I_{\{y_i<\hat{y}_i\}}$ that you find is evidence that your quantile prediction is not yet optimal.
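The two-uniform example can be simulated to show what such a dependence looks like. This is only an illustrative sketch: the helper `hit_rate_by_group`, the sample size and the seed are arbitrary choices, and comparing hit rates per value of $x$ is a simple stand-in for the logistic regression or Random Forest check, which only works directly when $x$ is discrete.

```python
import random


def hit_rate_by_group(xs, ys, y_hats):
    """Fraction of observations with y < y_hat, separately for each value of x."""
    counts = {}
    for x, y, q in zip(xs, ys, y_hats):
        hits, total = counts.get(x, (0, 0))
        counts[x] = (hits + (y < q), total + 1)
    return {x: hits / total for x, (hits, total) in counts.items()}


rng = random.Random(0)                        # arbitrary seed
xs = [(i + 1) % 2 for i in range(10_000)]     # x = 1, 0, 1, 0, ...
ys = [rng.uniform(x, x + 1) for x in xs]      # U[0,1] if x = 0, U[1,2] if x = 1

constant = [1.0] * len(xs)                    # unconditional median forecast
conditional = [0.5 + x for x in xs]           # 0.5 if x = 0, 1.5 if x = 1

overall_constant = sum(y < q for y, q in zip(ys, constant)) / len(ys)
rates_constant = hit_rate_by_group(xs, ys, constant)
rates_conditional = hit_rate_by_group(xs, ys, conditional)

print(overall_constant)    # overall coverage of the constant forecast
print(rates_constant)      # coverage split by x for the constant forecast
print(rates_conditional)   # coverage split by x for the conditional forecast
```

The constant forecast has the right coverage overall, yet its hits are perfectly predictable from $x$; the conditional forecast has roughly the nominal coverage within each group.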


This question is actually rather important in time series analysis and forecasting, for instance in forecasting Value at Risk. There, you don't want a simple approach that gives correct quantiles on average but overshoots the quantile during calm periods in the market and undershoots it during turbulent times. Or there may be periodicities in variances, which a good quantile forecast had better incorporate. (See my example above.)

Thus, in the context of time series analysis, what interests us most is not so much whether $I_{\{y_i<\hat{y}_i\}}$ depends on some predictor $x_i$, but rather whether there is any autoregressive behavior in the time series $I_{\{y_t<\hat{y}_t\}}$. Tests have been developed to check for such autoregressive dynamics. Probably the first paper on this was Christoffersen (1998, International Economic Review), followed later by Clements & Taylor (2003, Journal of Applied Econometrics) and, more recently, Dumitrescu, Hurlin & Madkour (2013, Journal of Forecasting). If your underlying data have time series characteristics, I'd very much recommend that you look into this literature.
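As a rough illustration of this kind of check, the sketch below computes a likelihood-ratio statistic for first-order dependence in the hit sequence, in the spirit of the independence part of Christoffersen's test. It is written from the general idea (independent hits versus a first-order Markov chain, with an asymptotic $\chi^2_1$ reference distribution) and should be checked against the paper before being relied upon. The function names and the toy sequence are hypothetical.

```python
from math import erfc, log, sqrt


def _count_times_log(count, prob):
    """count * log(prob), with the convention 0 * log(0) = 0."""
    return count * log(prob) if count > 0 else 0.0


def independence_lr_test(hits):
    """hits: 0/1 sequence in time order. Returns (LR statistic, asymptotic p-value)."""
    n = [[0, 0], [0, 0]]  # n[a][b] = number of transitions from state a to state b
    for prev, curr in zip(hits[:-1], hits[1:]):
        n[int(prev)][int(curr)] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]

    total = n00 + n01 + n10 + n11
    pi = (n01 + n11) / total                             # P(hit), ignoring history
    pi01 = n01 / (n00 + n01) if n00 + n01 > 0 else 0.0   # P(hit | previous miss)
    pi11 = n11 / (n10 + n11) if n10 + n11 > 0 else 0.0   # P(hit | previous hit)

    ll_null = (_count_times_log(n00 + n10, 1.0 - pi)
               + _count_times_log(n01 + n11, pi))
    ll_alt = (_count_times_log(n00, 1.0 - pi01) + _count_times_log(n01, pi01)
              + _count_times_log(n10, 1.0 - pi11) + _count_times_log(n11, pi11))

    lr = max(0.0, -2.0 * (ll_null - ll_alt))
    p_value = erfc(sqrt(lr / 2.0))  # survival function of chi-square with 1 df
    return lr, p_value


# Toy hit sequence that alternates perfectly, i.e. is strongly autocorrelated.
alternating = [1, 0] * 50
lr_alt, p_alt = independence_lr_test(alternating)
print(lr_alt, p_alt)
```

Note that the alternating sequence has exactly 50% hits, so a pure coverage check at $\tau=0.5$ would not flag it, while the dependence test does.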


Finally, for a somewhat different take on this question, I recommend Gneiting (2011, International Journal of Forecasting), who investigates proper scoring rules for quantile forecasts as point forecasts. He shows that all such proper scoring rules are actually slight generalizations of Koenker's $\rho_\tau$ function you note. This might be interesting for you.
