Solved – Prediction interval for robust regression with MM-estimator

prediction interval, regression, rlm, robust

In their book "Robust Statistics", Maronna et al. consider the following model for robust regression: $y_i = \beta x_i + u_i$, where the $u_i$ are independent of the $x_i$ and are i.i.d. with finite variance. They go on to provide a robust estimate $\hat{\beta}$ of $\beta$ which is asymptotically normal, and they give the covariance matrix of $\hat{\beta}$. My question: without any knowledge of the distribution of the $u_i$, is it possible to provide a prediction interval for $y$ (without resorting to the bootstrap)? I'm asking because

library(MASS)
robustModel = rlm(formula = myFormula, data = myData, method = "MM")
predict(robustModel, newdata = myNewData, interval = "prediction")

in R generates a prediction interval. For reference, this is the code for predict.rlm:

predict.rlm <- function (object, newdata = NULL, scale = NULL, ...)
{
    ## problems with using predict.lm are the scale and
    ## the QR decomp which has been done on down-weighted values.
    object$qr <- qr(sqrt(object$weights) * object$x)
    predict.lm(object, newdata = newdata, scale = object$s, ...)
}

It seems to me that the prediction interval obtained this way assumes normally distributed $u_i$. Is that correct? What am I missing here?
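
For what it's worth, here is a minimal simulation sketch of the kind of check I have in mind: generate data with deliberately skewed (exponential) errors, which is purely an illustrative choice, and see how often the nominal 95% interval from the rlm fit covers a new observation.

library(MASS)

set.seed(1)
n <- 100; reps <- 200; covered <- logical(reps)
for (r in seq_len(reps)) {
    x <- runif(n)
    u <- rexp(n) - 1                       # skewed, non-normal errors with mean 0
    d <- data.frame(x = x, y = 2 * x + u)
    fit <- rlm(y ~ x, data = d, method = "MM")
    pint <- predict(fit, newdata = data.frame(x = 0.5),
                    interval = "prediction", level = 0.95)
    y0 <- 2 * 0.5 + (rexp(1) - 1)          # a fresh observation at x = 0.5
    covered[r] <- y0 >= pint[, "lwr"] && y0 <= pint[, "upr"]
}
mean(covered)                              # compare with the nominal 0.95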

Best Answer

It would be easier to answer the question if we had the actual formula for the estimator, but generally speaking the exact distribution of the estimator depends on the error distribution, while the covariance matrix can still be estimated from the data. An exact prediction interval would likewise depend on the distribution of the error term and hence cannot be determined without specifying that distribution. That does not mean you cannot get an approximate prediction interval (note that the problem is the same for confidence intervals). When $\beta$ is one-dimensional the covariance matrix is just a single variance; when $\beta$ is multidimensional there are covariance terms as well. I think the result in the book deals with a robust estimator in the more general multidimensional case.
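
To spell out what such an approximate interval would look like (my notation, not the book's): if $\hat V$ is the estimated covariance matrix of $\hat\beta$ and $\hat\sigma_u^2$ is an estimate of the error variance, then at a new point $x_0$ one could use

$$x_0^\top \hat\beta \;\pm\; z_{1-\alpha/2}\,\sqrt{x_0^\top \hat V x_0 + \hat\sigma_u^2},$$

where using the normal quantile for the error part is itself an approximation that, as the question notes, is really only justified for roughly normal errors.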

Going back to the one-dimensional case, the estimator's standard deviation can be estimated from the data, and the asymptotic normal distribution can be used to get approximate confidence and prediction intervals which, the theory says, attain approximately the nominal confidence level for large $n$.
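
To illustrate that one-dimensional construction, here is a rough sketch in R (mine, not from the book): it plugs the standard error reported by summary() for the rlm coefficient and the robust residual scale fit$s into the normal-theory formula above, and both plug-ins, as well as the normal quantile, are only approximations.

library(MASS)

set.seed(2)
n <- 200
d <- data.frame(x = runif(n))
d$y <- 2 * d$x + rnorm(n)                       # made-up data for illustration

fit <- rlm(y ~ x - 1, data = d, method = "MM")  # single coefficient, no intercept

x0    <- 0.5
b.hat <- coef(fit)[["x"]]
se.b  <- summary(fit)$coefficients["x", "Std. Error"]  # estimated sd of beta-hat
s.hat <- fit$s                                         # robust residual scale

## approximate 95% interval: variance of the fitted value plus the error variance,
## treating s.hat^2 as a stand-in for the latter and using a normal quantile
half <- qnorm(0.975) * sqrt(x0^2 * se.b^2 + s.hat^2)
c(fit = b.hat * x0, lwr = b.hat * x0 - half, upr = b.hat * x0 + half)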
