Solved – Bootstrapping confidence interval from a regression prediction

Tags: bootstrap, confidence interval, machine learning, regression, self-study

For homework, I was given data to create and train a predictor that uses lasso regression. I created the predictor and trained it using the Lasso class from the scikit-learn Python library.

So now I have this predictor that when given input can predict the output.

The second question was to "Extend your predictor to report the confidence interval of the prediction by using the bootstrapping method."

I've looked around and found examples of people doing this for the mean and other things.

But I am completely lost on how I'm supposed to do it for a prediction. I am trying to use the scikit-bootstrap library.

The course staff is being extremely unresponsive, so any help is appreciated. Thank you.

Best Answer

Bootstrapping means resampling your data with replacement. That is, instead of fitting your model once to the original X and y, you fit it many times, each time to a resampled version of X and y.

Thus, you get n slightly different models. The spread of their predictions at a given input (for example, the 2.5th and 97.5th percentiles) gives you a confidence interval. Here is a visual example of such an interval.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

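# Create toy data: a linear trend plus uniform noise
x = np.linspace(0, 10, 20)
y = x + (np.random.rand(len(x)) * 10)

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
# LinearRegression fits the intercept itself, so no column of 1s is needed.
X = x.reshape(-1, 1)

plt.figure(figsize=(12, 8))
for i in range(500):
    # Draw row indices with replacement (one bootstrap sample)
    sample_index = np.random.choice(len(y), size=len(y), replace=True)

    X_samples = X[sample_index]
    y_samples = y[sample_index]

    # Refit on the resampled data and draw the fitted line
    lr = LinearRegression()
    lr.fit(X_samples, y_samples)
    plt.plot(x, lr.predict(X), color='grey', alpha=0.2, zorder=1)

# Original data points
plt.scatter(x, y, marker='o', color='orange', zorder=4)

# Fit on the original (non-resampled) data for comparison
lr = LinearRegression()
lr.fit(X, y)
plt.plot(x, lr.predict(X), color='red', zorder=5)
plt.show()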
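
The plot only shows the interval visually. To report numbers for a specific prediction, one option is the percentile bootstrap: collect the bootstrap predictions at the input of interest and take their lower and upper percentiles. The following is a minimal sketch of that idea using Lasso, as in the question. The function name bootstrap_prediction_ci, the alpha value, the number of resamples and the toy data are all assumptions for illustration, not part of the original answer. Note that this gives a confidence interval for the fitted value, not a prediction interval that also covers the noise in a new observation.

```python
import numpy as np
from sklearn.linear_model import Lasso


def bootstrap_prediction_ci(X, y, X_new, n_boot=1000, level=0.95,
                            alpha=0.1, seed=0):
    """Percentile bootstrap confidence interval for Lasso predictions.

    Hypothetical helper: alpha, n_boot and seed are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty((n_boot, len(X_new)))
    for b in range(n_boot):
        # Resample rows with replacement and refit the model
        idx = rng.integers(0, n, size=n)
        model = Lasso(alpha=alpha).fit(X[idx], y[idx])
        preds[b] = model.predict(X_new)

    # Percentiles across the bootstrap models, one interval per new input
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(preds, [tail, 100 - tail], axis=0)

    # Point prediction from the model fitted on the original data
    point = Lasso(alpha=alpha).fit(X, y).predict(X_new)
    return point, lower, upper


# Toy data in the same spirit as the example above (assumed, seeded)
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 20)
y = x + rng.random(len(x)) * 10
X = x.reshape(-1, 1)

X_new = np.array([[2.5], [7.5]])
point, lower, upper = bootstrap_prediction_ci(X, y, X_new)
for xv, p, lo, hi in zip(X_new[:, 0], point, lower, upper):
    print(f"x={xv:.1f}: prediction={p:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

For the homework, you would pass your own training data and the input you want to predict in place of the toy arrays, and keep the same Lasso settings you used for your predictor.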
