Bayesian Regression – Handling Extremely Large Credible Intervals and Standard Deviation in Python

bayesian, python, regression

Apologies if this is the wrong place for questions that overlap with programming. I'm trying to implement my own Bayesian regression class, but I'm finding my credible intervals are coming in much larger than expected. Below is a copy of my code and a picture of the output it currently generates (I've set a seed to avoid any randomness). Is there an issue in my implementation, or are my expectations just out of whack?

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# =============================================================================
# Bayesian Regression Class
# =============================================================================
class BayesianLinearRegression:

    def __init__(self, n_features, alpha, beta):
        self.alpha = alpha
        self.beta = beta
        self.mean = np.zeros(n_features)
        self.cov_inv = alpha * np.eye(n_features)  # prior precision alpha * I (overwritten in fit)

    def fit(self, X, y):
        
        self.n_features = X.shape[1]
        
        # Update the inverse covariance matrix (Bishop eq. 3.51)
        self.cov_inv = self.alpha * np.eye(self.n_features) + self.beta * X.T @ X

        # Update the mean vector (Bishop eq. 3.50)
        self.mean = self.beta * np.linalg.inv(self.cov_inv) @ (X.T @ y)

        return self

    def predict(self, X):

        # Obtain the predictive mean (Bishop eq. 3.58)
        y_mean = X @ self.mean

        # Obtain the predictive variance (Bishop eq. 3.59)
        y_var = (1 / self.beta + np.sum(X @ np.linalg.inv(self.cov_inv) * X, axis=1)).reshape(-1,1)

        return stats.norm(loc=y_mean, scale=np.sqrt(y_var))

# =============================================================================
# Create data
# =============================================================================
np.random.seed(42)

X = np.random.rand(500, 1)
y = []
for x in X:
    y.append(2.5 + x + 0.2 * np.random.randint(-1, 1) * np.random.rand())
y = np.array(y)

X_train = np.hstack((np.ones((len(X), 1)), X))

X_test = np.linspace(0, 1, 1000)
X_test = np.hstack((np.ones((len(X_test), 1)), X_test.reshape(-1,1)))

# =============================================================================
# Train model
# =============================================================================
model = BayesianLinearRegression(n_features=X_train.shape[1], alpha=0.3, beta=1)
model.fit(X_train, y)
y_predm = model.predict(X_test).mean()

stat_type = "std"
if stat_type == "std":
    y_preds = model.predict(X_test).std()
    y_pred = np.hstack((y_predm - y_preds, y_predm, y_predm + y_preds))
if stat_type == "int":
    y_preds = model.predict(X_test).interval(0.95)
    y_pred = np.hstack((y_preds[0], y_predm, y_preds[1]))

# =============================================================================
# Plot outputs
# =============================================================================    
plt.figure(figsize=(20,20))
plt.scatter(X, y)
plt.plot(X_test[:,1], y_pred[:,0], label="lower")
plt.plot(X_test[:,1], y_pred[:,1], label="mean")
plt.plot(X_test[:,1], y_pred[:,2], label="upper")
plt.legend()
plt.show()

Regression output with standard deviation

Best Answer

The bug is not in your implementation of Bayesian linear regression but in how you sample the errors in Y.


Aside: your comments cite equation numbers from "Bishop" without a proper reference; that's Pattern Recognition and Machine Learning by Christopher M. Bishop.


In Section 3.3 on Bayesian linear regression, Bishop assumes that the noise is zero-mean Gaussian and that its variance is known (its precision is the beta you pass to the model). You violate both of these assumptions by sampling error terms from a mixture of a uniform(-0.2, 0) distribution and a point mass at 0. Those errors have a standard deviation of roughly 0.06, while beta=1 tells the model to expect a noise standard deviation of 1; since the predictive variance is 1/beta plus a term that shrinks with more data, the predictive standard deviation can never drop below 1, and the credible band looks enormous next to the data.

This is your error distribution; it's idiosyncratic to say the least.

# np.random.randint(-1, 1) returns -1 or 0 (the upper bound is exclusive),
# so each error is either exactly 0 or uniform on (-0.2, 0).
errors = []
for x in X:
    errors.append(0.2 * np.random.randint(-1, 1) * np.random.rand())

Plot of the sampled error distribution
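For reference, a minimal sketch that visualizes this error distribution as a histogram (it samples in vectorized form, so the exact draws differ from your loop, but the distribution is the same):

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# Vectorized version of the error sampling above.
errors = 0.2 * np.random.randint(-1, 1, size=500) * np.random.rand(500)

plt.hist(errors, bins=50)
plt.xlabel("error")
plt.ylabel("count")
plt.show()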

You can already spot the issue in your original plot: because the errors are never positive, the observations have a hard upper edge instead of scattering symmetrically around the line.
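A quick numerical check makes this concrete (a minimal sketch that regenerates your data with the same seed and inspects the residuals around the true line y = 2.5 + x):

import numpy as np

np.random.seed(42)
X = np.random.rand(500, 1)
y = np.array([2.5 + x + 0.2 * np.random.randint(-1, 1) * np.random.rand() for x in X])

residuals = y - (2.5 + X)       # errors around the true line
print(residuals.max())          # <= 0: no observation ever lies above the line
print(np.mean(residuals == 0))  # about 0.5: the point mass at zero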

And here is how to sample the errors properly.

alpha0 = 2.5
alpha1 = 0.8
beta = 1
n = 1000

# The predictor has uniform(0, 1) distribution.
# Predictors can have any distribution actually.
X = np.random.rand(n, 1)

# The errors have normal distribution.
error = np.random.randn(n, 1) / np.sqrt(beta)
y = alpha0 + alpha1 * X + error  # Equation (3.10) in Bishop

The rest of your code works fine, so I only attach an updated plot of the posterior predictive distribution.

Updated plot of the posterior predictive distribution
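For completeness, a minimal sketch of one way to re-fit and draw a plot like the one above; it reuses your BayesianLinearRegression class and X_test from the question, plus X, y, n and beta from the corrected block:

# Rebuild the design matrix with a bias column and refit on the corrected data.
X_train = np.hstack((np.ones((n, 1)), X))
model = BayesianLinearRegression(n_features=X_train.shape[1], alpha=0.3, beta=beta)
model.fit(X_train, y)

# Posterior predictive mean and central 95% credible interval.
pred = model.predict(X_test)
lower, upper = pred.interval(0.95)

plt.scatter(X, y, s=5)
plt.plot(X_test[:, 1], pred.mean(), label="predictive mean")
plt.plot(X_test[:, 1], lower, label="2.5%")
plt.plot(X_test[:, 1], upper, label="97.5%")
plt.legend()
plt.show()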
