Apologies if this is the wrong place for questions that overlap with programming. I'm trying to implement my own Bayesian regression class, but I'm finding my credible intervals are coming in much larger than expected. Below is a copy of my code and a picture of the output it currently generates (I've set a seed to avoid any randomness). Is there an issue in my implementation, or are my expectations just out of whack?
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# =============================================================================
# Bayesian Regression Class
# =============================================================================
class BayesianLinearRegression:
def __init__(self, n_features, alpha, beta):
self.alpha = alpha
self.beta = beta
self.mean = np.zeros(n_features)
self.cov_inv = np.eye(n_features) / alpha
def fit(self, X, y):
self.n_features = X.shape[1]
# Update the inverse covariance matrix (Bishop eq. 3.51)
self.cov_inv = self.alpha * np.eye(self.n_features) + self.beta * X.T @ X
# Update the mean vector (Bishop eq. 3.50)
self.mean = self.beta * np.linalg.inv(self.cov_inv) @ (X.T @ y)
return self
def predict(self, X):
# Obtain the predictive mean (Bishop eq. 3.58)
y_mean = X @ self.mean
# Obtain the predictive variance (Bishop eq. 3.59)
y_var = (1 / self.beta + np.sum(X @ np.linalg.inv(self.cov_inv) * X, axis=1)).reshape(-1,1)
return stats.norm(loc=y_mean, scale=np.sqrt(y_var))
# =============================================================================
# Create data
# =============================================================================
np.random.seed(42)
X = np.random.rand(500, 1)
y = []
for x in X:
y.append(2.5 + x + 0.2 * np.random.randint(-1, 1) * np.random.rand())
y = np.array(y)
X_train = np.hstack((np.ones((len(X), 1)), X))
X_test = np.linspace(0, 1, 1000)
X_test = np.hstack((np.ones((len(X_test), 1)), X_test.reshape(-1,1)))
# =============================================================================
# Train model
# =============================================================================
model = BayesianLinearRegression(n_features=X_train.shape[1], alpha=0.3, beta=1)
model.fit(X_train, y)
y_predm = model.predict(X_test).mean()
stat_type = "std"
if stat_type == "std":
y_preds = model.predict(X_test).std()
y_pred = np.hstack((y_predm - y_preds, y_predm, y_predm + y_preds))
if stat_type == "int":
y_preds = model.predict(X_test).interval(0.95)
y_pred = np.hstack((y_preds[0], y_predm, y_preds[1]))
# =============================================================================
# Plot outputs
# =============================================================================
plt.figure(figsize=(20,20))
plt.scatter(X, y)
plt.plot(X_test[:,1], y_pred[:,0])
plt.plot(X_test[:,1], y_pred[:,1])
plt.plot(X_test[:,1], y_pred[:,2])
plt.legend()
plt.show()
Best Answer
The bug is not in your implementation of Bayesian linear regression but in how you sample the errors in Y.
Aside: You don't cite Pattern Recognition and Machine Learning by Bishop properly.
In Section 3.3 on Bayesian linear regression, Bishop assumes that the noise variance is known and that X has a (multivariate) Normal distribution. You violate both of these assumptions by sampling error terms from a mixture of uniform(-0.2, 0) and a spike at 0.
This is your error distribution; it's idiosyncratic to say the least.
You can notice the issue in your original plot as an unexpected upper bound on the spread of the observations.
And here is how to sample the errors properly.
The rest of your code works fine, so I only attach an updated plot of the posterior.