Solved – Root-Mean-Squared Error for Bayesian Regression Models

bayesian, predictive-models, regression, rms

I'm trying to get a sense of the prediction error of a Bayesian regression model, and I was using the root-mean-squared error (RMSE). My question: since our predictions are stochastic, would it make sense to take multiple draws for each sample point, to account for the variability in the parameter draws? That is, if we had 10 observed data points, should one make 10 predictions for each data point, giving a total of 100 error values, average the squared errors over all 100, and take the square root? Thanks!

Best Answer

You can calculate whatever error metric you like, as long as you use the posterior distribution to generate the predictions. For example, if you have a matrix named "predictions" that consists of a sample of predicted values for each observation (columns = observations, rows = predicted values drawn from the posterior), then a "Bayesian RMSE" calculated in R on simulated data would look something like this:

n <- 50         # number of observations
m <- 100        # number of posterior draws
y <- sample(n)  # simulated observed values

# simulate errors 
errors <- matrix(rnorm(n*m,0,1), nrow=m, ncol=n)

# m = 100 predictions (rows) for each element of y (cols)
predictions <- t(y + t(errors))

# rmse function
rmse <- function(y,yhat){sqrt(mean((yhat-y)^2))}

# calculate posterior of rmse values
rmse_dist <- apply(predictions,1,rmse,y=y)

# summarize distribution
summary(rmse_dist)

The result is a vector of RMSE values whose length equals the number of draws taken from the posterior predictive distribution (here m = 100), so you get a full posterior distribution of the RMSE rather than a single point estimate.
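For comparison, the pooled approach the question describes (averaging the squared errors over all draws and all observations in one pass) can also be computed from the same prediction matrix. A minimal sketch in Python/NumPy, mirroring the array shapes of the R example above (the variable names here are illustrative, not from the original answer):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50   # number of observations
m = 100  # number of posterior draws

# simulated observed values, analogous to sample(n) in R
y = rng.permutation(np.arange(1, n + 1)).astype(float)

# simulated posterior predictive draws: m rows, n columns
errors = rng.normal(0.0, 1.0, size=(m, n))
predictions = y + errors  # broadcasting adds y to each row of errors

# per-draw RMSE: one value per posterior draw (the answer's approach)
rmse_dist = np.sqrt(np.mean((predictions - y) ** 2, axis=1))

# pooled RMSE: one number averaging squared errors over all m * n
# predictions (the question's approach)
pooled_rmse = np.sqrt(np.mean((predictions - y) ** 2))
```

Note the two quantities are directly related: the pooled RMSE is the square root of the mean of the per-draw mean squared errors, so it collapses the RMSE distribution into a single summary number, whereas keeping `rmse_dist` lets you report its spread as well.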
