Solved – Replicate simulation study from a paper and calculate the MSE in R

elastic nethierarchical-bayesianmsersimulation

I have implemented a Gibbs Sampler for the Bayesian Elastic Net (BEN) according to this paper on Penalized Regression by Kyung et al.
In this paper, they execute a simulation study that has been used in other papers on Penalized Regression (LASSO, Bridge, Ridge) to compare the performance of the proposed models.

Here are details of the simulation taken from the above mentioned paper:

We simulate data from the true model
$$
y=X\beta+\sigma\epsilon \quad\epsilon_i\,{\raise.17ex\hbox{$\scriptstyle\sim$}}\,\text{iid}\,N(0,1)
$$
We simulate data sets with $n=20$ to fit models and $n=200$ to compare prediction errors of proposed models with eight predictors. We let $\beta=(3,1.5,0,0,2,0,0,0)$ and $\sigma=3$. The pairwise correlation between $x_i$ and $x_j$ was set to be $corr(i,j)=0.5^{|i-j|}$.
Later on they say, that for the prediction error, they calculate the average mean squared error based on 50 replications. By average they mean the median in this case.

To simulate this data and calculate the MSE I've used following code in R:

# Number of observations
n.train <- 20
n.test  <- 200
# Error variance
sigma <- 3
# Pairwise correlation of X
cor <- 0.5
# Number of predictors
p <- 8
# Create training and test data set (package QRM and mvtnorm required)
Z <- equicorr(p, rho=cor)
X.train <- rmvnorm(n.train,sigma=Z)
X.test  <- rmvnorm(n.test,sigma=Z)
# Create error 
error.train <- rnorm(n.train,mean=0,sd=1)
error.test  <- rnorm(n.test,mean=0,sd=1)
# Create beta
beta.true <- c(3,1.5,0,0,2,0,0,0)
# Create both responses
Y.train <- X.train %*% beta.true + sigma*error.train
Y.test  <- X.test %*% beta.true + sigma*error.test

# Fit the training data set with the BEN Gibbs Sampler
beta.ben <- BEN(X.train,Y.train, iter=11000, burn = 1000)
# Calculate the predicted response
Y.pred   <- X.test %*% beta.ben
# Calculate the mean squared error (MSE)
MSE      <- sum((Y.train - Y.pred)^2)/n.train

My problem is that my results are not even close to comparable to the ones in the paper which makes me doubt my simulation study "setup".
As one of the authors of the paper has uploaded the Gibbs Sampler code and I could check if I did something wrong, I know that the problem doesn't lie there.

So my questions are:

  1. Does anybody have experience with this kind of simulation study and can check if I did something wrong?
  2. Is the MSE I calculate the same as the one used in the paper? In researching on this topic I found many different ways to calculate the MSE and it was also sometimes used but actually the mean squared prediction error was meant. For example the Wikipedia article on MSE alone lists three variations.

I don't need help with coding, rather more information on how this simulation is typically excecuted so I can figure out what I'm doing wrong.

Best Answer

From what I understood, the correlation matrix is a Toeplitz symmetric matrix with $0.5$ on the 2nd diagonal, $0.5^2$ on the 3rd diagonal etc... But you built your correlation matrix with function equicorr, which creates a matrix with $1$ on the diagonal and $0.5$ anywhere else.

Wouldn't it be better to generate your correlation matrix this way?:

Z <- toeplitz(cor^(0:(p-1)))
Related Question