I thought I might toy with some Bayesian variable selection, following a nice blog post and the linked papers therein. I wrote a program in rjags (where I am quite a rookie) and fetched price data for Exxon Mobil, along with some things that are unlikely to explain its returns (e.g. palladium prices) and other things that should be highly correlated (like the SP500).
Running lm(), we see that there is strong evidence of an overparameterized model, but that palladium should definitely be excluded:
Call:
lm(formula = Exxon ~ 0 + SP + Palladium + Russell + OilETF +
EnergyStks, data = chkr)
Residuals:
       Min         1Q     Median         3Q        Max
-1.663e-03 -4.419e-04  3.099e-05  3.991e-04  1.677e-03
Coefficients:
           Estimate Std. Error t value Pr(>|t|)
SP          0.51913    0.19772   2.626 0.010588 *
Palladium   0.01620    0.03744   0.433 0.666469
Russell    -0.34577    0.09946  -3.476 0.000871 ***
OilETF     -0.17327    0.08285  -2.091 0.040082 *
EnergyStks  0.79219    0.11418   6.938 1.53e-09 ***
After converting to returns, I tried running a simple model like this:
model {
  for (i in 1:n) {
    mean[i] <- inprod(X[i,], beta)
    y[i] ~ dnorm(mean[i], tau)
  }
  for (j in 1:p) {
    indicator[j] ~ dbern(probindicator)
    betaifincluded[j] ~ dnorm(0, taubeta)
    beta[j] <- indicator[j] * betaifincluded[j]
  }
  tau ~ dgamma(1, 0.01)
  taubeta ~ dgamma(1, 0.01)
  probindicator ~ dbeta(2, 8)
}
but I found that, pretty much regardless of the parameters to the chosen gamma distributions, I got pretty nonsensical answers, such as an unvarying 20% inclusion probability for each variable.
I also got tiny, tiny regression coefficients, which I am willing to tolerate since this is supposed to be a selection model, but that still seemed weird.
                               Mean         SD   Naive SE  Time-series SE
SP          beta[1]      -4.484e-03    0.10999   0.003478        0.007273
Palladium   beta[2]       1.422e-02    0.16646   0.005264        0.011106
Russell     beta[3]      -2.406e-03    0.08440   0.002669        0.003236
OilETF      beta[4]      -4.539e-03    0.14706   0.004651        0.005430
EnergyStks  beta[5]      -1.106e-03    0.07907   0.002500        0.002647
SP          indicator[1]  1.980e-01    0.39869   0.012608        0.014786
Palladium   indicator[2]  1.960e-01    0.39717   0.012560        0.014550
Russell     indicator[3]  1.830e-01    0.38686   0.012234        0.013398
OilETF      indicator[4]  1.930e-01    0.39485   0.012486        0.013229
EnergyStks  indicator[5]  2.070e-01    0.40536   0.012819        0.014505
            probindicator 1.952e-01    0.11981   0.003789        0.005625
            tau           3.845e+03  632.18562  19.991465       19.991465
            taubeta       1.119e+02  107.34143   3.394434        7.926577
Is Bayesian variable selection really that bad/sensitive? Or am I making some glaring error?
Best Answer
In the BUGS code, mean[i]<-inprod(X[i,],beta) should be mean[i]<-inprod(X[i,],beta[]).
Your priors on tau and taubeta are too informative.
You need a non-informative prior on betaifincluded; use e.g. a gamma(0.1,0.1) on taubeta. This may explain why you get tiny regression coefficients.
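As a rough back-of-the-envelope check of why the gamma(1,0.01) prior on taubeta could matter, the arithmetic below works out the coefficient scale it implies. It assumes the shape/rate parameterization of dgamma and the precision parameterization of dnorm that JAGS uses; the symbols \tau_\beta and \sigma_\beta are just shorthand introduced here for taubeta and the implied prior standard deviation of betaifincluded.

```latex
\begin{align*}
\tau_\beta \sim \mathrm{Gamma}(a = 1,\ b = 0.01)
  &\;\Rightarrow\; \mathbb{E}[\tau_\beta] = \frac{a}{b} = 100, \\
\sigma_\beta = \frac{1}{\sqrt{\tau_\beta}}
  &\;\Rightarrow\; \sigma_\beta \approx \frac{1}{\sqrt{100}} = 0.1
  \quad \text{(evaluated at the prior mean)}, \\
\text{posterior mean } \tau_\beta \approx 111.9
  &\;\Rightarrow\; \sigma_\beta \approx \frac{1}{\sqrt{111.9}} \approx 0.095 .
\end{align*}
```

A prior standard deviation of about 0.1 is small next to the lm() estimates for SP (0.52) and EnergyStks (0.79), so included coefficients may be pulled strongly toward zero. In the same spirit, the beta(2,8) prior on probindicator has mean 2/(2+8) = 0.2, which is roughly where every posterior inclusion probability in the question ended up.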