Bayesian variable selection: does it really work?

Tags: bayesian, feature-selection, jags, multiple-regression, regression

I thought I might toy with some Bayesian variable selection, following a nice blog post and the linked papers therein. I wrote a program in rjags (where I am quite a rookie) and fetched price data for Exxon Mobil, along with some things that are unlikely to explain its returns (e.g. palladium prices) and other things that should be highly correlated (like the SP500).

Running lm(), we see that there is strong evidence of an overparameterized model, but that palladium should definitely be excluded:

Call:
lm(formula = Exxon ~ 0 + SP + Palladium + Russell + OilETF + 
    EnergyStks, data = chkr)

Residuals:
       Min         1Q     Median         3Q        Max 
-1.663e-03 -4.419e-04  3.099e-05  3.991e-04  1.677e-03 

Coefficients:
           Estimate Std. Error t value Pr(>|t|)    
SP          0.51913    0.19772   2.626 0.010588 *  
Palladium   0.01620    0.03744   0.433 0.666469    
Russell    -0.34577    0.09946  -3.476 0.000871 ***
OilETF     -0.17327    0.08285  -2.091 0.040082 *  
EnergyStks  0.79219    0.11418   6.938 1.53e-09 ***

After converting to returns, I tried running a simple model like this:

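  model {
    # Likelihood: dnorm takes a precision (tau), not a variance
    for (i in 1:n) {
      mean[i]<-inprod(X[i,],beta)
      y[i]~dnorm(mean[i],tau)
    }
    # Each coefficient is its value-if-included times a 0/1 indicator
    for (j in 1:p) {
      indicator[j]~dbern(probindicator)
      betaifincluded[j]~dnorm(0,taubeta)
      beta[j] <- indicator[j]*betaifincluded[j]
    }
    # Hyperpriors: residual precision, coefficient precision, inclusion probability
    tau~dgamma(1,0.01)
    taubeta~dgamma(1,0.01)
    probindicator~dbeta(2,8)
  }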
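
In mathematical notation, and keeping in mind that JAGS parameterizes dnorm by precision and dgamma by shape and rate, the model above can be written as follows. The symbols $\gamma_j$, $b_j$, and $\pi$ are shorthand introduced here for indicator[j], betaifincluded[j], and probindicator; they do not appear in the code.

$$
\begin{aligned}
y_i \mid \beta, \tau &\sim \mathcal{N}\!\left(x_i^\top \beta,\ \tau^{-1}\right), & i &= 1, \dots, n, \\
\beta_j &= \gamma_j \, b_j, & j &= 1, \dots, p, \\
\gamma_j \mid \pi &\sim \mathrm{Bernoulli}(\pi), \\
b_j \mid \tau_\beta &\sim \mathcal{N}\!\left(0,\ \tau_\beta^{-1}\right), \\
\tau &\sim \mathrm{Gamma}(1,\ 0.01), \\
\tau_\beta &\sim \mathrm{Gamma}(1,\ 0.01), \\
\pi &\sim \mathrm{Beta}(2,\ 8).
\end{aligned}
$$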

but I found that, pretty much regardless of the parameters to the chosen gamma distributions, I got pretty nonsensical answers, such as an unvarying 20% inclusion probability for each variable.

I also got tiny, tiny regression coefficients, which I am willing to tolerate since this is supposed to be a selection model, but that still seemed weird.

                              Mean        SD  Naive SE Time-series SE
SP         beta[1]       -4.484e-03   0.10999  0.003478       0.007273
Palladium  beta[2]        1.422e-02   0.16646  0.005264       0.011106
Russell    beta[3]       -2.406e-03   0.08440  0.002669       0.003236
OilETF     beta[4]       -4.539e-03   0.14706  0.004651       0.005430
EnergyStks beta[5]       -1.106e-03   0.07907  0.002500       0.002647
SP         indicator[1]   1.980e-01   0.39869  0.012608       0.014786
Palladium  indicator[2]   1.960e-01   0.39717  0.012560       0.014550
Russell    indicator[3]   1.830e-01   0.38686  0.012234       0.013398
OilETF     indicator[4]   1.930e-01   0.39485  0.012486       0.013229
EnergyStks indicator[5]   2.070e-01   0.40536  0.012819       0.014505
           probindicator  1.952e-01   0.11981  0.003789       0.005625
           tau            3.845e+03 632.18562 19.991465      19.991465
           taubeta        1.119e+02 107.34143  3.394434       7.926577

Is Bayesian variable selection really that bad/sensitive? Or am I making some glaring error?

Best Answer

In the BUGS code, mean[i]<-inprod(X[i,],beta) should be mean[i]<-inprod(X[i,],beta[]).

Your priors on tau and taubeta are too informative.

You need a non-informative prior on betaifincluded; use, e.g., a gamma(0.1,0.1) prior on taubeta. The current prior may explain why you get tiny regression coefficients.
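
One rough way to check how much a prior is driving a result is to compare its mean and standard deviation with the posterior summary. The sketch below is only an illustration and not part of the original answer: the helper names gamma_moments and beta_moments are made up here, the posterior numbers are copied from the table in the question, and dgamma is assumed to use the shape and rate parameterization that JAGS uses.

```python
import math


def gamma_moments(shape, rate):
    """Mean and standard deviation of a Gamma(shape, rate) distribution."""
    return shape / rate, math.sqrt(shape) / rate


def beta_moments(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


# Priors used in the question's model.
priors = {
    "tau": gamma_moments(1, 0.01),
    "taubeta": gamma_moments(1, 0.01),
    "probindicator": beta_moments(2, 8),
}

# Posterior mean and SD, copied from the question's summary table.
posterior = {
    "tau": (3845.0, 632.18562),
    "taubeta": (111.9, 107.34143),
    "probindicator": (0.1952, 0.11981),
}

# The prior on taubeta suggested in the answer.
suggested_taubeta = gamma_moments(0.1, 0.1)

if __name__ == "__main__":
    for name, (prior_mean, prior_sd) in priors.items():
        post_mean, post_sd = posterior[name]
        print(f"{name:14s} prior {prior_mean:10.4g} +/- {prior_sd:<10.4g}"
              f" posterior {post_mean:10.4g} +/- {post_sd:<10.4g}")
    mean, sd = suggested_taubeta
    print(f"gamma(0.1,0.1) prior on taubeta: {mean:.4g} +/- {sd:.4g}")
```

In this run, gamma(1,0.01) has mean 100 and standard deviation 100, and beta(2,8) has mean 0.2 and standard deviation of about 0.12, so the posterior summaries for taubeta and probindicator sit almost on top of their priors, while tau has moved far from its prior. That pattern is consistent with the data saying very little about taubeta and the inclusion probability as the model is written. The suggested gamma(0.1,0.1) has mean 1 and standard deviation of about 3.16.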
