Solved – Binary Genetic Algorithm in R, with strong cardinality constraints

optimizationr

I am trying to use package GA in R for a problem where I must find the best combination of 4 out of 56 instruments. Obviously the number of combinations of 4 instruments out of 56 runs into the hundreds of thousands, therefore a brute force approach is out of the question, especially that the target function itself contains another optimization and is slow. So the Genetic Algorithm seems suitable. However the ga function, even with "type = binary", does not seem to be able to have a cardinality constraint. Instead the documentation (page 21) suggests using AIC or BIC information criteria by putting in a penalty in the objective function but I already know that I only want 4. Here is my current code:

gaopter <- function(xx) {
    a4 <- ay[, xx == 1]
    a4r <- ayr[, xx == 1]

    opter <- function(x) {
    # optimize for sd over vol, where a4 is a subcolumn of ay
        rets <- a4r %*% x
        outs <- a4 %*% x
        sdouts <- sd(outs)
        tomake <- (abs(last(outs) - mean(outs)) / sdouts - 1) * sdouts
        -(tomake / sd(rets))
        #(last(outs) - mean(outs)) / sd(outs)
    }
    optim(rep(1, ncol(a4)), opter)$value
}

ga(type = "binary", fitness = gaopter, nBits = ncol(ay), maxiter = 1000, run = 20000)

Variables ay and ayr are matrices of 56 vectors and their daily returns, respectively, so what I am trying to optimize for, above is: the best linear combination of 4 of the 56 vectors, such that their daily returns standard deviation is minimized, while at the same time, the last point in the series is as far as possible from the mean. The above code will just find the best combination of the 56 vectors that satisfy the above, without any cardinality constraint.

The major problem with a low cardinality constraint such as 4 is that with a random binary vector of 0s and 1s 56 in length, it is almost never going to give me only 4 ones with the rest zeros in a random combination, and so putting in some kind of cardinality == 4 penalty will just mean nothing ever gets run. What can I do to limit cardinality of the combinations to 4, using package GA or another?

Best Answer

I will try and solve it using the Cross entropy algorithm. It is a method suggest by Reuven Rubenstein CE Method.

Basically the idea is to keep good solutions and deduce on the parameters based on them. So in general suppose you want to find the 4 lowest SD. the algorithm will be as following: 1. Create a vector of probabilities for each instrument. 2. Simulate multiple solutions. 3. Take the best $\xi$ solutions. 4. Update the probability vector.

Small Simulation.

> ## Creating Data
> 
> sig.vec      <- rep(1, 200) 
> low.sig.ind  <- sample(200, 4)
> sig.vec[low.sig.ind] <- 0.4 
> 
> ret <- t(replicate(300, rnorm(200, rep(0, 200), sig.vec)))
> 
> 
> 
> low.sig.ind
[1] 197 152  49  51
> ### Prepare running CE 
> iter     <- 100
> prob.vec <- runif(200) 
> sol.mat  <- matrix(NA, nrow = iter, ncol = 5)
> prob.list <- list()
> for (i in 1:10) {
+   ## Creating Sol 
+   for (j in 1:iter) {
+     temp.choice    <- sample(4, x = 200, prob = prob.vec)
+     sol.mat[j, ]   <- c(temp.choice, sd(apply(ret[,temp.choice], 1, sum)))
+   }
+   temp.tab <- table(factor(sol.mat[sol.mat[,5] <= quantile(sol.mat[,5], 0.1), 1:4], 
+                            levels = 1:200)) 
+   prob.vec <- temp.tab / sum(temp.tab)
+   prob.list[[i]] <- prob.vec
+ }
> 
> prob.vec[prob.vec == 0.25]

  10   49   72  152 
0.25 0.25 0.25 0.25 

So it found the correct ones, off course this is a toy example.

Related Question