MATLAB: What is the way to get an optimum or near optimum combination of different columns (contains 1 and 0 only) to be equal to particular column (In the case, it’s all 1 column)

columncombinationknapsacklinear programmingoptimization

Input = [1 0 0 0; 0 1 1 0; 0 0 0 1; 1 0 0 0; 0 1 0 1; 0 0 1 0; 0 0 1 0; 1 0 0 0; 0 1 0 1] I want the columns which, when combined, should go as close as they can get to a column of ones. Here, if we combine the first, third and fourth column I get a column of ones. What is the algorithm to do this for bigger matrices (120 x 70)?

Best Answer

If you knew an exact solution existed, then intlinprog would be perfect.

By the way, it is a bad idea to use the name Input for a variable name, even if you capitalize the name. It will cause bugs in your code one day, when you mistake and use a lower case i there. So I'll call it something else.

Aeq = [1 0 0 0; 0 1 1 0; 0 0 0 1; 1 0 0 0; 0 1 0 1; 0 0 1 0; 0 0 1 0; 1 0 0 0; 0 1 0 1];
beq = ones(9,1);

So we want to find a vector X, such that

Aeq*X = beq

The requirement is that X must be a vector of integers, either 0 or 1.

A problem may be that there are multiple solutions. So we will use an objective that says that the vector X must have as small a sum as possible. That is what f will do here:

f = ones(4,1);
X = intlinprog(f,1:4,[],[],Aeq,beq,zeros(4,1),ones(4,1))
LP:                Optimal objective value is 3.000000.                                             
Optimal solution found.
Intlinprog stopped at the root node because the objective value is within a gap tolerance of the optimal value, options.AbsoluteGapTolerance = 0 (the default value). The
intcon variables are integer within tolerance, options.IntegerTolerance = 1e-05 (the default value).
X =
     1
     0
     1
     1
find(X)
ans =
     1
     3
     4

So no problem at all, as long as a solution does exist.

But what if there is no exact solution? Then intlinprog will fail, because there will be no feasible solution found. In fact, this is the case I would usually expect when Aeq has more rows than columns. So the above solution will generally be inadequate. Instead, we want to solve a different problem. That is, solve for the vector X such that

sum(abs(Aeq*X - beq))

is minimized, where X is constrained to be a binary vector as above. We might think we want to solve this as a least squares problem, so minimizing this 2-norm:

norm(Aeq*X - beq)

The problem is, INTLINPROG does not solve this problem, and LSQLIN does not handle integer constraints. So, suppose we had this array?

Aeq = double(rand(120,70) < 0.05);
beq = ones(120,1);

As you can see, lsqlin fails.

Xr = lsqlin(Aeq,beq,[],[],[],[],zeros(70,1),ones(70,1));
Minimum found that satisfies the constraints.
Optimization completed because the objective function is non-decreasing in 
feasible directions, to within the default value of the optimality tolerance,
and constraints are satisfied to within the default value of the constraint tolerance.
<stopping criteria details>
>> Xr
Xr =
     0.043631
       0.3165
       0.4857
       0.2582
      0.47377
      0.16141
      0.22724
      0.50588
      0.72435
...

LSQLIN does not understand that the solution vector must be binary, even if I constrain it to lie in [0,1].

So the trick is to solve the minimum sum of absolute values problem using INTLINPROG. We can do that using slack variables. Essentially, we create a whole slew of extra variables, two new variables for each row of Aeq. (I think I explained this in my optimization tips and tricks doc, found on the File Exchange.)

[neq,nvar] = size(Aeq);
f = [zeros(nvar,1);ones(neq*2,1)];
LB = [zeros(nvar,1);zeros(2*neq,1)];
UB = [ones(nvar,1);inf(2*neq,1)];
Anew = [Aeq,eye(neq),-eye(neq)];
intcon = 1:nvar;

Ok, so now how does intlinprog do?

Xb = intlinprog(f,intcon,[],[],Anew,beq,LB,UB);
LP:                Optimal objective value is 24.757627.                                            
Cut Generation:    Applied 6 Gomory cuts.                                                           
                   Lower bound is 26.302995.                                                        
Heuristics:        Found 4 solutions using total rounding.                                          
                   Upper bound is 56.000000.                                                        
                   Relative gap is 52.10%.                                                         

Heuristics:        Found 2 solutions using rss.                                                     
                   Upper bound is 36.000000.                                                        
                   Relative gap is 26.20%.                                                         
Branch and Bound:
     nodes     total   num int        integer       relative                                          
  explored  time (s)  solution           fval        gap (%)                                         
        27      0.13         7   3.300000e+01   1.146025e+01                                          
        74      0.17         8   3.200000e+01   3.281596e+00                                          
       135      0.20         8   3.200000e+01   0.000000e+00                                          
Optimal solution found.
Intlinprog stopped because the objective value is within a gap tolerance of the optimal value, options.AbsoluteGapTolerance = 0 (the default value). The intcon variables
are integer within tolerance, options.IntegerTolerance = 1e-05 (the default value).

Well, it thinks it converged. The solution is just the first nvar elements of Xb. The slack variables are ignored now.

find(Xb(1:nvar))
ans =
     6
     8
     9
    10
    12
    13
    22
    26
    27
    34
    37
    42
    44
    49
    53
    56
    57
    63
    66
    69
sum(abs(Aeq*Xb(1:nvar) - beq))
ans =
           32

So not terrible, but not great. Remember that we had 120 equations in 70 unknowns, but the unknowns are constrained to be binary. To convince you that a solution MAY be found if Aeq has fewer rows. So here 25 rows, with 75 columns.

Aeq = double(rand(25,75) < 0.1);
beq = ones(25,1);
[neq,nvar] = size(Aeq);
f = [zeros(nvar,1);ones(neq*2,1)];
LB = [zeros(nvar,1);zeros(2*neq,1)];
UB = [ones(nvar,1);inf(2*neq,1)];
Anew = [Aeq,eye(neq),-eye(neq)];
intcon = 1:nvar;
Xb = intlinprog(f,intcon,[],[],Anew,beq,LB,UB);

Did it work?

find(Xb(1:nvar))
ans =
     9
    11
    15
    19
    21
    32
    43
    55
    71
sum(abs(Aeq*Xb(1:nvar) - beq))
ans =
   3.7526e-14

So intlinprog was successful. And even when not successful, it will do the best it can to solve the L1 norm regression.

Best Answer

Related Solutions

MATLAB: INTLINPROG returns non-integer

MATLAB: Fmincon works but GA won’t work

Related Question