I am looking for pointers to, and names of, computational approaches for solving the following binary matrix optimization problem:

$$ \operatorname{minimize}\quad \|\mathbf{X}\mathbf{Y} - \mathbf{T}\|_1 $$

where $\mathbf{X}$ and $\mathbf{Y}$ are binary matrices and the matrix multiplication is over the integers (not mod 2).

- $\mathbf{X}$ is $\{0,1\}^{H \times M}$
- $\mathbf{Y}$ is $\{0,1\}^{M \times W}$
- $\mathbf{T}$ is $[\![0,N]\!]^{H \times W}$

Only $\mathbf{T}$ is known. The inner dimension $M$ of $\mathbf{X}$ and $\mathbf{Y}$ is not fixed and can "grow" to admit a better solution. Eventually I may want to find the smallest $M$ that achieves a given value of the objective function.

One approach I have considered is to set the dimension $M$ from the largest value in $\mathbf{T}$, initialize the corresponding elements of $\mathbf{X}$ and $\mathbf{Y}$ to $1$, and then start an optimization algorithm. In that case, would I pass a vector of length $H M + M W$ to an optimization routine (e.g. `scipy.optimize.minimize`) and reshape it into the two matrices when evaluating the objective function?
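That reshape-and-evaluate pattern could be sketched as below; note that `scipy.optimize.minimize` treats the vector as continuous, so binarity would still have to be enforced separately (by rounding or a penalty). The function name `l1_objective` is mine, just for illustration.

```python
import numpy as np

def l1_objective(v, T, M):
    """Reshape a flat vector of length H*M + M*W into X (H x M) and
    Y (M x W), then evaluate the L1 misfit ||X Y - T||_1."""
    H, W = T.shape
    X = v[:H * M].reshape(H, M)
    Y = v[H * M:].reshape(M, W)
    return np.abs(X @ Y - T).sum()

# example: H = 2, W = 3, M = 2 -> flat vector of length 2*2 + 2*3 = 10
T = np.array([[2, 1, 0], [1, 1, 1]])
v = np.random.rand(10)
print(l1_objective(v, T, M=2))
```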

I have looked at the following computational packages but am not sure of the best functions within those packages:

- YALMIP: MATLAB toolbox for optimization modeling, including (integer) linear programming.
- Python package cvxopt, specifically `ilp` from `cvxopt.glpk`.
- `scipy.optimize.minimize` with bounds.

I have also considered solving for $\textbf{X}$ (or $\textbf{Y}$) over the positive reals (bounded above by some maximum) and then converting $\textbf{X}$ to a binary matrix by growing its $M$ dimension according to a "resolution of the discretization". For example, if a matrix element were found to be $0.50$ and the resolution were set to $0.1$, the resulting binary $\textbf{X}$ matrix would have a run of five $1$s. To use this approach I would solve for $\textbf{X}$, after an initial random guess for $\textbf{Y}$, using a conventional pseudoinverse solver from numpy such as

```
import numpy as np

H, M, W = 2, 3, 4
T = np.random.randint(0, 5, size=(H, W))  # example target
Y = np.random.randint(2, size=(M, W))     # random binary guess for Y
Y_pinv = np.linalg.pinv(Y)                # W x M pseudoinverse
X = np.dot(T, Y_pinv)                     # real-valued least-squares X
```

Yet, since $\textbf{Y}$ itself is not optimized, the solution is far from optimal.
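The discretization step described above could be sketched as follows for a single column of the real-valued $\textbf{X}$ (the helper name `grow_binary` is hypothetical). Note that if the matching row of $\textbf{Y}$ is then replicated to match the grown width, each entry contributes $\mathrm{round}(x/\text{resolution})$ copies of that row, i.e. the product is scaled by roughly $1/\text{resolution}$ relative to the real-valued $\textbf{X}$, so $\textbf{T}$ or $\textbf{Y}$ would need rescaling.

```python
import numpy as np

def grow_binary(x_col, resolution=0.1):
    """Expand one real-valued column of X into a binary block whose
    rows contain round(x / resolution) leading ones, so an entry of
    0.50 at resolution 0.1 becomes a run of 5 ones."""
    counts = np.rint(np.asarray(x_col, dtype=float) / resolution).astype(int)
    counts = np.clip(counts, 0, None)          # negative entries map to no ones
    B = np.zeros((len(counts), counts.max()), dtype=int)
    for i, c in enumerate(counts):
        B[i, :c] = 1
    return B

print(grow_binary([0.5, 0.2]))
```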

## Best Answer

For fixed $M$, you want to minimize $$\sum_{i=1}^H \sum_{j=1}^W \left|\sum_{k=1}^M X_{i,k} Y_{k,j} - T_{i,j}\right|. \tag1$$ You can linearize the absolute value in $(1)$ by introducing nonnegative decision variables $E^+_{i,j}$ and $E^-_{i,j}$, yielding the problem of minimizing $$\sum_{i=1}^H \sum_{j=1}^W (E^+_{i,j} + E^-_{i,j}) \tag2$$ subject to $$\sum_{k=1}^M X_{i,k} Y_{k,j} - T_{i,j} = E^+_{i,j} - E^-_{i,j} \quad \text{for all $i$ and $j$}. \tag3$$ You can linearize this quadratic constraint $(3)$ by introducing binary (or just nonnegative) decision variables $Z_{i,k,j}$ to represent $X_{i,k} Y_{k,j}$ and imposing linear constraints \begin{align} \sum_{k=1}^M Z_{i,k,j} - T_{i,j} &= E^+_{i,j} - E^-_{i,j} && \text{for all $i$ and $j$} \tag4 \\ Z_{i,k,j} &\le X_{i,k} &&\text{for all $i,k,j$} \tag5 \\ Z_{i,k,j} &\le Y_{k,j} &&\text{for all $i,k,j$} \tag6 \\ Z_{i,k,j} &\ge X_{i,k} + Y_{k,j} - 1 &&\text{for all $i,k,j$} \tag7 \end{align} The linearized problem is to minimize $(2)$ subject to $(4)$ through $(7)$.
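A minimal sketch of this linearization using `scipy.optimize.milp` (SciPy $\ge$ 1.9, HiGHS backend); the function name, index layout, and dense constraint matrix are choices of mine, suitable only for small instances since there are $H M W$ product variables $Z_{i,k,j}$.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def solve_l1_factorization(T, M):
    """min ||X Y - T||_1 over binary X (H x M), Y (M x W) via the
    linearization (2)-(7): variables X, Y, Z = X*Y products, E+, E-."""
    H, W = T.shape
    nX, nY, nZ, nE = H * M, M * W, H * M * W, H * W
    n = nX + nY + nZ + 2 * nE

    # flat-vector index helpers for each variable block
    iX = lambda i, k: i * M + k
    iY = lambda k, j: nX + k * W + j
    iZ = lambda i, k, j: nX + nY + (i * M + k) * W + j
    iEp = lambda i, j: nX + nY + nZ + i * W + j
    iEm = lambda i, j: nX + nY + nZ + nE + i * W + j

    # objective (2): sum of E+ and E-
    c = np.zeros(n)
    c[nX + nY + nZ:] = 1.0

    rows, lb, ub = [], [], []
    for i in range(H):
        for j in range(W):
            # constraint (4): sum_k Z[i,k,j] - E+[i,j] + E-[i,j] = T[i,j]
            r = np.zeros(n)
            for k in range(M):
                r[iZ(i, k, j)] = 1.0
            r[iEp(i, j)], r[iEm(i, j)] = -1.0, 1.0
            rows.append(r); lb.append(T[i, j]); ub.append(T[i, j])
            for k in range(M):
                # (5): Z <= X
                r = np.zeros(n); r[iZ(i, k, j)] = 1.0; r[iX(i, k)] = -1.0
                rows.append(r); lb.append(-np.inf); ub.append(0.0)
                # (6): Z <= Y
                r = np.zeros(n); r[iZ(i, k, j)] = 1.0; r[iY(k, j)] = -1.0
                rows.append(r); lb.append(-np.inf); ub.append(0.0)
                # (7): Z >= X + Y - 1
                r = np.zeros(n); r[iZ(i, k, j)] = -1.0
                r[iX(i, k)] = 1.0; r[iY(k, j)] = 1.0
                rows.append(r); lb.append(-np.inf); ub.append(1.0)

    # X, Y binary; (5)-(7) force Z = X*Y, so Z may stay continuous in [0,1]
    integrality = np.zeros(n)
    integrality[:nX + nY] = 1
    lower = np.zeros(n)
    upper = np.concatenate([np.ones(nX + nY + nZ), np.full(2 * nE, np.inf)])

    res = milp(c, constraints=LinearConstraint(np.array(rows), lb, ub),
               integrality=integrality, bounds=Bounds(lower, upper))
    X = res.x[:nX].round().reshape(H, M)
    Y = res.x[nX:nX + nY].round().reshape(M, W)
    return X, Y, res.fun

T = np.array([[1, 0], [0, 1]])
X, Y, err = solve_l1_factorization(T, M=2)
print(err)  # 0 here, since X = Y = I reproduces T exactly
```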