Constrained least squares problem: help

Tags: lagrange-multiplier, least-squares, matrix-equations, nonlinear-optimization, optimization

I have a system $Y=X\beta$ and I am looking for the least-squares solution (or some optimal solution) for $\beta$, subject to certain constraints on $\beta$. $Y$ is a $(100\times3)$ matrix, $X$ is also $(100\times3)$, and $\beta$ is a square $(3\times3)$ matrix. $X$ and $Y$ are nonnegative, and all elements of $\beta$ lie between $0$ and $1$. The constraints on this problem are the following (a small code sketch of the setup appears after the list):

  1. The rows of $\beta$ must sum to one, i.e. the sum of the elements of each row must equal one: $\sum_j\beta_{i,j}=1$.
  2. Each element of $\beta$ is restricted to a certain interval, $a_{ij}\leq\beta_{ij}\leq b_{ij}$, still within the range $[0,1]$ (the intervals are wide enough that the first constraint can still be satisfied).
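To make the setup concrete, this is how I would encode the objective and the two constraints in code. The data below are just random placeholders with the stated dimensions, and the feasibility check is only an illustration, not part of the question itself.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))        # placeholder nonnegative data
Y = rng.random((100, 3))
a = np.zeros((3, 3))            # elementwise lower bounds a_ij
b = np.ones((3, 3))             # elementwise upper bounds b_ij

def objective(beta):
    """Least-squares objective ||Y - X @ beta||_F^2."""
    return np.linalg.norm(Y - X @ beta, 'fro')**2

def is_feasible(beta, tol=1e-9):
    """Check constraint 1 (rows sum to one) and constraint 2 (box bounds)."""
    rows_ok = np.allclose(beta.sum(axis=1), 1.0, atol=tol)
    box_ok = np.all((beta >= a - tol) & (beta <= b + tol))
    return rows_ok and box_ok

beta_guess = np.full((3, 3), 1.0 / 3.0)   # trivially feasible candidate
print(objective(beta_guess), is_feasible(beta_guess))
```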


Update

Up until now, working with just the first constraint, my idea has been to multiply $\beta$ by a matrix of ones and set the product equal to that ones matrix. My minimization function then looks like this:

$J(\beta,\lambda)=|Y-X\beta|^2 + \lambda^T(A\beta^T-A)$; with $A=\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}$

$\Rightarrow J(\beta,\lambda)=Y^TY-2\beta^TX^TY+\beta^TX^TX\beta + \lambda^T(A\beta^T-A)$

Applying the method of Lagrange multipliers, I get the following system of equations:

$(1)\ \frac{\partial J}{\partial\beta}=-2X^TY+2X^TX\beta+A\lambda=0$

$(2)\ \frac{\partial J}{\partial\lambda}=\beta A-A=0$

Then, solving for $\beta$ from equation (1) and substituting it into equation (2) to obtain $\lambda$:

$(X^TX)^{-1}X^TY-\frac{1}{2}(X^TX)^{-1}A\lambda=\beta$

$(X^TX)^{-1}X^TYA-\frac{1}{2}(X^TX)^{-1}A\lambda A=A$

The problem is that I don't really know how to solve this system, whether it is well formulated at all, or whether using this $A$ matrix is the right approach to include the first constraint. I also still need to figure out how to include constraint 2.

I would really appreciate any help with solving this, or a reference I can study. Thanks in advance.

Best Answer

$ \def\a{\alpha}\def\b{\beta} \def\l{\lambda}\def\s{\sigma} \def\o{{\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\BR#1{\Big(#1\Big)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $Collect the lower/upper interval bounds into matrices $$(a_{ij},\,b_{ij})\to(A,B)$$ Then given an unconstrained matrix $U,\,$ you can construct a matrix which satisfies the second constraint using elementwise multiplication $(\odot)$ $$\eqalign{ W &= \fracLR{B+A}{2} + \fracLR{B-A}{2}\odot \tanh(U) \\ -\infty&\le U_{ij}\le \infty \;\qiq A_{ij}\le W_{ij}\le B_{ij} \\ }$$ where the hyperbolic tangent can be replaced by any sigmoid function whose output range is normalized $\in[-\o,\o].\,$ The tangent is nice because it has a simple differential $$\eqalign{ H &= \tanh(U) \\ dH &= (J-H\odot H)\odot dU \\ }$$
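A minimal NumPy sketch of this construction (the specific bound matrices and the random $U$ below are placeholders, not part of the answer):

```python
import numpy as np

def bounded_from_unconstrained(U, A, B):
    """Map an unconstrained matrix U elementwise into [A_ij, B_ij] via tanh."""
    return (B + A) / 2 + (B - A) / 2 * np.tanh(U)

# Example: arbitrary bounds inside [0, 1] and a random unconstrained U
rng = np.random.default_rng(1)
A = np.full((3, 3), 0.1)
B = np.full((3, 3), 0.9)
U = rng.normal(size=(3, 3)) * 10.0      # large values stress-test the bounds
W = bounded_from_unconstrained(U, A, B)
assert np.all((W >= A) & (W <= B))      # constraint 2 holds by construction
```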
Subsequently, the first constraint can be satisfied using the all-ones matrix $(J)$ and elementwise division $(\oslash)$ $$\eqalign{ \b &= W\oslash WJ \;=\; \Diag{W\o}^{-1}W \\ \b\o &= \o \\ }$$ Calculate the gradient of the least-squares objective with respect to the unconstrained variable $$\eqalign{ \phi &= \tfrac 12\|X\b-Y\|_F^2 \\ d\phi &= \CLR{X^TX\b-X^TY}:d\b \\ &= \c{P}:d\b \\ &= P:\LR{dW\oslash WJ - \b\oslash WJ\odot dW\,J} \\ &= \LR{P\oslash WJ}:dW - \LR{P\odot\b\oslash WJ}:dW\,J \\ &= \BR{P\oslash WJ-\LR{\b\odot P\oslash WJ}J}:dW \\ &= \BR{P\oslash WJ-\LR{\b\odot P\oslash WJ}J}:\fracLR{B-A}{2}\odot\LR{J-H\odot H}\odot dU \\ &= \BR{P\oslash WJ-\LR{\b\odot P\oslash WJ}J}\odot\fracLR{B-A}{2}\odot\LR{J-H\odot H}:dU \\ \grad{\phi}{U} &= \BR{P\oslash WJ-\LR{\b\odot P\oslash WJ}J}\odot\fracLR{B-A}{2}\odot\LR{J-H\odot H} \\ }$$ This gradient expression is so complicated that there's probably not a closed form solution for $U$ which yields a zero-gradient.

However, you can use this gradient in any gradient descent method to numerically calculate the optimal $U$ matrix, after which you can calculate the corresponding $W$ and $\b$ matrices.
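For concreteness, plain gradient descent with this gradient might look like the sketch below. The data, step size, and iteration count are assumptions; $W$, $\beta$, and the gradient follow the formulas above. Note, as the update below explains, that the row normalization can push $\beta$ outside the $[A,B]$ box even though $W$ stays inside it.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((100, 3))
Y = rng.random((100, 3))
A = np.full((3, 3), 0.05)            # lower bounds a_ij (placeholders)
B = np.full((3, 3), 0.95)            # upper bounds b_ij
Jmat = np.ones((3, 3))               # all-ones matrix J

U = np.zeros((3, 3))                 # unconstrained variable
step = 1e-3                          # fixed step size (an assumption)

for _ in range(5000):
    H = np.tanh(U)
    W = (B + A) / 2 + (B - A) / 2 * H
    WJ = W @ Jmat                    # every column holds the row sums of W
    beta = W / WJ                    # rows of beta sum to one by construction
    P = X.T @ X @ beta - X.T @ Y     # gradient of the objective w.r.t. beta
    GW = P / WJ - (beta * P / WJ) @ Jmat
    GU = GW * (B - A) / 2 * (1 - H * H)
    U -= step * GU

print(0.5 * np.linalg.norm(X @ beta - Y, 'fro')**2)
print(beta.sum(axis=1))              # should be all ones
```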


In some of the steps above, a colon is used to denote the matrix inner product $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \\ }$$

Update

Based on the discussion in the comments, you cannot satisfy both constraints by clever construction. However, keeping the $(A,B)$ construction $$\eqalign{ \b &= \fracLR{B-A}{2}\odot \tanh(U) + \fracLR{B+A}{2} \\ d\b &= \c{\fracLR{B-A}{2}\odot\BR{J-H\odot H}}\odot dU \\ &= \c{R}\odot dU \\ }$$ and adding the simpler constraint to the objective function yields $$\eqalign{ \phi &= \tfrac{\l}2\|X\b-Y\|_F^2 \;+\; \tfrac 12\|\b\o-\o\|_F^2 \\ d\phi &= \l P:d\b \;+\; \CLR{\b J-J}:d\b \\ &= \LR{\l P + \c{Q}}:\LR{R\odot dU} \\ \grad{\phi}{U} &= \LR{\l P + Q}\odot R \\ }$$ This gradient represents an unconstrained problem which can be solved by any gradient descent method for various values of the $\l$ parameter. The $\l$ parameter controls the relative importance of satisfying the original objective function versus the constraint.
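A corresponding sketch of this penalized version (the value of $\l$, the step size, the iteration count, and the data are placeholders; as noted above, $\l$ trades off the least-squares fit against the row-sum constraint):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((100, 3))
Y = rng.random((100, 3))
A = np.full((3, 3), 0.05)            # lower bounds a_ij (placeholders)
B = np.full((3, 3), 0.95)            # upper bounds b_ij
Jmat = np.ones((3, 3))               # all-ones matrix J
lam = 0.1                            # weight on the least-squares term (an assumption)
step = 1e-3
U = np.zeros((3, 3))

for _ in range(20000):
    H = np.tanh(U)
    beta = (B - A) / 2 * H + (B + A) / 2   # box constraint holds by construction
    P = X.T @ X @ beta - X.T @ Y           # gradient of the LS term w.r.t. beta
    Q = beta @ Jmat - Jmat                 # gradient of the row-sum penalty
    R = (B - A) / 2 * (1 - H * H)          # d(beta)/d(U), elementwise
    U -= step * (lam * P + Q) * R

print(beta.sum(axis=1))                    # close to one if the penalty dominates
print(np.all((beta >= A) & (beta <= B)))   # always True by construction
```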
