Non-negative least-squares with random variables

convex optimization · least squares · optimization · statistics

I've been working on a side project for a while that requires a huge amount of simple optimization, and I have run into a pretty nasty problem with it. If any of you could give some advice, I would really appreciate it.

Background

I have the three following variables:

  • A known M×N matrix A
  • A known length-M column vector b
  • An unknown length-N column vector x

Furthermore, in my problem they are related by the following expression:
$$Ax=b$$

In my project, I need a solution for x that satisfies this expression as accurately as possible. By itself, this is a pretty well-studied problem! For instance, I am aware that I can multiply by the pseudoinverse of A to find a good solution for x. However, in my problem I also have the constraint that x must be non-negative. Thus, I am instead using a non-negative least squares (NNLS) algorithm to solve for x:
$$x=\text{NNLS}(A,b)$$
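For reference, here is roughly what I am doing right now. This is just a minimal sketch using SciPy's `scipy.optimize.nnls` (the particular solver and the random placeholder data are my own choices for illustration; any NNLS implementation behaves the same way):

```python
import numpy as np
from scipy.optimize import nnls

# Placeholder data; in my application A and b come from measurements.
M, N = 50, 10
rng = np.random.default_rng(0)
A = rng.normal(size=(M, N))
b = rng.normal(size=M)

# Solve  min ||A x - b||_2  subject to  x >= 0
x, residual_norm = nnls(A, b)
print(x)
print(residual_norm)
```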

So far, this solution has worked pretty well for my application!

The Problem

Recently, I have run into situations where every element of A is a Gaussian random variable. Fortunately, I know both the mean and variance of every element of A with pretty high precision. In addition, x is entirely non-random.

Under these conditions, the problem changes slightly. I now want to find the solution x such that the product $Ax$ has a mean of b and low variance in every element. However, I am struggling to figure out how to find the best solution for x with this information.

If I just use non-negative least squares to solve for this, I run into some issues. I would imagine that, if column j has many elements with extremely large standard deviations, the jth element of x should be very small. This is because it would not be advantageous to include noisy matrix elements when trying to generate an approximation of b with little variance.

However, all versions of the non-negative least squares algorithm that I have seen do not consider the possibility of noise or any randomness. This means that they would allow for large variance in the product $Ax$, because they only consider the mean of A!

The Question

Is there any way to make a non-negative least squares algorithm consider noise when optimizing for x? If not, are there alternative optimization methods I could use to solve for x so that the product $Ax$ has relatively low variance?

Note about my background: I've been told it is useful for responders to know the background of people who ask questions here. I'm an engineer. While I am pretty familiar with mathematics, I am not very familiar with many optimization algorithms, and I do not know a massive amount about statistics.

Best Answer

I mentioned in the comments that the problem looks like quadratic programming. And indeed it is. Here are the details. (There is a lot of computation, so there might be mistakes.)

For convenience, put $y = Ax$, which is a random vector in $\mathbb{R}^m$. You are trying to minimize $\text{Var}(y_1) + \dotsb + \text{Var}(y_m)$. Let's express this objective in terms of what's known.

We have a constraint that $\mathbb{E}[Ax] = b$. The covariance matrix of $y$ is then $\Sigma = \mathbb{E}[(y - b)(y - b)^T]$, and $\text{Var}(y_1) + \dotsb + \text{Var}(y_m)$ is exactly the sum of the diagonal entries of $\Sigma$, i.e., $\text{tr}(\Sigma)$.

Now let's simplify this even further.
\begin{align*} (y - b)(y - b)^T & = yy^T - b y^T - yb^T + bb^T\\ & = Axx^T A^T - b x^T A^T - Ax b^T + bb^T \end{align*}
The covariance matrix is
\begin{align*} \Sigma & = \mathbb{E}[(y - b)(y - b)^T]\\ & = \mathbb{E}[Axx^T A^T] - \mathbb{E}[bx^T A^T] - \mathbb{E}[Axb^T] + bb^T \end{align*}
and
\begin{align*} \text{tr}(\Sigma) & = \text{tr}(\mathbb{E}[Axx^T A^T]) - \text{tr}(\mathbb{E}[bx^T A^T]) - \text{tr}(\mathbb{E}[Axb^T]) + \text{tr}(bb^T) \end{align*}
For the first term, we use two tricks: (1) the trace commutes with expectation; (2) the trace is cyclically invariant. We get
\begin{align*} \text{tr}(\mathbb{E}[Axx^T A^T]) & = \mathbb{E}[\text{tr}(Axx^T A^T)]\\ & = \mathbb{E}[\text{tr}(A^T A xx^T)]\\ & = \text{tr}(\mathbb{E}[A^T A xx^T])\\ & = \text{tr}(\mathbb{E}[A^T A] xx^T) \end{align*}
Since you know the mean and variance of $A$, $\mathbb{E}[A^T A]$ can be computed from this information. Let's put $M = \mathbb{E}[A^T A]$. Note that $M$ is a positive semidefinite matrix of size $n \times n$. Continuing, we have
\begin{align*} \text{tr}(\mathbb{E}[Axx^T A^T]) & = \text{tr}(M xx^T)\\ & = \text{tr}(x^T M x)\\ & = x^T M x \end{align*}
That takes care of the first term.
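To spell out how $M$ could be computed from what you know: if the elements of $A$ are mutually independent (an assumption on my part; if they are correlated, you need the covariances as well), then $$ \big(\mathbb{E}[A^T A]\big)_{jk} = \sum_{i=1}^m \mathbb{E}[A_{ij} A_{ik}] = \big(\mathbb{E}[A]^T \mathbb{E}[A]\big)_{jk} + \delta_{jk} \sum_{i=1}^m \text{Var}(A_{ij}), $$ i.e., $M$ is the product of the mean matrices plus a diagonal matrix whose $j$th diagonal entry is the total variance of column $j$.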

For the second term, put $F = \mathbb{E}[A] \in \mathbb{R}^{m \times n}$, which is known. We get \begin{align*} - \text{tr}(\mathbb{E}[b x^T A^T]) & = - \text{tr}(b x^T F^T)\\ & = - \text{tr}(x^T F^T b)\\ & = - x^T F^T b \end{align*}

Doing the same for the third term gives $-\text{tr}(\mathbb{E}[Axb^T]) = -x^T F^T b$ as well, so the second and third terms together contribute $-2x^T F^T b$.

So in the end, our objective $\text{Var}(y_1) + \dotsb + \text{Var}(y_m)$ simplifies to $$ x^T M x - 2 x^T F^T b + \text{tr}(bb^T) $$ This is just a quadratic expression in $x$. Moreover, since $M$ is positive semidefinite, this is a convex quadratic expression.

The whole problem boils down to the following quadratic program: \begin{align*} \underset{x \in \mathbb{R}^n}{\text{minimize}} & \quad x^T M x - 2x^T F^T b + \text{tr}(bb^T)\\ \text{subject to} & \quad F x = b\\ & \quad x \geq 0, \end{align*} where $M = \mathbb{E}[A^T A] \in \mathbb{R}^{n \times n}$, $F = \mathbb{E}[A] \in \mathbb{R}^{m \times n}$. Pretty much any convex optimization package can handle it. (For example, check out CVXPY.)


Update: As fedja pointed out in the comments, the problem simplifies further. Since $Fx = b$, the term $-2x^T F^T b$ is just $-2b^T b$, which is a constant. For optimization purposes, we can drop it, as well as the last term $\text{tr}(bb^T)$, from the objective. In the end, we just have \begin{align*} \underset{x \in \mathbb{R}^n}{\text{minimize}} & \quad x^T M x\\ \text{subject to} & \quad F x = b\\ & \quad x \geq 0. \end{align*} It's still a quadratic program.
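For concreteness, here is a minimal CVXPY sketch of this final program. The data below are placeholders, and $M$ is assembled under the extra assumption that the elements of $A$ are independent; substitute your own $F$, $M$, and $b$:

```python
import numpy as np
import cvxpy as cp

# Placeholder problem data; replace with the real means/variances of A and the target b.
m, n = 50, 10
rng = np.random.default_rng(0)
F = np.abs(rng.normal(size=(m, n)))        # F = E[A], the matrix of means
var_A = 0.1 * np.ones((m, n))              # elementwise variances of A (independence assumed)
M = F.T @ F + np.diag(var_A.sum(axis=0))   # M = E[A^T A] under that assumption
b = F @ np.abs(rng.normal(size=n))         # a target chosen so that Fx = b is feasible

x = cp.Variable(n, nonneg=True)            # enforces x >= 0
problem = cp.Problem(cp.Minimize(cp.quad_form(x, M)),  # objective x^T M x
                     [F @ x == b])                     # constraint E[Ax] = b
problem.solve()
print(x.value)
```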
