[Math] Efficient Algorithm for Iteratively Reweighted Least Squares Problem

least squares, numerical linear algebra, svd

I'm interested in solving a weighted least squares problem of the form

$X^T W X \beta = X^T W Y$

where $W$ is a diagonal, positive definite matrix, $X \in \mathbb{R}^{m \times n}$, $Y \in \mathbb{R}^{m \times 1}$ and $\beta \in \mathbb{R}^{n \times 1}$. I know that if there is no weighting matrix $W$, then the solution can be found via a QR factorization or the SVD. However, both of these methods seem to rely on the identity $Q^T Q = I$ to simplify $X^T X$, and I can't see how to reduce the left-hand side without that simplification.

I've also heard that the Cholesky factorization can be used to solve least squares problems, but would I need to form $X^T W X$ in this case?

Eventually, I'm hoping to apply an algorithm to this problem for many different weighting matrices $W_i$ but the same $X,Y$ matrices. If there's a solution to this problem that performs factorizations (say, of $X$) that can be reused at each step, that would be fantastic! But, at this point I'd be happy with just a solution :).

Best Answer

Yes, you can use a Cholesky factorization to solve this, by solving the so-called normal equations (that you offer above) directly. First, you must form the $n\times n$ matrix $Z=X^TWX$, then compute its Cholesky factorization $R^TR=Z$, where $R$ is upper triangular. Then $\beta=R^{-1}R^{-T}X^TWY$.
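For concreteness, here is a minimal NumPy/SciPy sketch of this normal-equations route (the function and argument names are placeholders; `w` holds the diagonal of $W$):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_normal_equations(X, w, Y):
    """Solve X^T W X beta = X^T W Y via Cholesky, with W = diag(w)."""
    Z = X.T @ (w[:, None] * X)       # Z = X^T W X  (n x n)
    rhs = X.T @ (w * Y)              # X^T W Y
    c = cho_factor(Z)                # Z = R^T R, R upper triangular
    return cho_solve(c, rhs)         # beta = R^{-1} R^{-T} X^T W Y
```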

This is generally frowned upon, due to the lower numerical stability compared to other approaches, but it is often the cheapest, and if the normal equations are well-conditioned, it's fine. Indeed, many convex optimization methods use a normal equations approach within their iterative algorithms.

The most commonly offered "stable" method for least squares uses the QR factorization of $W^{1/2}X$, where $W^{1/2}$ satisfies $W^{1/2}W^{1/2}=W$. Since $W$ is diagonal, this is easy to form (in fact, you never need $W$ again if you stick with $W^{1/2}$). Note that $R$ here is the same as the Cholesky factor above, but this is a more stable, albeit more expensive, way to compute it. So again we have $\beta=R^{-1}R^{-T}X^TWY$. But since $X^T=R^TQ^TW^{-1/2}$, this simplifies to $\beta=R^{-1}Q^TW^{1/2}Y$.
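A corresponding sketch using a thin QR factorization of $W^{1/2}X$, again with `w` standing in for the diagonal of $W$:

```python
import numpy as np
from scipy.linalg import solve_triangular

def solve_weighted_qr(X, w, Y):
    """Weighted least squares via thin QR of W^{1/2} X."""
    sw = np.sqrt(w)                        # diagonal of W^{1/2}
    Q, R = np.linalg.qr(sw[:, None] * X)   # W^{1/2} X = Q R
    return solve_triangular(R, Q.T @ (sw * Y))  # beta = R^{-1} Q^T W^{1/2} Y
```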

For even more numerical stability, particularly if $W^{1/2}X$ is ill conditioned (for instance because $W$ has a high condition number), you may want to use column pivoting; that is, $W^{1/2}X=QRP^T$, where $P$ is a permutation matrix chosen by the standard column pivoting process. Then $\beta=PR^{-1}Q^TW^{1/2}Y$.
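Assuming SciPy's column-pivoted QR (`scipy.linalg.qr` with `pivoting=True`, which returns the pivot as an index array rather than as an explicit matrix), a sketch might look like:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def solve_weighted_qr_pivoted(X, w, Y):
    """Weighted least squares via column-pivoted QR of W^{1/2} X."""
    sw = np.sqrt(w)
    # (W^{1/2} X)[:, piv] = Q R, i.e. W^{1/2} X = Q R P^T
    Q, R, piv = qr(sw[:, None] * X, mode='economic', pivoting=True)
    z = solve_triangular(R, Q.T @ (sw * Y))  # z = P^T beta
    beta = np.empty_like(z)
    beta[piv] = z                            # undo the permutation: beta = P z
    return beta
```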

One way to improve the accuracy of the results is with iterative refinement (look it up!). This is particularly useful for the normal equations approach, but it can be employed in all three options above. It is inexpensive compared to the initial solve step, since it can re-use the Cholesky or QR factorization.
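As a rough illustration for the normal-equations case, here is a sketch of a fixed-precision refinement loop that reuses a cached Cholesky factorization (`cho` is the object returned by `cho_factor` in the first sketch; the step count is arbitrary):

```python
import numpy as np
from scipy.linalg import cho_solve

def refine(X, w, Y, beta, cho, n_steps=2):
    """Iterative refinement on the normal equations, reusing the
    Cholesky factorization `cho` of Z = X^T W X."""
    rhs = X.T @ (w * Y)
    for _ in range(n_steps):
        r = rhs - X.T @ (w * (X @ beta))   # residual of the normal equations
        beta = beta + cho_solve(cho, r)    # correction via the cached factor
    return beta
```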

I do not believe there is an easy way to reuse a factorization of $X$ alone to speed up the solves for different weight matrices $W_i$.
