[Math] The Moore-Penrose pseudoinverse for scaled linear regression

linear-algebra, numerical-linear-algebra, pseudoinverse, regression

The matrix equation for linear regression is:
$$ \vec{y} = X\vec{\beta}+\vec{\epsilon} $$
The least-squares solution of this satisfies the normal equations:

$$ ({\bf{X}}^T \bf{X}) \vec{\beta}= {\bf{X}}^T \vec{y} $$

Using the Moore-Penrose pseudoinverse (which takes this form when $X$ has full column rank):
$$ X^+ = ({\bf{X}}^T \bf{X})^{-1}{\bf{X}}^T $$
this can be written as:
$$ \vec{\beta}= X^+ \vec{y} $$
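A minimal numerical sketch of this equivalence, assuming NumPy (the design matrix, coefficients, and noise below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 100, 3                                  # hypothetical problem size
X = rng.normal(size=(m, p))                    # design matrix
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=m)

# Normal equations: (X^T X) beta = X^T y
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Pseudoinverse form: beta = X^+ y
beta_pi = np.linalg.pinv(X) @ y

assert np.allclose(beta_ne, beta_pi)
```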
Decomposing using SVD:
$$ X=U\Sigma V^T$$ allegedly leads to:
$$ X^+=V \Sigma^+U^T $$

where the pseudoinverse of $ \Sigma $ is found by taking the reciprocal of each nonzero diagonal element (and transposing the result).
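A sketch of that construction (NumPy, made-up data), checking the hand-built $V \Sigma^+ U^T$ against `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))

# Thin SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Sigma^+: reciprocate the nonzero singular values, zero out the rest
tol = max(X.shape) * np.finfo(s.dtype).eps * s.max()
s_inv = np.where(s > tol, 1.0 / s, 0.0)

# X^+ = V Sigma^+ U^T
X_pinv = Vt.T @ np.diag(s_inv) @ U.T

assert np.allclose(X_pinv, np.linalg.pinv(X))
```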

The point of doing things this way is that I will be doing linear regression $n \gg 1$ times; $\vec{y}$ will vary but $X$ (and therefore $X^+$ also) will be constant.
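For instance (a sketch with stand-in data), the SVD cost is paid once and each subsequent regression is only a matrix-vector product:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))      # fixed design matrix

X_pinv = np.linalg.pinv(X)         # costly SVD-based step, done once

betas = []
for _ in range(1000):              # n >> 1 regressions: y varies, X fixed
    y_k = rng.normal(size=100)     # stand-in for each new measurement vector
    betas.append(X_pinv @ y_k)     # cheap matrix-vector product per regression
```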

In particular I am interested in doing "scaled" linear regression:
$$ \vec{y} = WX\vec{\beta}+\vec{\epsilon} $$
where $W$ is a diagonal matrix (in my case with integer diagonal elements) whose entries change with $n$. The reason for this is that sometimes I have multiple measurements of $y_i$ for the same $X$.

I understand that I could solve this using the pseudoinverse of $WX$, but that pseudoinverse would have to be recomputed for each $n$, which defeats the point.

In the case of scaled linear regression the normal equations are:
$$ ({\bf{X}}^T \bf{W}^2 \bf{X}) \vec{\beta}= ({\bf{X}}^T \bf{W}) \vec{y} $$
(using $\mathbf{W}^T = \mathbf{W}$, since $\mathbf{W}$ is diagonal).
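A sketch of these weighted normal equations (NumPy, made-up integer weights), cross-checked against the pseudoinverse of $WX$:

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 100, 3
X = rng.normal(size=(m, p))
w = rng.integers(1, 5, size=m).astype(float)   # diagonal of W (integer counts)
y = rng.normal(size=m)

# (X^T W^2 X) beta = X^T W y, exploiting that W is diagonal
XtW2X = X.T @ (w[:, None] ** 2 * X)
XtWy = X.T @ (w * y)
beta_w = np.linalg.solve(XtW2X, XtWy)

# Cross-check against the pseudoinverse of W X applied to y
assert np.allclose(beta_w, np.linalg.pinv(w[:, None] * X) @ y)
```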

If I substitute SVD into this equation what is then the equation for $ \vec{\beta} $?

I am hoping for some cheap operation so that:
$$ \vec{\beta}= \operatorname{op}(W,X^+,\vec{y}) $$

As an example, something like
$$ \vec{\beta}= (X^+ W)\vec{y} $$
would be ideal (this particular guess is probably not correct, though),
since I would find the pseudoinverse of $X$ once (costly) and then multiply it with a diagonal matrix and a vector (cheap) for each $n$.
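That suspicion is easy to test numerically. A sketch (made-up data) comparing the guess $X^+ W \vec{y}$ against the exact weighted solution shows they differ in general:

```python
import numpy as np

rng = np.random.default_rng(4)
m, p = 50, 3
X = rng.normal(size=(m, p))
w = rng.integers(1, 5, size=m).astype(float)   # diagonal of W
y = rng.normal(size=m)

beta_exact = np.linalg.pinv(w[:, None] * X) @ y   # LS solution of W X beta = y
beta_guess = np.linalg.pinv(X) @ (w * y)          # the hoped-for shortcut X^+ W y

print(np.allclose(beta_exact, beta_guess))        # False: the shortcut is not exact
```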

Best Answer

An interpretive reformulation follows.

Initial solution

Start with a matrix $\mathbf{A}\in \mathbb{C}^{m\times n}$, and a sequence of measurements $\left\{ b_{k} \right\}_{k=1}^{m}$ such that $b\notin\mathcal{R}(\mathbf{A})$. What is the best solution vector $x\in\mathbb{C}^{n}$ for the linear system $$ \mathbf{A} x = b $$ with respect to the $2$-norm?

You resolved the matrix in terms of its SVD components $$ \mathbf{A} = \mathbf{U} \, \Sigma \, \mathbf{V}^{*}, $$ and used them to fashion the Moore-Penrose pseudoinverse $$ \mathbf{A}^{\dagger} = \mathbf{V} \, \Sigma^{\dagger} \, \mathbf{U}^{*}, $$ to pose the least squares solution $$ x_{LS} = \mathbf{A}^{\dagger} b. $$

Weighting factors

Next, you want to stir in some weighting factors via the full-rank matrix $\mathbf{W}\in\mathbb{C}^{m\times m}_{m}$. You ask how to solve $$ \mathbf{W}\mathbf{A} x = b. $$ The weighting factors will change the solution, but they do not affect existence or uniqueness. (For proofs, post another question.)

Relabel the system matrix $$ \mathbf{\hat{A}} = \mathbf{W} \mathbf{A}, $$ compute its SVD $$ \mathbf{\hat{A}} = \mathbf{\hat{U}} \, \hat{\Sigma} \, \mathbf{\hat{V}}^{*}, $$ and follow the successful path from before: $$ \mathbf{\hat{A}}^{\dagger} = \mathbf{\hat{V}} \, \hat{\Sigma}^{\dagger} \, \mathbf{\hat{U}}^{*}, $$ to pose the least squares solution $$ x_{LS} = \mathbf{\hat{A}}^{\dagger} b. $$
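A sketch of this relabeled approach in NumPy (made-up data); note that the SVD of $\mathbf{\hat{A}}$ must be recomputed whenever $\mathbf{W}$ changes:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 50, 3
A = rng.normal(size=(m, n))
w = rng.integers(1, 5, size=m).astype(float)   # diagonal of W, full rank
b = rng.normal(size=m)

A_hat = w[:, None] * A                          # relabeled system matrix W A

# SVD of the relabeled matrix, then its pseudoinverse
U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)
A_hat_pinv = Vt.T @ np.diag(1.0 / s) @ U.T      # assumes full column rank

x_ls = A_hat_pinv @ b                           # least-squares solution of W A x = b
assert np.allclose(x_ls, np.linalg.pinv(A_hat) @ b)
```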

This post may be of interest: multiplying a linear system by an invertible diagonal matrix