How might I project onto a non-orthogonal basis?

linear-algebra, projection

I was just doing some linear algebra homework the other day and I thought of a question that never came up: how might I project onto a non-orthogonal basis? Consider:
$\newcommand\colv[1]{\begin{bmatrix}#1\end{bmatrix}}$
$$\vec{u}=\colv{1\\0\\-1\\0}, \vec{v}=\colv{1\\2\\3\\4}, \vec{w}=\colv{0\\1\\3\\-4}, \vec{b}=\colv{1\\1\\1\\1}$$
$$A=\text{Span}(\vec{u},\vec{v},\vec{w})\text{ and we want to find proj}_A\vec{b} $$

Obviously the orthogonal projection formula won't work because the basis isn't orthogonal. Also, we could easily do something like Gram-Schmidt to make $\{\vec{u},\vec{v},\vec{w}\}$ orthogonal, but I'm just going for exploration here. Here's my attempt:

Find the $\hat{b}\in A$ that minimizes $\lVert\vec b - \hat b \rVert$, that is, the vector in $A$ that is closest to $\vec b$. Because $\hat{b}\in A$, we can rewrite it as a linear combination:
$$\hat{b}=c_u\vec u+c_v\vec v+c_w\vec w=\colv{c_u+c_v\\2c_v+c_w\\-c_u+3c_v+3c_w\\4c_v-4c_w}$$
$$\lVert\vec b - \hat b \rVert=\left\lVert \colv{1-c_u-c_v\\1-2c_v-c_w\\1+c_u-3c_v-3c_w\\1-4c_v+4c_w} \right\rVert$$
$$=\sqrt{(1+c_u^2+c_v^2-2c_u-2c_v+2c_uc_v)+(1+4c_v^2+c_w^2-4c_v-2c_w+4c_vc_w)+(1+c_u^2+9c_v^2+9c_w^2+2c_u-6c_v-6c_w-6c_uc_v-6c_uc_w-18c_vc_w)+(1+16c_v^2+16c_w^2-8c_v+8c_w-32c_vc_w)}$$
$$=\sqrt{4+2c_u^2+30c_v^2+26c_w^2-20c_v-4c_uc_v-46c_vc_w-6c_uc_w}$$

Because the quantity inside the square root is the dot product of $\vec b - \hat b$ with itself, it can never be negative, and since the square root is a monotone increasing function, we can take it out of the picture: the minimizer of $\lVert\vec b - \hat b \rVert$ is the same as the minimizer of $\lVert\vec b - \hat b \rVert^2$. The same goes for the constant 4, though dropping it doesn't help much.

Great, but anything I was taught in multivariable calculus that might help me solve this question went dormant a while ago. So,
**How can I minimize this polynomial?**
$$2c_u^2+30c_v^2+26c_w^2-20c_v-4c_uc_v-46c_vc_w-6c_uc_w$$

I'll toy with the other definition of the projection. We can decompose $\vec b$ into two vectors $\hat b \in A$ and $a^{\perp} \in A^{\perp}$ with $\vec b=\hat b + a^{\perp}$.
We know that $\hat b$ can be written as a linear combination of the basis vectors of $A$, and that $a^{\perp}$ will be orthogonal to each basis vector of $A$ (for generality's sake, rename the basis vectors of $A$ to $\vec{v_i}$ instead of the previous $\vec{u},\vec{v},\vec{w}$). That gives us
$$\langle a^{\perp}, \vec{v_j} \rangle = 0 \text{ for all j}$$
$$\langle \vec b - \sum_ic_i\vec{v_i}, \vec{v_j} \rangle = 0$$
$$\langle \sum_ic_i\vec{v_i},\vec{v_j} \rangle = \langle \vec b, \vec{v_j} \rangle$$
If I'm not mistaken, this can be represented as a matrix equation $P\vec c=\vec d$.

Let $P_{ij}=\langle \vec{v_i},\vec{v_j} \rangle$, $(\vec c)_i = c_i$, and $(\vec d)_i=\langle \vec b,\vec{v_i} \rangle$.

**Is this representation correct?**
**If so, under which conditions will it be consistent?**

Just for kicks, I'll try it on my example.
$$\left[\begin{array}{rrr|r}
\langle u,u\rangle & \langle u,v\rangle & \langle u,w\rangle & \langle b,u\rangle \\
\langle v,u\rangle & \langle v,v\rangle & \langle v,w\rangle & \langle b,v\rangle \\
\langle w,u\rangle & \langle w,v\rangle & \langle w,w\rangle & \langle b,w\rangle
\end{array}\right]$$

Just typing it out, this looks eerily similar to the orthogonal projection formula (when the basis is orthogonal, I suppose the entries off the main diagonal would all be 0), so I have high hopes that it's right, but I have no idea whether what I derived is reasonable.
$$\left[\begin{array}{rrr|r}
2 & -2 & -3 & 0 \\
-2 & 30 & -5 & 10 \\
-3 & -5 & 26 & 0
\end{array}\right] \sim
\left[\begin{array}{rrr|r}
1 & 0 & 0 & \frac{335}{538} \\
0 & 1 & 0 & \frac{215}{538} \\
0 & 0 & 1 & \frac{80}{538}
\end{array}\right]
$$

$$\hat b = \frac{335}{538}\vec u + \frac{215}{538}\vec v + \frac{80}{538}\vec w = \colv{\frac{275}{269} \\ \frac{255}{269} \\ \frac{275}{269} \\ \frac{270}{269}}$$
Cool! That's pretty close to $\vec b$, so I hope what I've done is right. Even if this method works, I'm still interested in my first approach.
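As a sanity check (a quick NumPy sketch, not part of the derivation; the array names are mine), here is the same Gram system built and solved numerically:

```python
import numpy as np
from fractions import Fraction

u = np.array([1.0, 0, -1, 0])
v = np.array([1.0, 2, 3, 4])
w = np.array([0.0, 1, 3, -4])
b = np.array([1.0, 1, 1, 1])

V = np.column_stack([u, v, w])  # basis vectors as columns
P = V.T @ V                     # Gram matrix: P[i, j] = <v_i, v_j>
d = V.T @ b                     # d[i] = <v_i, b>

c = np.linalg.solve(P, d)       # coefficients (c_u, c_v, c_w)
b_hat = V @ c                   # the candidate projection of b onto A

print([Fraction(x).limit_denominator() for x in c])
# [Fraction(335, 538), Fraction(215, 538), Fraction(40, 269)]
print([Fraction(x).limit_denominator() for x in b_hat])
# [Fraction(275, 269), Fraction(255, 269), Fraction(275, 269), Fraction(270, 269)]
```

The printed fractions match the row reduction above, so at least the arithmetic checks out.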

In an answer to this question, could you address the questions I've bolded?

Best Answer

> **How can I minimize this polynomial?**

First, there's an extra negative sign in the derivation: the $18c_vc_w$ term should be positive, not negative, which means the polynomial you're interested in is:

$$2c_u^2+30c_v^2+26c_w^2-20c_v-4c_uc_v-\color{red}{10}c_vc_w-6c_uc_w.$$

Multivariable calculus is probably the most straightforward method here. If we let $$f(x, y, z) = 2x^2+30y^2+26z^2-20y-4xy-10yz-6xz,$$ then we can compute $$\nabla f(x, y, z) = (4x - 4y - 6z, 60y - 20 - 4x - 10z, 52z - 10y - 6x).$$ From the geometry of the problem, we expect there to be a finite (achieved) minimum. So, if the working is thus far correct, then we expect there to be a unique minimum of $f$, which occurs at a stationary point of $f$, which occurs when $\nabla f(c_u, c_v, c_w) = 0$. Using Wolfram Alpha, the only possible solution is $$(c_u, c_v, c_w) = \left(\frac{335}{538},\frac{215}{538},\frac{40}{269}\right).$$ This agrees with your second attempt, so this is a big vote of confidence!
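If you'd rather not lean on Wolfram Alpha, the stationary point is easy to find numerically: since $f$ is quadratic, $\nabla f = 0$ is a linear system whose coefficient matrix is the (constant) Hessian of $f$. A small NumPy sketch of this check (the matrix entries are read off from $\nabla f$ above):

```python
import numpy as np
from fractions import Fraction

# Hessian of f (constant, since f is quadratic); the gradient is
# nabla f = H @ (x, y, z) - rhs.
H = np.array([[ 4.0,  -4.0,  -6.0],
              [-4.0,  60.0, -10.0],
              [-6.0, -10.0,  52.0]])
rhs = np.array([0.0, 20.0, 0.0])

crit = np.linalg.solve(H, rhs)  # the unique stationary point of f
print([Fraction(x).limit_denominator() for x in crit])
# [Fraction(335, 538), Fraction(215, 538), Fraction(40, 269)]

# All eigenvalues of H are positive, so f is strictly convex and the
# stationary point really is the global minimum.
print(np.linalg.eigvalsh(H))
```

Note that $H$ is exactly twice the Gram matrix from your second attempt, which is another way of seeing why the two methods agree.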

> **Is this representation correct?**
> **If so, under which conditions will it be consistent?**

It will always be consistent. There will always be a unique nearest point (closed, non-empty, convex sets in Hilbert spaces admit unique nearest points), and such a point has to satisfy the equations you've presented. If you are getting an inconsistent system, then it means you have a calculation error.

If the vectors $v_i$ you feed into the system are linearly dependent, then your method will still work, except you will obtain infinitely many solutions to your system of equations. However, the nearest point is still unique, and any choice of the infinitely many solutions will produce this same point.
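To illustrate this (a hedged NumPy sketch; the redundant fourth vector $\vec u + \vec v$ is my own choice), `np.linalg.lstsq` copes with the rank-deficient system and still returns the same nearest point:

```python
import numpy as np

u = np.array([1.0, 0, -1, 0])
v = np.array([1.0, 2, 3, 4])
w = np.array([0.0, 1, 3, -4])
b = np.array([1.0, 1, 1, 1])

# A deliberately redundant spanning set: the fourth column is u + v.
A = np.column_stack([u, v, w, u + v])

# A^T A is singular here, so np.linalg.solve would fail; lstsq handles
# the rank-deficient system and returns one of its many solutions.
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# The coefficients are no longer unique, but the nearest point is:
print(A @ x)  # ~[1.0223, 0.9480, 1.0223, 1.0037] = (275, 255, 275, 270)/269
```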

This second method is another way of looking at the ordinary linear least squares method. Usually, the problem is set up like so: suppose you have a column vector $y \in \Bbb{R}^m$ that you wish to project onto a subspace of $\Bbb{R}^m$ spanned by $v_1, \ldots, v_n$. Then, form the matrix $$A = \left(\begin{array}{cccc} &&&\\ v_1 & v_2 & \cdots & v_n \\ &&& \end{array}\right).$$ We now wish to project $y$ onto the column space, or equivalently, find $$\min_{x \in \Bbb{R}^n}\|Ax - y\|.$$ The method calls for us to take the (usually inconsistent) equation $Ax = y$ and multiply both sides by $A^\top$, giving us the "normal equations": $$A^\top A x = A^\top y.$$ If you turned this equation into an augmented matrix, you would get exactly the matrix you wrote down: $$A^\top A = \left(\begin{array}{ccc} &v_1^\top&\\ \hline&v_2^\top&\\ \hline&\vdots& \\ \hline&v_n^\top& \end{array}\right) \left(\begin{array}{cccc} &&&\\ v_1 & v_2 & \cdots & v_n \\ &&& \end{array}\right) = \begin{pmatrix} v_1^\top v_1 & v_1^\top v_2 & \cdots & v_1^\top v_n \\ v_2^\top v_1 & v_2^\top v_2 & \cdots & v_2^\top v_n \\ \vdots & \vdots & \ddots & \vdots \\ v_n^\top v_1 & v_n^\top v_2 & \cdots & v_n^\top v_n \end{pmatrix},$$ and similarly, $$A^\top y = \begin{pmatrix} v_1^\top y \\ v_2^\top y \\ \vdots \\ v_n^\top y \end{pmatrix}.$$ Given that $\langle v, w \rangle = v^\top w$, this is basically the same as what you've written.

So, like I said above, normal equations are always consistent. If you choose your spanning set to be linearly independent, $A^\top A$ will also be invertible, leading to a unique linear combination of columns of $A$ that is closest to $y$: $$x = (A^\top A)^{-1}A^\top y.$$ The actual nearest point is $$Ax = A(A^\top A)^{-1}A^\top y.$$
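For completeness, a short NumPy sketch of these closed-form expressions on the example above (the idempotence check at the end is an extra observation, not something derived in the post):

```python
import numpy as np

u = np.array([1.0, 0, -1, 0])
v = np.array([1.0, 2, 3, 4])
w = np.array([0.0, 1, 3, -4])
y = np.array([1.0, 1, 1, 1])

A = np.column_stack([u, v, w])

# Solve the normal equations A^T A x = A^T y ...
x = np.linalg.solve(A.T @ A, A.T @ y)

# ... and form the projection matrix A (A^T A)^{-1} A^T explicitly.
proj = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(proj @ y, A @ x))    # True: same nearest point
print(np.allclose(proj @ proj, proj))  # True: projecting twice changes nothing
```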
