You have $W = \mathrm{span}\{ (4,-4,-2), (-4,1,1) \}$. First, orthonormalize this basis of $W$ with the Gram-Schmidt algorithm to obtain an orthonormal basis $\{v_1, v_2\}$. After this, an easy way to compute the projection onto $W$ is to compute the projections of $P$ onto $v_1$ and $v_2$:
$$
\mathrm{proj}_{v_1}(P) = (P \cdot v_1)v_1
$$
and
$$
\mathrm{proj}_{v_2}(P) = (P \cdot v_2)v_2
$$
Now that you have this,
$$
\mathrm{proj}_{W}(P) = \mathrm{proj}_{v_1}(P) + \mathrm{proj}_{v_2}(P).
$$
Therefore you can compute the norm of $P - \mathrm{proj}_{W}(P)$ and get the distance from $P$ to $W$. I leave the number crunching to you. If the number crunching went wrong I would need to see the numbers to help.
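If you want to check your arithmetic, here is a minimal sketch of the steps above in plain Python. The point `P = (1, 2, 3)` is only a placeholder I made up, since I don't have your actual point; the helper names are mine as well.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(k, v):
    return tuple(k * a for a in v)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return an orthonormal basis of span(vectors)."""
    basis = []
    for w in vectors:
        # Remove the components along the vectors already in the basis.
        for q in basis:
            w = sub(w, scale(dot(w, q), q))
        length = math.sqrt(dot(w, w))
        if length > 1e-12:  # skip (numerically) dependent vectors
            basis.append(scale(1.0 / length, w))
    return basis

def project(p, orthonormal_basis):
    """Sum of the projections of p onto each orthonormal basis vector."""
    result = tuple(0.0 for _ in p)
    for q in orthonormal_basis:
        result = add(result, scale(dot(p, q), q))
    return result

def distance_to_span(p, vectors):
    residual = sub(p, project(p, gram_schmidt(vectors)))
    return math.sqrt(dot(residual, residual))

W = [(4, -4, -2), (-4, 1, 1)]
P = (1, 2, 3)  # hypothetical point, replace with your own
print(distance_to_span(P, W))
```

A quick sanity check: $(1,-2,6)$ is orthogonal to both spanning vectors, so its distance to $W$ should be its own length, $\sqrt{41}$.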
Note that your so-called "best approximation theorem" doesn't need the vectors $u_1$ and $u_2$ to be orthogonal. What requires orthogonality is the technique used to compute the projection, because you project $P$ onto $v_1$ and $v_2$ separately and then add the individual projections. This does not work when $v_1$ and $v_2$ are not orthogonal (make yourself a little drawing if you want to be convinced; it's quite obvious).
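If you'd rather see numbers than a drawing, here is a small sketch with made-up vectors in the plane (none of them come from your problem). The point `P` already lies in the span, so its true projection is `P` itself, yet adding the two individual projections gives something else.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    length = math.sqrt(dot(v, v))
    return tuple(a / length for a in v)

def proj_onto_unit(p, q):
    """Projection of p onto the unit vector q."""
    return tuple(dot(p, q) * a for a in q)

# Hypothetical example: two unit vectors that are NOT orthogonal.
u1 = unit((1.0, 0.0))
u2 = unit((1.0, 1.0))
P = (1.0, 0.0)  # lies in span{u1, u2}, so proj_W(P) should equal P

p1 = proj_onto_unit(P, u1)
p2 = proj_onto_unit(P, u2)
naive_sum = tuple(a + b for a, b in zip(p1, p2))

print(naive_sum)  # roughly (1.5, 0.5), which is not P
```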
Hope that helps.
There's a simple formula for the angle between $A$ (a $j$-blade representing some subspace of $\Bbb R^n$) and $B$ (a $k$-blade representing some other subspace of $\Bbb R^n$), where $j\le k$, given by
$$\cos(\theta) = \dfrac{\|A\ \raise .2em{\lrcorner}\ B\|}{\|A\|\|B\|}$$
where $\raise .2em{\lrcorner}$ is the left contraction product.
But because you're not familiar with geometric algebra, I'll try to give you just enough (non-rigorous) definitions so that you can calculate the angle between any two subspaces of $\Bbb R^n$.
Definitions:
Wedge Product: We define the wedge product, denoted $a\wedge b$, of two vectors $a, b\in \Bbb R^n$ as an object that is neither a vector nor a scalar and obeys all of the following:
$$\begin{align}a \wedge b &= -b \wedge a \\ a\wedge(b\wedge c) &= (a\wedge b)\wedge c \\ a\wedge a &= 0 \\ k(a\wedge b) &= (ka)\wedge b = a\wedge (kb) \tag{$k\in \Bbb R$}\\ a\wedge (b+c) &= a\wedge b + a\wedge c\end{align}$$
Blade: An object formed by the wedge product of $k$ vectors is called a $k$-blade. We define scalars as $0$-blades, vectors as $1$-blades, objects that can be written as $a\wedge b$ as $2$-blades, objects that can be written as $a\wedge b\wedge c$ as $3$-blades, etc.
$k$-vector: A $k$-vector is a linear combination of $k$-blades.
Multivector: A multivector is a linear combination of $k$-vectors.
Grade: Objects in the algebra we're building have "grades". All $k$-blades/ $k$-vectors have grade $k$. But multivectors are in general multigraded objects. For instance, $B = b_0 + b_1e_1 + b_2 e_2 + b_{12}e_1\wedge e_2$, where $b_i$ are scalars and $e_i$ are vectors, is a multigraded object.
Clifford product: The Clifford product is an associative product of multivectors satisfying:
$$\begin{align}ab &= a\cdot b + a\wedge b \\ (AB)C &= A(BC) \\ k(AB) &= (kA)B = A(kB) \\ A(B+C) &= AB+AC\end{align}$$ for all vectors $a,b$, scalars $k$, and multivectors $A,B,C$.
- A consequence of this is that if $\{e_1, \dots, e_n\}$ is an
orthonormal basis of $\Bbb R^n$ then $e_ie_i = e_i\cdot e_i = 1$ for
all $i$ and $e_ie_j = e_i \wedge e_j$ for $i\ne j$.
Grade projection: The grade projection operator, denoted $\langle A \rangle_i$, returns the grade $i$ parts of the multivector $A$. For instance, $\langle ab\rangle_0 = \langle a\cdot b + a\wedge b\rangle_0 = a\cdot b$.
Norm: The norm of a multivector can be determined in the standard way after decomposing it into an orthonormal basis. For instance, if $A = a_0 + a_1e_1 + a_2e_2 + a_{12}e_1 \wedge e_2$ then $$\|A\| = \sqrt{a_0^2 + a_1^2 + a_2^2 + a_{12}^2}$$
Left contraction: The left contraction product of a $j$-blade $A$ and a $k$-blade $B$ is defined as $$A\ \raise .2em{\lrcorner}\ B = \langle AB\rangle_{k-j}$$
Now for an example. Consider the plane spanned by $a=2e_1 +3e_3$ and $b=e_2+e_3$ and the line spanned by $c=2e_1$.
Then the angle between that line and that plane is given by $$\cos(\theta) = \frac{\|c\ \raise .2em{\lrcorner}\ (a\wedge b)\|}{\|c\|\|a\wedge b\|}$$
So let's calculate it:
$$\begin{align}a\wedge b &= (2e_1 +3e_3)\wedge (e_2+e_3) = 2e_1\wedge e_2 + 2e_1\wedge e_3 + 3e_3\wedge e_2 = 2e_1e_2 + 2e_1e_3 - 3e_2e_3 \\ \|a\wedge b\| &= \sqrt{4+4+9} = \sqrt{17} \\ \|c\| &= \sqrt{4} = 2 \\ c\ \raise .2em{\lrcorner}\ (a\wedge b) &= \langle (2e_1)(2e_1e_2 + 2e_1e_3 - 3e_2e_3)\rangle_1 = \langle 4e_2 + 4e_3 - 6e_1e_2e_3\rangle_1 = 4e_2 + 4e_3 \\ \|c\ \raise .2em{\lrcorner}\ (a\wedge b)\| &= \sqrt{16+16} = \sqrt{32} \\ \implies \theta &= \arccos\left(\frac{\sqrt{32}}{2\sqrt{17}}\right) \approx 46.7^\circ\end{align}$$
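To double-check the example numerically, here is a rough sketch of the definitions above in plain Python. It is not a real geometric algebra library: the representation (a dict from bitmask to coefficient, where bit $i$ set means $e_{i+1}$ is a factor of the basis blade) and all function names are my own choices, and it assumes an orthonormal Euclidean basis.

```python
import math

def reorder_sign(a, b):
    """Sign picked up when the basis blades a and b (bitmasks) are
    multiplied and their factors sorted into increasing index order."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1.0 if swaps % 2 else 1.0

def clifford_product(A, B):
    """Clifford product of two multivectors (Euclidean metric, e_i e_i = 1)."""
    out = {}
    for blade_a, coef_a in A.items():
        for blade_b, coef_b in B.items():
            blade = blade_a ^ blade_b  # repeated factors square to 1 and drop out
            out[blade] = out.get(blade, 0.0) + reorder_sign(blade_a, blade_b) * coef_a * coef_b
    return out

def grade(A, k):
    """Grade projection <A>_k."""
    return {blade: coef for blade, coef in A.items() if bin(blade).count("1") == k}

def norm(A):
    return math.sqrt(sum(coef * coef for coef in A.values()))

def vector(*coords):
    return {1 << i: float(x) for i, x in enumerate(coords) if x != 0}

a = vector(2, 0, 3)  # 2e1 + 3e3
b = vector(0, 1, 1)  # e2 + e3
c = vector(2, 0, 0)  # 2e1

a_wedge_b = grade(clifford_product(a, b), 2)            # for vectors, a^b = <ab>_2
contraction = grade(clifford_product(c, a_wedge_b), 1)  # <cB>_{k-j} with j=1, k=2

cos_theta = norm(contraction) / (norm(c) * norm(a_wedge_b))
theta_deg = math.degrees(math.acos(cos_theta))
print(round(theta_deg, 1))  # 46.7
```

The bitmask trick keeps the sketch short, but it only handles an orthonormal basis; for anything serious you would want a proper library.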
So just as the professor told you, you found the equation of the plane which passes through the origin. However, that is only correct when you are either told that the plane contains the origin, or your points happen to be given such that the plane defined by the three points contains the origin. If it helps, imagine that by assuming your plane went through the origin, you essentially created a plane parallel to the one defined by the three points: your plane passes through the origin, but has the same 'tilt' as the plane which passes through the three points.
Explicitly, you should have done something like this: form the vector $(1,0,0)$ from the first two points and the vector $(1,1,0)$ from the second two. Their cross product, $(0,0,1)$, gives you a normal vector. A point on the plane together with a normal vector determines the plane: take the normal vector $(n_1, n_2, n_3) = (0, 0, 1)$ and the point (we could use any of the three, let's just choose the first) $(p_1, p_2, p_3) = (0,1,2)$. Then the plane is $n_1(x-p_1) + n_2(y-p_2) + n_3(z-p_3) = 0$, which in this case is $0 + 0 + 1(z-2) = 0$, or $z=2$. You can check that this works for the three points by seeing that all of their third coordinates are in fact $2$. Hope that answers what you were still uncertain about.
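Here is the same recipe as a short Python sketch, in case you want to try it on other points. Only the first point $(0,1,2)$ is taken from above; the other two are points I picked so that the difference vectors come out as $(1,0,0)$ and $(1,1,0)$, so treat them as an assumption rather than your exact data.

```python
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_through(p, q, r):
    """Return (n, d) such that the plane is n . (x, y, z) = d."""
    n = cross(sub(q, p), sub(r, q))  # normal from two in-plane vectors
    d = dot(n, p)                    # from n . ((x, y, z) - p) = 0
    return n, d

# First point from the answer; the other two are assumed for illustration.
points = [(0, 1, 2), (1, 1, 2), (2, 2, 2)]
n, d = plane_through(*points)
print(n, d)  # (0, 0, 1) 2, i.e. z = 2

# d != 0 means the plane does not contain the origin.
print(d == 0)  # False
```

Setting `d` to zero while keeping `n` gives exactly the parallel plane through the origin described above.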