[Math] Difference between complement and orthogonal complement (vector spaces)

Tags: matrices, matrix-equations, vector-spaces

I have been learning some linear algebra and, specifically, solutions to simultaneous equations using matrices and the different cases that arise according to the relative numbers of unknowns and linearly independent equations. The term 'kernel' came up when this was being taught but wasn't properly explained. On looking into it myself a bit further, I came to the conclusion that the kernel is the complement, in the $n$-dimensional space (where $n$ is the number of unknowns), of the space spanned by the rows of the matrix. I tried to look this up to check whether I am using the terms correctly. (I know that with complements there are sometimes subtleties about something being 'open' or 'closed', in topological spaces at least, but I am not familiar enough with that area to know whether it applies to vector spaces. So I just wanted to check whether the kernel really can be described as the 'complement' of the subspace spanned by the rows.)

On looking into this, I keep seeing the term 'orthogonal complement' pop up, and I was wondering whether 'orthogonal complement' is strictly the same thing as 'complement'. I get that the vectors in the row space are orthogonal to those in the kernel, but I'm not sure whether the terms mean the same thing.

Also, I was wondering how one could generate a set of vectors spanning the kernel of some matrix (I am not asking for someone to give the standard method here; I see it has been asked several times!). In the simple case of two linearly independent equations and three unknowns, you can take the cross product of the two linearly independent row vectors given by your equations. I thought that perhaps this could be extended, and that you could find all the vectors by taking cross products of your linearly independent vectors/equations in different orders; maybe all of the cyclic permutations would do it.

For example, say that you have $n-k$ linearly independent equations, which give the row vectors

$\textbf{r}_1,\textbf{r}_2,\dots,\textbf{r}_{n-k}$, then you could find a set of vectors spanning the kernel by

$(\textbf{r}_{n-k} \times (\textbf{r}_{n-k-1} \times \dots (\textbf{r}_2 \times \textbf{r}_1)\dots))$

And you would cycle these around. However, on looking at other posts here, I have not seen this being used (instead, curiously, I have seen integrals being used). So I was wondering: would this work to generate a set of vectors spanning the kernel? Why or why not?
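For concreteness, here is the simple three-unknown case I described above as a quick numerical check (a numpy sketch with made-up coefficients):

```python
import numpy as np

# Two linearly independent equations in three unknowns: r1 . x = 0 and r2 . x = 0.
r1 = np.array([1.0, 2.0, -1.0])
r2 = np.array([0.0, 1.0,  3.0])

k = np.cross(r1, r2)                 # candidate vector spanning the kernel
print(k, r1 @ k, r2 @ k)             # both dot products come out to zero
```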

Best Answer

You’ve got quite a few different questions packed into this one post. In the interest of brevity, I’m going to give fairly cursory answers here. If you want to go more in depth into any of them, I recommend posting those questions separately and individually. For simplicity, I’ll only consider vector spaces and matrices over $\mathbb R$, but these ideas are more generally applicable to other base fields as well.

The kernel of a function $f$ is the set of inputs for which the output of the function is zero. Equivalently, it is the set of solutions to the homogeneous equation $f(x)=0$. Knowing the kernel of a linear function is useful because every solution to the inhomogeneous equation $f(x)=y$ $(y\ne0)$ can be written as the sum of a particular solution $v$ and an element $w$ of the kernel: by linearity, $f(v+w)=f(v)+f(w)=y+0=y$.
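As a minimal numerical illustration of that structure (a numpy sketch; the matrix, right-hand side, and kernel vector are all made up for the example):

```python
import numpy as np

# A 2x3 system: two equations, three unknowns (made-up coefficients).
A = np.array([[1.0, 2.0,  1.0],
              [0.0, 1.0, -1.0]])
y = np.array([3.0, 1.0])

x_part = np.linalg.lstsq(A, y, rcond=None)[0]   # one particular solution of A x = y
w = np.array([-3.0, 1.0, 1.0])                  # found by hand: A w = 0, so w lies in the kernel
assert np.allclose(A @ w, 0)

# Adding any multiple of the kernel element to the particular solution gives another solution.
for t in (0.0, 1.0, -2.5):
    assert np.allclose(A @ (x_part + t * w), y)
print("particular solution:", x_part)
```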

I’m sure you’ve seen this at work already when solving inhomogeneous systems of equations, but the same principle comes into play anywhere linear operators are involved. If we view differentiation as a linear operator $D$, its kernel consists of all the constant functions. The indefinite integral of a function $g$ is then the solution to the inhomogeneous equation $Df=g$, which consists of a particular solution (an antiderivative of $g$) plus an element of the kernel of $D$—the constant of integration.
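Here is a small sympy sketch of that picture (the right-hand side $g$ is an arbitrary choice for illustration):

```python
import sympy as sp

t, C = sp.symbols('t C')
g = sp.cos(t)                       # right-hand side of D f = g (arbitrary choice)

f_particular = sp.integrate(g, t)   # one antiderivative: sin(t)
f_general = f_particular + C        # particular solution plus an element of the kernel of D

# Differentiation kills the constant, so every choice of C solves D f = g.
assert sp.simplify(sp.diff(f_general, t) - g) == 0
```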

Turning to orthogonal complements: if $U$ and $W$ are two subsets of a vector space $V$, we define their sum $U+W=\{\mathbf u+\mathbf w\mid\mathbf u\in U\land\mathbf w\in W\}$. If $U$ and $W$ are subspaces of $V$, then it should be fairly obvious that $U+W=\operatorname{span}(U\cup W)$ and so is itself a subspace of $V$. If the intersection of the subspaces $U$ and $W$ is trivial, then each vector in their sum has a unique expression as $\mathbf u+\mathbf w$ and we can speak of the direct sum $U\oplus W$. If $U\oplus W=V$, we say that the two subspaces are each other's complements.

Now, equip $V$ with an inner product $\langle\cdot,\cdot\rangle$, which in turn defines vectors $\mathbf u$ and $\mathbf v$ to be orthogonal if $\langle \mathbf u,\mathbf v\rangle=0$. We can then speak of the orthogonal complement $W^\perp$ of a subspace $W$ of $V$ as the set of vectors that are orthogonal to every vector in $W$. It turns out that $W\oplus W^\perp=V$. So, the orthogonal complement is indeed complementary to $W$, but $W$ can have other complements as well. For a simple example, consider the subspace $U$ of $\mathbb R^2$ spanned by some non-zero vector $\mathbf u$. For any vector $\mathbf w$ that is not a scalar multiple of $\mathbf u$, the two vectors form a basis of $\mathbb R^2$ and so, letting $W=\operatorname{span}(\mathbf w)$, $\mathbb R^2=U\oplus W$, which means that each such $W$ is a complementary subspace to $U$. Only one of these will be its orthogonal complement, however.
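To see that concretely in coordinates, here is a small numpy sketch (the particular vectors are arbitrary choices):

```python
import numpy as np

u = np.array([1.0, 0.0])            # U = span{u}

# Any w that is not a scalar multiple of u gives a complement: {u, w} is a basis of R^2.
for w in (np.array([0.0, 1.0]), np.array([1.0, 1.0]), np.array([2.0, -3.0])):
    assert np.linalg.matrix_rank(np.column_stack([u, w])) == 2   # span{u} + span{w} = R^2
    print(w, "orthogonal to u?", np.isclose(u @ w, 0.0))
# All three choices of w span a complement of U, but only w = (0, 1)
# (up to scaling) spans the orthogonal complement.
```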

We can consider an $m\times n$ matrix $A$ as the representation of a linear map $T:V\to W$, where $\dim V=n$ and $\dim W=m$. The kernel of $T$ is, as defined above, the set of elements of $V$ that are mapped to zero by $T$. The null space $\mathscr N(A)$ of $A$ is the set of elements $\mathbf v$ of $\mathbb R^n$ such that $A\mathbf v=0$. Since there's an isomorphism between $V$ and $\mathbb R^n$, it's common to identify the matrix $A$ with $T$ and speak of the kernel of $A$ as a synonym for its null space, but remember that this identification depends on a particular choice of bases for $V$ and $W$.

The row space $\mathscr R(A)$ is the subspace of $\mathbb R^n$ spanned by the rows of $A$ treated as vectors. If we equip $\mathbb R^n$ and $\mathbb R^m$ with the standard Euclidean scalar product (a.k.a. dot product), then we have, as you've discovered yourself, $\mathscr N(A)=\mathscr R(A)^\perp$. We can rewrite this as $\mathscr N(A)=\mathscr C(A^T)^\perp$, where $A^T$ is the transpose of $A$ and $\mathscr C$ stands for the column space, i.e., the subspace spanned by the columns. We also have the complementary relationship $\mathscr N(A^T)=\mathscr C(A)^\perp$.

These connections between the four fundamental subspaces of $A$ reflect a similar connection between a linear map $T:V\to W$ and its adjoint $T^*:W^*\to V^*$, which are related via $(T^*\alpha)[\mathbf v]=\alpha[T\mathbf v]$. I won't go into a lot of detail here, but a linear functional is said to annihilate a set of vectors if it maps them all to zero. With that definition, the image of $T^*$ annihilates the kernel of $T$, and the kernel of $T^*$ annihilates the image of $T$. If you use dual bases for $V$ and $V^*$, application of an element of $V^*$ to an element of $V$ becomes matrix multiplication. Furthermore, if the matrix of a linear map $T:V\to W$ is $A$, then the matrix of $T^*:W^*\to V^*$ relative to the dual bases is $A^T$, so the relationships between $A$'s fundamental subspaces mentioned above are direct translations of the connections between $T$ and $T^*$.
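These relationships are easy to check numerically; here is a quick sketch using scipy's `null_space` on an arbitrary small matrix:

```python
import numpy as np
from scipy.linalg import null_space

A = np.random.default_rng(0).integers(-3, 4, size=(3, 5)).astype(float)

N  = null_space(A)      # columns span N(A), a subspace of R^5
Nt = null_space(A.T)    # columns span N(A^T), a subspace of R^3

# N(A) is orthogonal to the row space: every row of A dots to zero with every column of N.
assert np.allclose(A @ N, 0)
# N(A^T) is orthogonal to the column space: every column of A dots to zero with every column of Nt.
assert np.allclose(A.T @ Nt, 0)

# The dimensions pair up: dim R(A) + dim N(A) = n and dim C(A) + dim N(A^T) = m.
r = np.linalg.matrix_rank(A)
assert r + N.shape[1] == A.shape[1] and r + Nt.shape[1] == A.shape[0]
```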

Knowing that the null space of a matrix $A$ is the solution set of the homogeneous equation $A\mathbf x=0$, it's pretty much tautological to say that one finds the null space by solving this equation, which is equivalent to solving the system of equations $\mathbf a_i\cdot\mathbf x=0$, where $\mathbf a_i$ is the $i$th row of $A$. A general way to solve such a system is via row reduction, a.k.a. Gaussian elimination. A basis for the null space can be read directly from the reduced row echelon form of the matrix.
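For instance, with sympy (the matrix below is just a made-up example; `rref` and `nullspace` are sympy's built-in routines):

```python
import sympy as sp

# Two independent equations in four unknowns (made-up coefficients).
A = sp.Matrix([[1, 2, 0, -1],
               [0, 1, 1,  3]])

rref, pivots = A.rref()      # reduced row echelon form and the pivot columns
basis = A.nullspace()        # basis of the null space, read off from the rref

sp.pprint(rref)              # pivots in columns 0 and 1; columns 2 and 3 are free
for v in basis:
    assert A * v == sp.zeros(2, 1)   # each basis vector really solves A x = 0
```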

When you're working in $\mathbb R^3$, you can indeed solve this system of equations with a series of cross products (which is related to Cramer's rule). The problem with generalizing this to higher dimensions is that there is, in general, no analogue of the cross product of a pair of vectors. It is possible to define a cross product of $n-1$ vectors in an $n$-dimensional space, but I'm not sure that this would help you directly generalize your idea of iterating a binary product.
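For what it's worth, that $(n-1)$-ary product can be computed by cofactor expansion of a formal determinant. Here is a sketch of the idea (`generalized_cross` is my own helper for illustration, not a library routine):

```python
import numpy as np

def generalized_cross(vectors):
    """Given n-1 vectors in R^n, return a vector orthogonal to all of them.

    Component i is (-1)**i times the determinant of the matrix obtained by
    deleting column i from the (n-1) x n matrix whose rows are the inputs.
    For n = 3 this reduces to the usual cross product.
    """
    M = np.asarray(vectors, dtype=float)       # shape (n-1, n)
    n = M.shape[1]
    result = np.empty(n)
    for i in range(n):
        minor = np.delete(M, i, axis=1)        # drop column i
        result[i] = (-1) ** i * np.linalg.det(minor)
    return result

rows = [[1, 0, 2, 0],
        [0, 1, 1, 1],
        [1, 1, 0, 3]]                          # three made-up vectors in R^4
v = generalized_cross(rows)
assert np.allclose(np.asarray(rows, dtype=float) @ v, 0)   # orthogonal to every input row
```

The orthogonality holds because each dot product $\mathbf r_j\cdot\mathbf v$ is the expansion of a determinant with a repeated row, hence zero.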

Finally, I'm going to guess that the solutions you've seen which involved integrals had inner products that were defined in terms of integrals. This happens when the vectors are themselves functions, for instance elements of a space of polynomials. For such vector spaces, it's common to have an inner product defined by $\langle f,g\rangle=\int_0^1f(t)g(t)\,dt$ or something similar; a pair of vectors (functions) in such an inner product space is orthogonal if this integral vanishes. Another common non-Euclidean scalar product arises in the theory of electrical networks (or network flow): for any symmetric positive-definite matrix $A$, $\langle\mathbf v,\mathbf w\rangle=\mathbf w^TA\mathbf v$ is an inner product, and solutions to these network problems involve finding orthogonal complements and projections relative to such alternative inner products, with the flow capacities of the network's edges encoded in the matrix $A$.
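As a concrete instance of the first kind, here is a short sympy check under the inner product $\langle f,g\rangle=\int_0^1 f(t)g(t)\,dt$ (the particular polynomials are just an illustration):

```python
import sympy as sp

t = sp.symbols('t')

def inner(f, g):
    # Inner product on polynomials: <f, g> = integral of f*g over [0, 1].
    return sp.integrate(f * g, (t, 0, 1))

f = sp.Integer(1)
g = t - sp.Rational(1, 2)   # orthogonal to the constant functions under this product

print(inner(f, g))          # 0: f and g are orthogonal
print(inner(t, g))          # 1/12: t is not orthogonal to g
```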