We just learned in our linear algebra class about the Riesz Representation Theorem, which states that if $V$ is finite-dimensional and $f$ is a linear functional on $V$, then there is a unique vector $u$ in $V$ such that
$f(v) = \langle v,u \rangle$
for every $v$ in $V.$
Can someone please give some geometric intuition, over the complex field, for why this theorem is true? And what is the connection between the theorem and the complex conjugation in the inner product?
Thank you.
Related Solutions
Take $C[0,1]$ with the $L^2$ inner product. Let $\phi(f) = \int_{1 \over 2}^1 f(t) dt$.
It is straightforward to see that $\phi$ is bounded by Cauchy-Schwarz.
To see that $\phi$ cannot be represented by an element of $C[0,1]$ we proceed by contradiction. Suppose $\phi(f) = \int_0^1 g(t) f(t)\, dt$ for some $g \in C[0,1]$.
Let $f_n$ be the continuous function whose graph is given by joining the points $(0,1), ({1\over 2}-{1 \over n}, 1),({1 \over 2}, 0), (1,0)$. Since $f_n$ vanishes on $[{1\over 2},1]$, we have $0=\phi(g \cdot f_n) = \int_0^1 g^2(t) f_n(t)\, dt \ge \int_0^{{1\over 2}-{1 \over n}} g^2(t)\, dt$, and letting $n \to \infty$ it follows that $g(t) = 0$ for $t \in [0,{1 \over 2}]$ (using the continuity of $g$ at $t = {1\over 2}$).
Now choose a sequence of nonnegative continuous functions $h_n$ such that $h_n$ has support in $[{1 \over 2}, {1 \over 2}+ {1 \over n}]$ and $\int_0^1 h_n(t)\, dt = 1$. Then $\phi(h_n) = 1$ for all $n$, but the continuity of $g$ gives $\lim_n \phi(h_n) = \lim_n \int_0^1 g(t) h_n(t)\, dt = g({1 \over 2}) = 0$, a contradiction.
Addendum: Here is a marginally simpler ending to the above proof: Let $\bar{\phi}(f) = \int_0^{1 \over 2} f(t)\, dt$ and note that $\phi(f) + \bar{\phi}(f) = \int_0^1 f(t)\, dt$. Since $\int_0^1 f(t)\, dt = \langle 1, f \rangle$, if we have $\phi(f) = \int_0^1 g(t) f(t)\, dt$, then this gives $\bar{\phi}(f) = \int_0^1 (1-g(t)) f(t)\, dt$. As above, we see that we must have $g(t) = 1$ for $t \in [{1 \over 2},1]$, which contradicts the continuity of $g$ at $t={1\over 2}$.
First let's start with the transpose operator. The transpose can be defined without reference to an inner product and only requires the vector space structure. The way to think about the transpose is that it turns a linear map $f$ from $V$ to $W$ into a linear map $f^T$ from $W^*$ to $V^*$, and it does this in the most "natural" way possible.
In words: if we are trying to define a map from $W^*$ to $V^*$, we start with a functional $g \in W^*$ and want to turn it into a function that acts on vectors in $V$ and returns a scalar. We know that $g$ can act on vectors in $W$, and we know that $f$ turns vectors in $V$ into vectors in $W$; chaining these together gives us a way to let $g$ act on vectors in $V$!
In symbols, what we just did above is define $f^T(g)(v) = g(f(v))$. We first use $f$ to map $v$ from a vector in $V$ to a vector in $W$ and then we use our element $g \in W^*$ to map it to a real number.
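The definition $f^T(g)(v) = g(f(v))$ can be sketched directly as function composition; the names below (`transpose`, the sample $f$ and $g$) are illustrative, not from any library.

```python
# Sketch: the transpose as precomposition, with no inner product in sight.

def transpose(f):
    """Turn f: V -> W into f^T: W* -> V* via f^T(g) = g o f."""
    return lambda g: (lambda v: g(f(v)))

# f: R^2 -> R^3 (vectors as tuples), g: R^3 -> R a linear functional
f = lambda v: (v[0], v[1], v[0] + v[1])
g = lambda w: 2 * w[0] + 3 * w[2]

fT = transpose(f)
v = (1.0, 4.0)
assert fT(g)(v) == g(f(v))  # both evaluate g on f(v)
```

Note that `transpose(f)` never inspects coordinates: it only chains $f$ and $g$, which is exactly the "most natural way possible" claim above.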
Now the Hermitian adjoint is a way of turning a function $f: V \to W$ into a function $f^*: W \to V$. The best way to think about this operator is through the transpose. The transpose gives us an operator $f^T: W^* \to V^*$. To turn this into a map from $W$ to $V$, we need to fix isomorphisms between $W$ and $W^*$ and between $V$ and $V^*$. To map a vector $w$ from $W$ to $V$, you use the isomorphism between $W$ and $W^*$ to map $w$ to $w' \in W^*$, then you use the transpose to map it to a vector $v' \in V^*$, and then you use the isomorphism between $V^*$ and $V$ to map it to a vector $v \in V$.
You could describe the above construction with the following diagram: $\require{AMScd}$ \begin{CD} W^* @>{f^T}>> V^*\\ @VV{\cong}V @VV{\cong}V\\ W @>{f^*}>> V \end{CD}
Now the connection with inner product spaces and the Riesz Representation Theorem is that an inner product on a vector space $V$ gives you a "natural" way to define an isomorphism between $V$ and $V^*$.
For every vector $v \in V$, we can define the map $\varphi_v: V \to K$ (where $K$ is the base field) by $\varphi_v(\cdot) = \langle \cdot,v \rangle$. By the linearity of the inner product, $\varphi_v$ is an element of the dual space $V^*$. In addition the map from $V$ to $V^*$ given by $v \to \varphi_v$ is (anti)-linear and by the non-degeneracy of the inner product, it is injective. The Riesz Representation Theorem says that this injection is also a surjection and therefore a bijection. This means that whenever we have an inner product we can define $\varphi_v$ for every $v \in V$ and the map sending $v$ to $\varphi_v$ is a bijection between $V$ and $V^*$.
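The anti-linearity of $v \mapsto \varphi_v$ is easy to check numerically. Below is a minimal sketch in $\mathbb{C}^2$, using the text's convention that $\langle x, y \rangle$ is linear in the first slot and conjugate-linear in the second; the helper names are illustrative.

```python
import numpy as np

# <x, y> = sum_i x_i * conj(y_i): linear in x, conjugate-linear in y
def inner(x, y):
    return x @ np.conj(y)

# The Riesz map v -> phi_v, with phi_v(x) = <x, v>
def phi(v):
    return lambda x: inner(x, v)

v = np.array([1 + 2j, 3 - 1j])
x = np.array([2 - 1j, 1j])
c = 2 - 3j

# phi_{c v} = conj(c) * phi_v : the map v -> phi_v is anti-linear
assert np.isclose(phi(c * v)(x), np.conj(c) * phi(v)(x))
```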
Now putting these two concepts together (the transpose and the connection between inner product spaces and dual spaces), we can define the transpose $f^T$ without the inner product structure as a map from $W^* \to V^*$. The inner product structure gives us isomorphisms between $V$ and $V^*$ and between $W$ and $W^*$, and chaining these isomorphisms together with the transpose we get a Hermitian adjoint map $f^*$ which maps $W$ to $V$.
EDIT: Added connection with $\mathbb{C}$.
Whenever we start writing linear transformations as matrices we first have to fix a basis. This also gives an alternate way of defining an isomorphism between $V$ and $V^*$ (search for "dual basis"), and it is a useful exercise to verify that these two isomorphisms between $V$ and $V^*$ agree if and only if the basis that you started with is orthonormal, when the base field is $\mathbb{R}$. In this case, let $v_1,\ldots,v_n$ be an orthonormal basis for $V$ and let $v^1,\ldots,v^n$ be its dual basis. Similarly, let $w_1,\ldots,w_m$ be an orthonormal basis for $W$ and let $w^1,\ldots,w^m$ be its dual basis.
Now if we let the matrix representation of $f$ be
$$\begin{bmatrix} f_{11} & \ldots & f_{1n} \\ \vdots & \ddots & \vdots \\ f_{m1} & \ldots & f_{mn} \end{bmatrix}$$
then we have $f(v_i) = \sum_j f_{ji} w_j$. Now if we consider the transpose we have
$$f^T(w^i)(v_k) = w^i(f(v_k)) = w^i\left( \sum_j f_{jk} w_j \right) = f_{ik} = \left( \sum_j f_{ij} v^j \right)(v_k)$$
so
$$f^T(w^i) = \sum_j f_{ij} v^j$$
so the matrix representation of $f^T$ is exactly the traditional transpose of the matrix representation of $f$.
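This claim is easy to check numerically in the standard (orthonormal) bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, where a functional with dual-basis coefficients $a$ acts on a vector $v$ as the dot product $a \cdot v$. A minimal sketch with arbitrary data:

```python
import numpy as np

# Check that the matrix of f^T in the dual bases is the ordinary
# matrix transpose: f^T(g)(v) = g(f(v)) = a . (A v) = (A^T a) . v.

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # f: R^2 -> R^3 in the standard bases
a = rng.standard_normal(3)        # coefficients of g in the dual basis of W*
v = rng.standard_normal(2)

lhs = a @ (A @ v)                 # f^T(g)(v), computed as g(f(v))
rhs = (A.T @ a) @ v               # the functional with coefficients A^T a
assert np.isclose(lhs, rhs)
```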
Now let us consider the adjoint in the case of $K = \mathbb{C}$. In this case we have that the map from $v$ to $\varphi_v$ is anti-linear.
Let $w = \sum_i c_i w_i$. Then $\varphi_w = \sum_i \overline{c_i} w^i$. Now if we apply $f^T$ to this we get
$$f^T(\varphi_w) = f^T\left(\sum_i \overline{c_i} w^i\right) = \sum_j \left(\sum_i \overline{c_i}f_{ij}\right) v^j$$
Finally, when we apply the anti-linear isomorphism between $V$ and $V^*$ we get that
$$ f^*(w) = \sum_j \left(\sum_i c_i\overline{f_{ij}}\right) v_j $$
so the matrix representation of $f^*$ is
$$\begin{bmatrix} \overline{f_{11}} & \ldots & \overline{f_{m1}} \\ \vdots & \ddots & \vdots \\ \overline{f_{1n}} & \ldots & \overline{f_{mn}} \end{bmatrix}$$
which is the conjugate transpose of the matrix of $f$. The basic intuition is that the isomorphism between $W$ and $W^*$ conjugates all of the coefficients of $w$, the result is fed through the transposed matrix $f^T$, and then the isomorphism between $V^*$ and $V$ conjugates everything once more, which undoes the conjugation on the coefficients of $w$ but conjugates the entries of the matrix.
EDIT 2: For completeness, I figured I would add why the adjoint as I defined it above is equivalent to the traditional definition of the adjoint as the unique operator $f^*$ such that for all $v,w$ we have
$$\langle f(v), w \rangle = \langle v, f^*(w) \rangle.$$
If you look at the commutative diagram above, we will start with an element $w \in W$ in the bottom left and trace it using the two paths to the top right. First, we can go right to $f^*(w) \in V$. Then we can use the isomorphism between $V$ and $V^*$ to go up to $\varphi_{f^*(w)}$.
Alternatively, we could first go up using the isomorphism between $W$ and $W^*$ to get to $\varphi_w$ and then move right to get to $f^T(\varphi_w)$ so we have that
$$ f^T(\varphi_w) = \varphi_{f^*(w)} $$
as elements of $V^*$. Now evaluating both sides on $v \in V$ gives
$$ \langle f(v), w \rangle = \varphi_w(f(v)) = f^T(\varphi_w)(v) = \varphi_{f^*(w)}(v) = \langle v, f^*(w) \rangle $$
as desired.
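The defining identity $\langle f(v), w \rangle = \langle v, f^*(w) \rangle$ can be verified numerically for a random complex matrix, taking $f^*$ to be the conjugate transpose and using the same inner product convention as above (linear in the first slot):

```python
import numpy as np

# Check <f(v), w> = <v, f^*(w)> over C with f^* = conjugate transpose.

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
v = rng.standard_normal(2) + 1j * rng.standard_normal(2)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

def inner(x, y):
    return x @ np.conj(y)   # linear in x, conjugate-linear in y

assert np.isclose(inner(A @ v, w), inner(v, A.conj().T @ w))
```

Replacing `A.conj().T` with the plain transpose `A.T` makes the assertion fail for generic complex data, which is exactly the role of the extra conjugation traced through the diagram above.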
Best Answer
We can look at the case $V = \mathbb{R}^n$. Let $f$ be a linear functional $f: \mathbb{R}^n \to \mathbb{R}$. Let $e_1, …, e_n$ denote the standard basis vectors.
Then for each vector $v = (v_1, …, v_n)$, we have $f(v) = f(v_1e_1 + … + v_ne_n) = v_1f(e_1) + … + v_nf(e_n) = \langle v, u \rangle$, where $u := (f(e_1), …, f(e_n))$. So, every linear functional is given as an inner product with a vector: just choose the vector whose coordinates are $f$ applied to the standard basis vectors $e_i$.
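This construction of $u$ is completely explicit, so it is worth seeing it run; the functional below is an arbitrary illustrative choice.

```python
import numpy as np

# Build the Riesz vector u for a functional on R^3:
# u's coordinates are f applied to the standard basis vectors.

f = lambda v: 2 * v[0] - v[1] + 5 * v[2]   # an arbitrary linear functional

n = 3
u = np.array([f(e) for e in np.eye(n)])    # u = (f(e_1), ..., f(e_n))

v = np.array([1.0, -2.0, 0.5])
assert np.isclose(f(v), v @ u)             # f(v) = <v, u>
```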
Since $f$ is a linear transformation, we can ask what its kernel and image are. If $f(e_i) = 0$ for all $i$, then $f$ is just the zero transformation, so it’s not so interesting. Otherwise $f(e_i) \neq 0$ for some $i$, so the image of $f$ is all of $\mathbb{R}$, because $\mathbb{R}$ is spanned by any nonzero vector. By the rank-nullity theorem, the kernel of $f$ has dimension $n - 1$. In other words, $f$ collapses a hyperplane (i.e. a subspace of dimension $n - 1$) to the point $0$. The kernel is a hyperplane.
Now notice that the kernel is the set of all vectors $v$ such that $f(v) = \langle v, u \rangle = 0$. In other words, it is the set of all vectors that are orthogonal to the vector $u$. This has a geometric interpretation. In $\mathbb{R}^3$, for example, the kernel would be the plane normal to the vector $u$.
Now you might say, “For any given plane, there are many vectors that are normal to it. Yet the theorem says there is a unique vector $u$. In other words, you’ve shown existence, but you haven’t shown uniqueness.”
Here is some intuition for this in $\mathbb{R}^3$. Imagine picking a plane in $\mathbb{R}^3$ and then asking for one of its normal vectors. Say, the plane is the $xy$-plane, and a normal vector is $(0,0,1)$. Now define $f: \mathbb{R}^3 \to \mathbb{R}$ such that $f(e_1) = 0, f(e_2) = 0$ and $f(e_3) = 1$. This uniquely defines $f$, because we’ve specified what $f$ should do to a basis. Clearly $f(v) = \langle v, (f(e_1), f(e_2), f(e_3)) \rangle = \langle v, (0,0,1) \rangle = 0$ for all $v$ in the plane, because that’s what it means for the vector $(0,0,1)$ to be normal to the plane. However, you can imagine that we might have chosen a different normal vector to the plane. Say, suppose we chose $(0,0,5)$ instead. Then you can see that this in turn uniquely defines a different map $f’$. It is the map $f’$ that sends $e_1$ to $0$, $e_2$ to $0$, and $e_3$ to $5$. And so on: Any particular scaling of a normal vector will give you a unique linear map.
In general, we have uniqueness, because: If $f(v) = \langle v, u_1 \rangle = \langle v, u_2 \rangle$ for all $v$, then $\langle v, u_1 - u_2 \rangle = \langle v, u_1 \rangle - \langle v, u_2 \rangle = 0$ for all $v$. So for $v = u_1 - u_2$, we have $\langle u_1 - u_2, u_1 - u_2 \rangle = 0$. The only way we can have a vector whose inner product with itself is $0$ is if we have the zero vector. Hence $u_1 = u_2$, which shows uniqueness.