Why did mathematicians choose the inner product to be linear in the first argument instead of the second?

hilbert-spaces, inner-products, linear-algebra

From my limited experience with inner product spaces, it seems that making the inner product linear in the second argument would give smoother notation. For instance, for $x \in H$, we could define $x^* \in H^*$ by $$x^*y = \langle x, y\rangle.$$ This would then generalize the identity $x^T y = \langle x, y\rangle$ on $\mathbb{R}^n$, as the sketch below illustrates.
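To make this concrete, here is a minimal NumPy sketch (an illustration only, not standard notation): on $\mathbb{R}^n$ the inner product is $x^T y$, and `np.vdot`, which conjugates its first argument, computes the natural generalization to $\mathbb{C}^n$ that is linear in the second argument.

```python
import numpy as np

# Real case: the standard inner product on R^n is x^T y.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(x @ y)            # 32.0

# Complex case: np.vdot conjugates its FIRST argument, i.e. it
# computes conj(x) . y -- anti-linear in x, linear in y.
u = np.array([1 + 1j, 2 - 1j])
w = np.array([3 + 0j, 1 + 2j])
print(np.vdot(u, w))    # (3+2j)
print(np.vdot(u, u))    # (7+0j): real and nonnegative, ||u||^2
```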

Does linearity in the first argument make for smoother notation in some other aspect of Hilbert space theory?

Best Answer

I have taught linear algebra using both conventions, and I agree with your conclusion: I found the "physicist" convention to have more advantages than disadvantages when working over $\mathbb{C}$ (or working simultaneously over $\mathbb{F}$ where $\mathbb{F} \in \left \{ \mathbb{R}, \mathbb{C} \right \}$). These include:

  1. It is now standard that vectors are identified with column vectors while covectors are identified with row vectors, so the standard inner product on $\mathbb{R}^n$ is written as the matrix product $\vec{x}^T \cdot \vec{y}$ (while $\vec{x} \cdot \vec{y}^T$ is an outer product, not a scalar). Replacing $T$ with $*$ gives the standard inner product $\vec{x}^{*} \cdot \vec{y}$ on $\mathbb{C}^n$, which generalizes the real case and is naturally anti-linear in the first variable. To describe the standard inner product on column vectors with a linear-in-the-first-variable convention, one must define $\left< \vec{x}, \vec{y} \right> = \vec{y}^{*} \cdot \vec{x}$, which is more awkward.
  2. The Riesz anti-isomorphism $V \to V^{*}$ is given by $v \mapsto \left< v, \cdot \right>$. This is consistent with the idea that "$v$ acts on a vector $w$ by $\left< v, w \right>$", and it is even clearer in bra-ket notation, where a vector $v \in V$ defines a linear functional $\left< v \right|$ by $\left< v \right|(w) := \left< v \, | \, w \right>$. For $\left< v \right|$ to be linear, the inner product must be linear in the second variable.
  3. The expansion of a vector $v$ in an orthonormal basis $(e_1,\dots,e_n)$ is written as $v = \sum_{i=1}^n \left< e_i, v \right> e_i$, which is consistent with the dual-space notation $v = \sum_{i=1}^n e^i(v) \, e_i$, where $e^i$ is the $i$-th element of the dual basis, the functional that gives the $i$-th coordinate of a vector.
  4. The matrix coefficients of a linear operator $T$ with respect to an orthonormal basis $e_1,\dots,e_n$ are given by $a_{ij} = \left< e_i, T(e_j) \right>$ (as opposed to $a_{ij} = \left< T(e_j), e_i \right>$, which is more awkward), while the matrix coefficients of $T^{*}$ are given by $\overline{\left< e_j, T(e_i) \right>}$ (as opposed to $\overline{\left< T(e_i), e_j \right>}$...). A numerical sketch checking points 1, 3 and 4 follows this list.
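Here is a minimal NumPy sketch checking points 1, 3 and 4 numerically (an illustration under the convention $\left< x, y \right> = \vec{x}^{*} \cdot \vec{y}$; the basis and operator are randomly generated):

```python
import numpy as np

# Convention: <x, y> = conj(x) . y, anti-linear in the first variable.
def ip(x, y):
    return np.vdot(x, y)  # np.vdot conjugates its first argument

rng = np.random.default_rng(0)

# Point 1: <x, y> as the matrix product x^* y.
x = rng.normal(size=2) + 1j * rng.normal(size=2)
y = rng.normal(size=2) + 1j * rng.normal(size=2)
assert np.isclose(ip(x, y), x.conj().T @ y)

# Point 3: expansion in an orthonormal basis, v = sum_i <e_i, v> e_i.
E, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
v = rng.normal(size=3) + 1j * rng.normal(size=3)
assert np.allclose(sum(ip(E[:, i], v) * E[:, i] for i in range(3)), v)

# Point 4: a_ij = <e_i, T e_j> is the matrix of T in the basis E,
# and conj(<e_j, T e_i>) is the matrix of the adjoint T^*.
T = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
M = np.array([[ip(E[:, i], T @ E[:, j]) for j in range(3)] for i in range(3)])
assert np.allclose(M, E.conj().T @ T @ E)       # change-of-basis check
Ms = np.array([[np.conj(ip(E[:, j], T @ E[:, i])) for j in range(3)]
               for i in range(3)])
assert np.allclose(Ms, M.conj().T)              # matrix of T^* = M^*
```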

The only mildly annoying thing I noticed with the "physicist" convention is that the defining property of the adjoint operator is naturally written as $\left< T^{*}v, w \right> = \left< v, Tw \right>$, while I was used to the form $\left< Tv, w \right> = \left< v, T^{*}w \right>$. The two forms are equivalent, but if one wants to use the Riesz anti-isomorphism to justify the existence of $T^{*}$, the form $\left< T^{*}v, w \right> = \left< v, Tw \right>$ is the natural one, and it takes some getting used to.
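As a quick numerical check of this identity (a minimal NumPy sketch, using the fact that on $\mathbb{C}^n$ with $\left< v, w \right> = \vec{v}^{*} \cdot \vec{w}$ the adjoint is the conjugate transpose):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
v = rng.normal(size=4) + 1j * rng.normal(size=4)
w = rng.normal(size=4) + 1j * rng.normal(size=4)

Tstar = T.conj().T  # the adjoint of T on C^n is its conjugate transpose

# Defining property in the "physicist" convention: <T* v, w> = <v, T w>.
assert np.isclose(np.vdot(Tstar @ v, w), np.vdot(v, T @ w))
# The equivalent familiar form: <T v, w> = <v, T* w>.
assert np.isclose(np.vdot(T @ v, w), np.vdot(v, Tstar @ w))
```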