To me continuity is more geometric and intuitive than the rest of the argument (which is purely algebraic manipulation). So I take the liberty of misreading your question as follows:
- Is it possible to derive linearity of the inner product from the parallelogram law using only algebraic manipulations?
By "only algebraic" I mean that you are not allowed to use inequalities. (It is the triangle inequality that allows one to use continuity. In fact, one can derive continuity using only the inequality $|u|^2\ge 0$ and the parallelogram law.) Also, an algebraic argument must work over any field of characteristic 0.
The answer is that it is not possible. More precisely, the following theorem holds.
Theorem. There exists a field $F\subset\mathbb R$ and a function $\langle\cdot,\cdot\rangle: F^2\times F^2\to F$ which is symmetric, additive in each argument (i.e. $\langle u,v+w\rangle=\langle u,v\rangle+\langle u,w\rangle$), satisfies the identity $\langle tu,tv\rangle = t^2\langle u,v\rangle$ for every $t\in F$, but is not bilinear.
Note that the above assumptions imply that the "quadratic form" $Q$ defined by $Q(v)=\langle v,v\rangle$ satisfies $Q(tv)=t^2Q(v)$ and the parallelogram identity, and the "product" $\langle\cdot,\cdot\rangle$ is determined by $Q$ in the usual way. [EDIT: an example exists for $F=\mathbb R$ as well, see Update.]
Proof of the theorem. Let $F=\mathbb Q(\pi)$. An element $x\in F$ is uniquely represented as $f_x(\pi)$ where $f_x$ is a rational function over $\mathbb Q$ (uniquely because $\pi$ is transcendental). Define a map $D:F\to F$ by $D(x) = (f_x)'(\pi)$. This map satisfies $D(x+y)=D(x)+D(y)$ and $D(xy)=x\,D(y)+y\,D(x)$ for all $x,y\in F$.
Define $P:F\times F\to F$ by $P(x,y) = xD(y)-yD(x)$. From the above identities it is easy to see that $P$ is additive in each argument and satisfies $P(tx,ty)=t^2 P(x,y)$ for all $x,y,t\in F$. Finally, define a "scalar product" on $F^2$ by
$$
\langle (x_1,y_1), (x_2,y_2) \rangle = P(x_1,y_2) + P(x_2,y_1) .
$$
It satisfies all the desired properties but is not bilinear: if $u=(1,0)$ and $v=(0,1)$, then $\langle u,v\rangle=0$ but $\langle u,\pi v\rangle=1$.
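As a sanity check, the identities for $D$ and $P$ can be tested with exact rational arithmetic. This is a minimal sketch: it models $\mathbb Q(\pi)$ by rational functions in a formal variable $t$, and since $\pi$ is transcendental, an identity of rational functions holds in $\mathbb Q(\pi)$ iff it holds at a generic rational point where no denominator vanishes. The particular test elements are ad hoc choices.

```python
from fractions import Fraction as F

# Elements of Q(pi) are modeled by rational functions p(t)/q(t) with
# rational coefficients, stored as (num, den) pairs of coefficient lists.

def ev(p, t):                      # evaluate coefficient list [a0, a1, ...] at t
    return sum(c * t**i for i, c in enumerate(p))

def deriv(p):                      # formal derivative of a coefficient list
    return [F(i) * c for i, c in enumerate(p[1:], start=1)]

def pmul(a, b):                    # polynomial product
    out = [F(0)] * (len(a) + len(b) - 1)
    for i, c in enumerate(a):
        for j, d in enumerate(b):
            out[i + j] += c * d
    return out

def padd(a, b):                    # polynomial sum
    n = max(len(a), len(b))
    a = a + [F(0)] * (n - len(a)); b = b + [F(0)] * (n - len(b))
    return [c + d for c, d in zip(a, b)]

def add(x, y):                     # sum in Q(pi):  p1/q1 + p2/q2
    return (padd(pmul(x[0], y[1]), pmul(y[0], x[1])), pmul(x[1], y[1]))

def mul(x, y):                     # product in Q(pi)
    return (pmul(x[0], y[0]), pmul(x[1], y[1]))

def val(x, t):                     # value of x = (num, den) at t
    return ev(x[0], t) / ev(x[1], t)

def D(x, t):                       # D(p/q) = (p'q - pq')/q^2, evaluated at t
    p, q = x
    return (ev(deriv(p), t) * ev(q, t) - ev(p, t) * ev(deriv(q), t)) / ev(q, t)**2

def P(x, y, t):                    # P(x, y) = x D(y) - y D(x)
    return val(x, t) * D(y, t) - val(y, t) * D(x, t)

t0 = F(1, 3)                                  # generic rational test point
x = ([F(1), F(0), F(1)], [F(-2), F(1)])       # (t^2 + 1)/(t - 2)
y = ([F(0), F(1)], [F(1)])                    # t  (playing the role of pi)
z = ([F(1)], [F(1), F(1)])                    # 1/(t + 1)
s = ([F(0), F(0), F(1)], [F(2), F(1)])        # t^2/(t + 2)

# Leibniz rule, additivity of P, and the quadratic scaling P(sx, sy) = s^2 P(x, y):
assert D(mul(x, y), t0) == val(x, t0) * D(y, t0) + val(y, t0) * D(x, t0)
assert P(add(x, z), y, t0) == P(x, y, t0) + P(z, y, t0)
assert P(mul(s, x), mul(s, y), t0) == val(s, t0) ** 2 * P(x, y, t0)
```

In particular, with `one = ([F(1)], [F(1)])` one gets `P(one, one, t0) == 0` while `P(one, y, t0) == 1`, which is exactly the failure of bilinearity exhibited by $u=(1,0)$, $v=(0,1)$ above.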
Update. One can check that if $\langle\cdot,\cdot\rangle$ is a "mock scalar product" as in the theorem, then for any two vectors $u,v$, the map $t\mapsto \langle u,tv\rangle - t\langle u,v\rangle$ must be a differentiation of the base field. (A differentiation is a map $D:F\to F$ satisfying the above rules for sums and products.) Thus mock scalar products on $\mathbb R^2$ are actually classified by differentiations of $\mathbb R$.
And non-trivial differentiations of $\mathbb R$ do exist. In fact, a differentiation can be extended from a subfield to any ambient field (of characteristic 0). Indeed, by Zorn's Lemma it suffices to extend a differentiation $D$ from a field $F$ to a one-step extension $F(\alpha)$ of $F$. If $\alpha$ is transcendental over $F$, one can define $D(\alpha)$ arbitrarily and extend $D$ to $F(\alpha)$ by the rules of differentiation. And if $\alpha$ is algebraic, differentiating the identity $p(\alpha)=0$, where $p$ is a minimal polynomial for $\alpha$, yields a uniquely defined value $D(\alpha)\in F(\alpha)$, and then $D$ extends to $F(\alpha)$. The extensions are consistent because all identities involved can be realized in the field of differentiable functions on $\mathbb R$, where differentiation rules are consistent.
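To spell out the algebraic step: if $p(x)=\sum_i a_i x^i$ is the minimal polynomial of $\alpha$ over $F$, applying $D$ to $0=p(\alpha)$ (using additivity and the Leibniz rule, which give $D(\alpha^i)=i\alpha^{i-1}D(\alpha)$ by induction) yields
$$
0 = \sum_i D(a_i)\,\alpha^i + p'(\alpha)\,D(\alpha),
\qquad\text{so}\qquad
D(\alpha) = -\,\frac{\sum_i D(a_i)\,\alpha^i}{p'(\alpha)},
$$
where $p'(\alpha)\neq 0$ because in characteristic $0$ the minimal polynomial is separable.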
Thus there exists a mock scalar product on $\mathbb R^2$ such that $\langle e_1,e_2\rangle=0$ but $\langle e_1,\pi e_2\rangle=1$. And I am sure I reinvented the wheel here - all this should be well-known to algebraists.
The map $f(z)=(1+z)^{-1} - (1+\sigma)^{-1}$ (here $0<\sigma<1$) maps the disk of radius $\sigma$ into the right half plane as a function of one complex variable.
Therefore, essentially by von Neumann's inequality, we get that $$\frac{f(A)+f(A)^*}{2}=\mathrm{Re}\, f(A)\geq 0$$ since $\|A\|\leq \sigma.$ Assuming $A$ has real entries, this implies the claim as
$$\langle (1+A)^{-1}x,x\rangle = \langle \mathrm{Re} (1+A)^{-1}x,x\rangle.$$
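As a concrete sanity check, here is a hypothetical $2\times 2$ example (my choice, not from the argument above), picked so that both the operator norm of $A$ and the eigenvalues of $\mathrm{Re}\, f(A)$ come out in exact arithmetic:

```python
from fractions import Fraction as Fr

# Test matrix: A = [[0, s], [0, 0]] has operator norm exactly s, and A^2 = 0,
# so (I + A)^{-1} = I - A and f(A) = (I - A) - (1 + s)^{-1} I.
s = Fr(1, 2)
c = Fr(1) - 1 / (1 + s)            # diagonal entry of f(A):  s/(1+s)
fA = [[c, -s], [Fr(0), c]]

# Re f(A) = (f(A) + f(A)^T)/2  (A has real entries, so * is the transpose).
re = [[(fA[i][j] + fA[j][i]) / 2 for j in range(2)] for i in range(2)]

# Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, a]] are a + b and a - b.
a, b = re[0][0], re[0][1]
eigs = (a + b, a - b)
print(min(eigs))                   # -> 1/12, i.e. s(1-s)/(2(1+s)) >= 0
```

The minimal eigenvalue $\sigma(1-\sigma)/(2(1+\sigma))$ is nonnegative exactly when $\sigma\le 1$, consistent with the hypothesis $\|A\|\leq\sigma<1$.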
To see the calculation with von Neumann's inequality more explicitly, let $\psi(z) = \frac{z-1}{z+1}.$ Note $\psi$ takes the right half plane to the disk. So, $\psi \circ f$ takes the disk of radius $\sigma$ into the unit disk.
Therefore, von Neumann's inequality gives
$$\|\psi \circ f(A)\|\leq \sup_{z\in \sigma\mathbb{D}} |\psi\circ f(z)| \leq 1.$$
Note that $$1-(\psi \circ f(A))^*(\psi \circ f(A)) \geq 0.$$ (Here by $T\geq 0$ we mean that $T$ is positive semi-definite.)
Writing out what that means,
$$1-(f(A)^*+1)^{-1}(f(A)-1)^*(f(A)-1)(f(A)+1)^{-1} \geq 0.$$
So,
$$(f(A)^*+1)(f(A)+1)-(f(A)^*-1)(f(A)-1)=2(f(A)+f(A)^*)\geq 0.$$
Results of the above form (positivity of noncommutative rational functions) always have to have "algebraic proofs," many of which can be done algorithmically. See, e.g.,
Helton, Klep, and McCullough - The convex Positivstellensatz in a free algebra and
Pascoe - Positivstellensätze for noncommutative rational expressions.
Another answer is that $M_n$, the space of $n\times n$ complex matrices, carries an operator norm where the norm of a matrix is its norm as a linear operator from $\mathbb{C}^n$ to itself (giving $\mathbb{C}^n$ the Euclidean norm). For some of us, this is the most natural and useful norm on $M_n$.
With operator norm, $M_n$ is a finite dimensional Banach space, so it has a dual space, which is just $M_n$ equipped with trace norm. In infinite dimensions the trace class operators on $H$, with trace norm, form the (unique) predual of $B(H)$.
Edit: I should add something about how this relates to the $\ell^1$ norm. The operator norm of a diagonal matrix is the $\ell^\infty$ norm of its entries, so operator norm can be seen as a sort of generalization of $\ell^\infty$ norm. Indeed, $M_n$ with operator norm contains an isometric copy of $\ell^\infty_n$ as the diagonal matrices. So it is natural that the dual norm should be the $\ell^1$ norm on the diagonal matrices.
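On the diagonal, the duality pairing $|\operatorname{tr}(AB)|\le \|A\|_{\mathrm{op}}\,\|B\|_{\mathrm{tr}}$ is just Hölder's inequality between $\ell^\infty$ and $\ell^1$. A quick sketch in exact arithmetic (the diagonal test vectors below are ad hoc):

```python
from fractions import Fraction as Fr

# For diagonal matrices: the operator norm of diag(a) is the l-infinity norm
# of a, and the trace norm of diag(b) is the l-1 norm of b, so the duality
# pairing |tr(AB)| <= ||A||_op * ||B||_tr reduces to Hoelder's inequality.

def op_norm_diag(a):               # operator norm of diag(a) = max |a_i|
    return max(abs(x) for x in a)

def tr_norm_diag(b):               # trace norm of diag(b) = sum |b_i|
    return sum(abs(x) for x in b)

def pairing(a, b):                 # tr(diag(a) diag(b)) = sum a_i b_i
    return sum(x * y for x, y in zip(a, b))

a = [Fr(1), Fr(-3), Fr(2)]
b = [Fr(2), Fr(1, 2), Fr(-1)]
# |tr(AB)| = 3/2  <=  ||A||_op * ||B||_tr = 3 * 7/2
assert abs(pairing(a, b)) <= op_norm_diag(a) * tr_norm_diag(b)
```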