To me continuity is more geometric and intuitive than the rest of the argument (which is purely algebraic manipulation). So I take the liberty of misreading your question as follows:
- Is it possible to derive linearity of the inner product from the parallelogram law using only algebraic manipulations?
By "only algebraic" I mean that you are not allowed to use inequalities. (It is the triangle inequality that allows one to use continuity. In fact, one can derive continuity using only the inequality $|u|^2\ge 0$ and the parallelogram law.) Also, an algebraic argument must work over any field of characteristic 0.
The answer is that it is not possible. More precisely, the following theorem holds.
Theorem. There exists a field $F\subset\mathbb R$ and a function $\langle\cdot,\cdot\rangle: F^2\times F^2\to F$ which is symmetric, additive in each argument (i.e. $\langle u,v+w\rangle=\langle u,v\rangle+\langle u,w\rangle$), satisfies the identity $\langle tu,tv\rangle = t^2\langle u,v\rangle$ for every $t\in F$, but is not bilinear.
Note that the above assumptions imply that the "quadratic form" $Q$ defined by $Q(v)=\langle v,v\rangle$ satisfies $Q(tv)=t^2Q(v)$ and the parallelogram identity, and the "product" $\langle\cdot,\cdot\rangle$ is determined by $Q$ in the usual way. [EDIT: an example exists for $F=\mathbb R$ as well, see Update.]
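Proof of the theorem. Let $F=\mathbb Q(\pi)$. Since $\pi$ is transcendental, an element $x\in F$ is uniquely represented as $f_x(\pi)$ where $f_x$ is a rational function over $\mathbb Q$. Define a map $D:F\to F$ by $D(x) = (f_x)'(\pi)$. This map satisfies $D(\pi)=1$ and the usual rules for sums and products: $D(x+y)=D(x)+D(y)$ and $D(xy)=xD(y)+yD(x)$ for all $x,y\in F$.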
Define $P:F\times F\to F$ by $P(x,y) = xD(y)-yD(x)$. From the above identities it is easy to see that $P$ is additive in each argument and satisfies $P(tx,ty)=t^2 P(x,y)$ for all $x,y,t\in F$. Finally, define a "scalar product" on $F^2$ by
$$
\langle (x_1,y_1), (x_2,y_2) \rangle = P(x_1,y_2) + P(x_2,y_1) .
$$
It satisfies all the desired properties but is not bilinear: if $u=(1,0)$ and $v=(0,1)$, then $\langle u,v\rangle=0$ but $\langle u,\pi v\rangle=1$.
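The claims in the proof can be checked with exact arithmetic. Below is a minimal sketch, assuming elements of $\mathbb Q(\pi)$ are modelled as unreduced fractions of polynomials in a formal variable standing for $\pi$. The names `RatFunc`, `mock`, `scale` and `vadd` are hypothetical helpers introduced only for this illustration, and the test vectors are arbitrary.

```python
from fractions import Fraction

# Polynomials over Q: coefficient lists, lowest degree first, no trailing zeros.
def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))

def pmul(p, q):
    out = [Fraction(0)] * max(len(p) + len(q) - 1, 0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def pderiv(p):
    # Formal derivative: the coefficient of x^(i-1) is i * a_i.
    return trim(i * a for i, a in enumerate(p) if i > 0)

class RatFunc:
    """Element f(pi) of Q(pi), stored as an unreduced fraction num/den."""
    def __init__(self, num, den=(1,)):
        self.num = trim(Fraction(c) for c in num)
        self.den = trim(Fraction(c) for c in den)
        if not self.den:
            raise ZeroDivisionError("zero denominator")
    def __add__(self, o):
        return RatFunc(padd(pmul(self.num, o.den), pmul(o.num, self.den)),
                       pmul(self.den, o.den))
    def __neg__(self):
        return RatFunc([-a for a in self.num], self.den)
    def __sub__(self, o):
        return self + (-o)
    def __mul__(self, o):
        return RatFunc(pmul(self.num, o.num), pmul(self.den, o.den))
    def __eq__(self, o):
        # Compare fractions by cross-multiplication, so no reduction is needed.
        return pmul(self.num, o.den) == pmul(o.num, self.den)

def D(x):
    # Quotient rule: (n/d)' = (n' d - n d') / d^2.
    top = padd(pmul(pderiv(x.num), x.den),
               [-a for a in pmul(x.num, pderiv(x.den))])
    return RatFunc(top, pmul(x.den, x.den))

def P(x, y):
    return x * D(y) - y * D(x)

def mock(u, v):
    # The "scalar product" from the proof: P(x1, y2) + P(x2, y1).
    (x1, y1), (x2, y2) = u, v
    return P(x1, y2) + P(x2, y1)

def scale(t, v):
    return (t * v[0], t * v[1])

def vadd(u, v):
    return (u[0] + v[0], u[1] + v[1])

ZERO, ONE, PI = RatFunc([0]), RatFunc([1]), RatFunc([0, 1])
e1, e2 = (ONE, ZERO), (ZERO, ONE)
t = RatFunc([1, 1], [2, 0, 1])        # (1 + pi) / (2 + pi^2), an arbitrary scalar
u, v = (PI, t), (t * t, ONE + PI)     # arbitrary test vectors

print(mock(u, v) == mock(v, u))                              # symmetry
print(mock(u, vadd(v, e1)) == mock(u, v) + mock(u, e1))      # additivity
print(mock(scale(t, u), scale(t, v)) == t * t * mock(u, v))  # homogeneity
print(mock(e1, e2) == ZERO, mock(e1, scale(PI, e2)) == ONE)  # linearity fails
```

Such a check only tests finitely many inputs, of course; it is a sanity check on the formulas, not a substitute for the argument.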
Update. One can check that if $\langle\cdot,\cdot\rangle$ is a "mock scalar product" as in the theorem, then for any two vectors $u,v$, the map $t\mapsto \langle u,tv\rangle - t\langle u,v\rangle$ must be a differentiation of the base field. (A differentiation is a map $D:F\to F$ satisfying the above rules for sums and products.) Thus mock scalar products on $\mathbb R^2$ are actually classified by differentiations of $\mathbb R$.
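For the particular product constructed in the proof, this map can be written out explicitly. As a sketch, take $u=(x_1,y_1)$, $v=(x_2,y_2)$ and use only the product rule $D(ty)=tD(y)+yD(t)$:
$$
\begin{aligned}
P(x_1,ty_2) &= x_1D(ty_2)-ty_2D(x_1) = tP(x_1,y_2)+x_1y_2D(t),\\
P(tx_2,y_1) &= tx_2D(y_1)-y_1D(tx_2) = tP(x_2,y_1)-x_2y_1D(t),\\
\langle u,tv\rangle - t\langle u,v\rangle &= (x_1y_2-x_2y_1)\,D(t).
\end{aligned}
$$
So in that example the defect of linearity is $D$ multiplied by the determinant of $u$ and $v$; for $u=(1,0)$, $v=(0,1)$, $t=\pi$ it equals $D(\pi)=1$, as in the proof.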
And non-trivial differentiations of $\mathbb R$ do exist. In fact, a differentiation can be extended from a subfield to any ambient field (of characteristic 0). Indeed, by Zorn's Lemma it suffices to extend a differentiation $D$ from a field $F$ to a one-step extension $F(\alpha)$ of $F$. If $\alpha$ is transcendental over $F$, one can define $D(\alpha)$ arbitrarily and extend $D$ to $F(\alpha)$ by the rules of differentiation. And if $\alpha$ is algebraic, differentiating the identity $p(\alpha)=0$, where $p$ is the minimal polynomial of $\alpha$, yields a uniquely defined value $D(\alpha)\in F(\alpha)$, and then $D$ extends to $F(\alpha)$. The extensions are consistent because all identities involved can be realized in the field of differentiable functions on $\mathbb R$, where differentiation rules are consistent.
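To spell out the algebraic case (a sketch; the coefficient notation $a_i$ is introduced here only for illustration), write $p(x)=\sum_i a_ix^i$ with $a_i\in F$. Applying $D$ to $p(\alpha)=0$ and using the rules for sums and products gives
$$
0 = \sum_i D(a_i)\,\alpha^i + p'(\alpha)\,D(\alpha),
\qquad\text{hence}\qquad
D(\alpha) = -\frac{\sum_i D(a_i)\,\alpha^i}{p'(\alpha)} .
$$
The denominator is nonzero: in characteristic 0, $p'$ is a nonzero polynomial of degree less than $\deg p$, so it cannot vanish at $\alpha$. For instance, with $F=\mathbb Q(\pi)$, $D(\pi)=1$ and $\alpha=\sqrt\pi$, the minimal polynomial is $p(x)=x^2-\pi$, and the formula gives $D(\sqrt\pi)=1/(2\sqrt\pi)$, in agreement with ordinary calculus.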
Thus there exists a mock scalar product on $\mathbb R^2$ such that $\langle e_1,e_2\rangle=0$ but $\langle e_1,\pi e_2\rangle=1$. And I am sure I reinvented the wheel here - all this should be well-known to algebraists.
Best Answer
One simple example is as follows. Consider the space $L^2(\{-1;1\}^n)$ of functions on the discrete hypercube $\{-1;1\}^n$ (with uniform measure), and the Dirichlet quadratic form $$\langle f,g\rangle_\nabla:=\sum_{i=1}^n\mathbb{E}\left(\nabla_if\nabla_ig\right),$$ where $$\nabla_if(x_1,\dots,x_n)=f(x_1,\dots,-x_i,\dots,x_n)-f(x_1,\dots,x_i,\dots,x_n).$$ Using expansion in the Fourier-Walsh basis $\chi_\omega((x_1,\dots,x_n))=\prod_{i\in\omega}x_i,$ where $\omega$ runs over subsets of $\{1,\dots,n\}$, it is easy to prove Poincaré's inequality: $$ \mathbb{E}f^2-(\mathbb{E}f)^2\leq \langle f,f\rangle_\nabla. $$ This allows one to estimate the variance of an arbitrary random variable $f$ if one has bounds on the influences of individual variables $\mathbb{E}\left(\nabla_if\right)^2$.

A typical application: consider a random metric on $\mathbb{Z}^2$ obtained by declaring every lattice edge to have length $a$ or $b$, where $b>a>0$, each with probability $1/2$ and independently of the other edges (a first passage percolation model). Consider the distance between $(0,0)$ and $(0,k)$. Flipping the variable corresponding to one edge can increase the distance only if this edge is in the shortest path, and in any configuration there are at most $bk/a$ such edges. Therefore, the variance of the distance grows at most linearly in $k$. (By much less elementary methods, Benjamini, Kalai and Schramm improved this estimate by a $\log k$ factor. The correct estimate is believed to be $k^{2/3}$.)
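The Poincaré inequality above is easy to test numerically on a small cube. Here is a minimal sketch, assuming a function on $\{-1;1\}^n$ is stored as a dictionary from sign tuples to exact rationals. The helper names `cube`, `flip`, `variance` and `dirichlet` are hypothetical, and the test function is random.

```python
from fractions import Fraction
from itertools import product
import random

def cube(n):
    # All points of the discrete hypercube {-1, 1}^n.
    return list(product((-1, 1), repeat=n))

def flip(x, i):
    # Flip the sign of the i-th coordinate.
    return x[:i] + (-x[i],) + x[i + 1:]

def variance(f, n):
    # E f^2 - (E f)^2 under the uniform measure.
    pts = cube(n)
    mean = sum(f[x] for x in pts) / len(pts)
    return sum(f[x] ** 2 for x in pts) / len(pts) - mean ** 2

def dirichlet(f, n):
    # <f, f>_nabla = sum over i of E (nabla_i f)^2.
    pts = cube(n)
    return sum((f[flip(x, i)] - f[x]) ** 2
               for i in range(n) for x in pts) / len(pts)

n = 4
rng = random.Random(0)
f = {x: Fraction(rng.randint(-10, 10)) for x in cube(n)}
print(variance(f, n) <= dirichlet(f, n))
```

For a single coordinate function $f(x)=x_1$ the variance is $1$ while the Dirichlet form is $4$, so with this normalization of $\nabla_i$ the inequality holds with room to spare.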