Proving that the Randers norm is a Minkowski norm

differential-geometry, finsler-geometry, metric-spaces, normed-spaces

I am struggling to follow the proof that the Randers norm is a Minkowski norm from "Lectures on Finsler Geometry" by Zhongmin Shen.

A Minkowski norm on a finite-dimensional vector space $V$ is a function $F:V\to[0,\infty)$ with the following properties:

  1. $F$ is $C^{\infty}$ on $V\setminus\{0\}$
  2. $F(\gamma y) = \gamma F(y)$, for all $\gamma>0$ and $y\in V$
  3. For any $y\in V\setminus \{0\}$, the symmetric bilinear form $\mathbf{g}_y$ on $V$ is positive definite, where
    $$\mathbf{g}_y (u,v):= \frac{1}{2}\frac{\partial^2}{\partial s \partial t}\bigg[ F^2(y+su+tv)\bigg] \biggr\rvert_{s=t=0}$$

A Randers norm on $V$ is defined as $R(y):=\alpha(y)+\beta(y)$, where $\alpha$ is a Euclidean norm and $\beta$ is a linear form with $\Vert\beta\Vert<1$.

I am confused about how to show that the Randers norm satisfies the Minkowski norm's third condition. I think my confusion stems from unfamiliarity with Einstein summation (index) notation.

The text's proof proceeds by fixing a basis $\{\mathbf{b}_i\}_{i=1}^n$ for $V$ and computing
$$g_{ij}:= \mathbf{g}_y (\mathbf{b}_i,\mathbf{b}_j)=\frac{1}{2}[F^2]_{y^{i}y^{j}}(y)$$
to be
$$g_{ij} = \frac{F}{\alpha}\left(a_{ij}-\frac{y_i}{\alpha}\frac{y_j}{\alpha}\right)+\left(\frac{y_i}{\alpha}+b_i\right)\left(\frac{y_j}{\alpha}+b_j\right)$$

Any attempt to walk through or clarify this step is appreciated. Insight into why $\Vert\beta\Vert<1$ ensures that the above is positive definite would also be welcome.

I gathered from context that when the index is in the upper position, we have a row vector instead of a column vector, but I'm missing a lot on how to actually perform the computations.
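
For concreteness, here is a small numerical sketch (my own, not from Shen's book): it builds a Randers norm $F(y)=\sqrt{y^{\mathrm T}Ay}+b^{\mathrm T}y$ on $\mathbb{R}^3$, approximates $\mathbf{g}_y(u,v)=\tfrac12\,\partial_s\partial_t\,F^2(y+su+tv)\big|_{s=t=0}$ by central finite differences, and checks positive definiteness; the matrix `A`, vector `b`, and point `y` are arbitrary choices.

```python
import numpy as np

# Randers norm F(y) = sqrt(y^T A y) + b^T y on R^3 (A, b chosen arbitrarily,
# with b small enough that ||beta|| < 1).
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)            # symmetric positive definite
b = np.array([0.1, -0.2, 0.05])

def F(y):
    return np.sqrt(y @ A @ y) + b @ y

def g(y, u, v, h=1e-4):
    # g_y(u, v) = (1/2) d^2/(ds dt) F(y + s u + t v)^2 at s = t = 0,
    # approximated by a central second difference
    f2 = lambda s, t: F(y + s * u + t * v) ** 2
    return (f2(h, h) - f2(h, -h) - f2(-h, h) + f2(-h, -h)) / (8 * h * h)

y = np.array([1.0, 2.0, -0.5])
E = np.eye(3)
G = np.array([[g(y, E[i], E[j]) for j in range(3)] for i in range(3)])

print(np.round(G, 6))                  # the matrix (g_{ij})
print(np.linalg.eigvalsh(G))           # eigenvalues, expected all positive
```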

Best Answer

$\def\b{\boldsymbol{b}}\def\d{\mathrm{d}}\def\F{\widehat{F}}\def\f{\widehat{f}}\def\R{\mathbb{R}}\def\T{^{\mathrm{T}}}\def\y{\widetilde{y}}\def\α{\widehat{α}}\def\β{\widehat{β}}\def\eval#1{\left.#1\right|}\def\paren#1{\left(#1\vphantom{\Big|}\right)}$For $1 \leqslant k\leqslant n$, denote $e_k = (0, \cdots, 0, 1, 0, \cdots, 0)$ where the $k$-th component is $1$. For any $f: V → \R$, define $\f: \R^n → \R$ as $\f(x) = f\paren{ \sum\limits_{k = 1}^n x_k \b_k }$, i.e. $\f$ is the coordinate representation of $f$ with respect to the basis $\b_1, \cdots, \b_n$. For general $F: V → [0, +∞)$, writing $\boldsymbol{y} = \sum\limits_{k = 1}^n y_k \b_k$ with coordinate vector $y = (y_1, \cdots, y_n)$, we have
\begin{align*}
g_{i, j} &= \frac{1}{2} \eval{ \frac{\partial^2}{\partial s\partial t}\paren{ (F(\boldsymbol{y} + s\b_i + t\b_j))^2 } }_{(s, t) = (0, 0)} = \frac{1}{2} \eval{ \frac{\partial^2}{\partial s\partial t}\paren{ (\F(y + se_i + te_j))^2 } }_{(s, t) = (0, 0)}\\
&= \eval{ \frac{\partial}{\partial s}\paren{ \F(y + se_i + te_j) J\F(y + se_i + te_j) e_j } }_{(s, t) = (0, 0)}\\
&= \eval{ \frac{\d}{\d s}\paren{ \F(y + se_i) J\F(y + se_i) e_j } }_{s = 0}\\
&= \eval{ \paren{ (J\F(y + se_i) e_i) (J\F(y + se_i) e_j) + \F(y + se_i) (e_i\T H\F(y + se_i) e_j) } }_{s = 0}\\
&= (J\F(y) e_i) (J\F(y) e_j) + \F(y) (e_i\T H\F(y) e_j),
\end{align*}
where $J\F$ and $H\F$ are the Jacobian and Hessian of $\F$, respectively.
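
As a sanity check (my own addition, not part of the answer), the identity just derived, $\tfrac{1}{2}H(\F^2) = (J\F)\T J\F + \F\, H\F$, can be verified symbolically for a concrete two-dimensional Randers norm; the matrix `A`, the vector `b`, and the use of SymPy are my own choices.

```python
import sympy as sp

# Verify (1/2) Hess(F^2) = (grad F)(grad F)^T + F * Hess(F) symbolically
# for a concrete 2-dimensional Randers norm (A and b are arbitrary).
y1, y2 = sp.symbols('y1 y2', real=True)
y = sp.Matrix([y1, y2])
A = sp.Matrix([[2, 1], [1, 3]])                  # positive definite
b = sp.Matrix([sp.Rational(1, 4), sp.Rational(1, 5)])

F = sp.sqrt((y.T * A * y)[0]) + (b.T * y)[0]

lhs = sp.hessian(F**2, (y1, y2)) / 2
grad = sp.Matrix([sp.diff(F, v) for v in (y1, y2)])
rhs = grad * grad.T + F * sp.hessian(F, (y1, y2))

print((lhs - rhs).applyfunc(sp.simplify))        # expect the zero matrix
```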

Now, as assumed in the textbook, $\α(y) = \sqrt{y\T Ay}$ and $\β(y) = b\T y$, where $A = (a_{i, j}) > 0$ and $b = (b_1, \cdots, b_n)\T$. Since
\begin{gather*}
J\α(y) = J\paren{ \sqrt{y\T Ay} } = \frac{J(y\T Ay)}{2\sqrt{y\T Ay}} = \frac{y\T A}{\sqrt{y\T Ay}} = \frac{y\T A}{\α(y)},\\
H\α(y) = J((J\α(y))\T) = J\paren{ \frac{Ay}{\α(y)} }\\
= \frac{J(Ay) \α(y) - Ay J(\α(y))}{(\α(y))^2} = \frac{A}{\α(y)} - \frac{Ay y\T A}{(\α(y))^3},\\
J\β(y) = b\T,\quad H\β(y) = 0,
\end{gather*}
we get
$$ J\F(y) = \frac{y\T A}{\α(y)} + b\T,\quad H\F(y) = \frac{A}{\α(y)} - \frac{Ay y\T A}{(\α(y))^3}, $$
and hence
\begin{align*}
g_{i, j} &= (J\F(y) e_i) (J\F(y) e_j) + \F(y) (e_i\T H\F(y) e_j)\\
&= \paren{ \paren{ \frac{y\T A}{\α(y)} + b\T } e_i } \paren{ \paren{ \frac{y\T A}{\α(y)} + b\T } e_j } + \F(y) \paren{ e_i\T \paren{ \frac{A}{\α(y)} - \frac{Ay y\T A}{(\α(y))^3} } e_j }\\
&= \paren{ \frac{y\T Ae_i}{\α(y)} + b_i } \paren{ \frac{y\T Ae_j}{\α(y)} + b_j } + \frac{\F(y)}{\α(y)} \paren{ a_{i, j} - \frac{\color{red}{(e_i\T Ay) (y\T Ae_j)}}{(\α(y))^2} }\\
&= \paren{ \frac{\color{blue}{e_i\T Ay}}{\α(y)} + b_i } \paren{ \frac{\color{blue}{e_j\T Ay}}{\α(y)} + b_j } + \frac{\F(y)}{\α(y)} \paren{ a_{i, j} - \frac{(e_i\T Ay) (\color{blue}{e_j\T Ay})}{(\α(y))^2} },
\end{align*}
where the red and blue terms are due to associativity and $u\T v = v\T u$, respectively. Note that
$$ \y_i := e_i\T Ay = \sum_{j = 1}^n a_{i, j} y_j, $$
thus (abbreviating $F = \F(y)$, $α = \α(y)$ and, later, $β = \β(y)$)
\begin{gather*}
g_{i, j} = \frac{F}{α} \paren{ a_{i, j} - \frac{\y_i \y_j}{α^2} } + \paren{ \frac{\y_i}{α} + b_i } \paren{ \frac{\y_j}{α} + b_j }.\tag{1}
\end{gather*}
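
A quick numerical cross-check of formula (1) (again my own sketch, with arbitrary `A`, `b`, `y`): compare the closed form with a finite-difference approximation of $\tfrac12[F^2]_{y^iy^j}(y)$.

```python
import numpy as np

# Compare formula (1) for g_{ij} with a finite-difference Hessian of F^2 / 2.
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)
b = np.array([0.2, -0.1, 0.15])
y = np.array([0.7, -1.3, 0.4])

F = lambda z: np.sqrt(z @ A @ z) + b @ z
alpha = np.sqrt(y @ A @ y)
yt = A @ y                                   # the "tilde" vector  y~ = A y

# closed form (1)
G1 = (F(y) / alpha) * (A - np.outer(yt, yt) / alpha**2) \
     + np.outer(yt / alpha + b, yt / alpha + b)

# central finite differences of F^2 / 2
h, E = 1e-4, np.eye(3)
def g_fd(i, j):
    f2 = lambda s, t: F(y + s * E[i] + t * E[j]) ** 2
    return (f2(h, h) - f2(h, -h) - f2(-h, h) + f2(-h, -h)) / (8 * h * h)
G2 = np.array([[g_fd(i, j) for j in range(3)] for i in range(3)])

print(np.max(np.abs(G1 - G2)))               # tiny (finite-difference error, ~1e-6 or below)
```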


Now denote $G = (g_{i, j})$. Since $F = α + β$ and
$$ \y := \begin{pmatrix} \y_1 \\ \vdots \\ \y_n \end{pmatrix} = \begin{pmatrix} e_1\T Ay \\ \vdots \\ e_n\T Ay \end{pmatrix} = \begin{pmatrix} e_1\T \\ \vdots \\ e_n\T \end{pmatrix} Ay = Ay, $$
writing (1) in matrix form yields
\begin{align*}
G &= \frac{F}{α} \paren{ A - \frac{1}{α^2} \y \y\T } + \paren{ \frac{1}{α^2} \y \y\T + \frac{1}{α} (\y b\T + b \y\T) + bb\T }\\
&= \frac{α + β}{α} \paren{ A - \frac{1}{α^2} Ay y\T A } + \paren{ \frac{1}{α^2} Ay y\T A + \frac{1}{α} (Ay b\T + b y\T A) + bb\T }.
\end{align*}
Thus for $x \in \R^n$,
\begin{align*}
x\T Gx &= {\small \frac{α + β}{α} \paren{ x\T Ax - \frac{1}{α^2} \color{red}{(x\T Ay) (y\T Ax)} }}\\
&\mathrel{\phantom=} {\small + \paren{ \frac{1}{α^2} \color{red}{(x\T Ay) (y\T Ax)} + \frac{1}{α} (\color{red}{(x\T Ay) (b\T x)} + \color{red}{(x\T b) (y\T Ax)}) + \color{red}{(x\T b)(b\T x)} }}\\
&= {\small \frac{α + β}{α} \paren{ x\T Ax - \frac{1}{α^2} \color{blue}{(y\T Ax)^2} } + \paren{ \frac{1}{α^2} \color{blue}{(y\T Ax)^2} + \frac{2}{α} \color{blue}{(y\T Ax) (b\T x)} + \color{blue}{(b\T x)^2} }}\\
&= \paren{ 1 + \frac{β}{α} } \paren{ x\T Ax - \frac{1}{α^2} (y\T Ax)^2 } + \paren{ \frac{1}{α} y\T Ax + b\T x }^2. \tag{2}
\end{align*}
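
Identity (2) is purely algebraic, so it can be checked on random data (my own sketch; the dimension, seed, and data are arbitrary):

```python
import numpy as np

# Check identity (2): x^T G x written as a weighted A-term plus a square.
rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)
b = 0.1 * rng.standard_normal(4)
y, x = rng.standard_normal(4), rng.standard_normal(4)

alpha = np.sqrt(y @ A @ y)
beta = b @ y
yt = A @ y

G = ((alpha + beta) / alpha) * (A - np.outer(yt, yt) / alpha**2) \
    + np.outer(yt / alpha + b, yt / alpha + b)

lhs = x @ G @ x
rhs = (1 + beta / alpha) * (x @ A @ x - (y @ A @ x) ** 2 / alpha**2) \
      + (y @ A @ x / alpha + b @ x) ** 2

print(lhs, rhs)        # the two values agree up to round-off
```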

Note that $\|β\| = \|\β\| = \sup\limits _{\α(y) = 1} \β(y)$. The homogeneity of $\α$ and $\β$ implies that $\|\β\| = \sup\limits _{y ≠ 0} \dfrac{\β(y)}{\α(y)}$, so
\begin{gather*}
-\|\β\| \α(y) \leqslant \β(y) \leqslant \|\β\| \α(y) \quad \text{for all } y \in \R^n \setminus \{0\}. \tag{3}
\end{gather*}
Also, the compactness of $\{y \in \R^n \mid \α(y) = 1\}$ implies that $\|\β\| = \max\limits _{\α(y) = 1} \β(y)$, so there exists $y^* \in \R^n \setminus \{0\}$ with $\β(y^*) = \|\β\| \α(y^*)$, and $\β(-y^*) = -\β(y^*) = -\|\β\| \α(y^*) = -\|\β\| \α(-y^*)$. Hence either equality in (3) can be attained.
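
For $\α(y) = \sqrt{y\T Ay}$ and $\β(y) = b\T y$, this supremum has the closed form $\|\β\| = \sqrt{b\T A^{-1} b}$ (not stated above, but it follows from Cauchy's inequality in the $A$-inner product, with equality at $y^* \propto A^{-1}b$). A small sketch with arbitrary `A`, `b`:

```python
import numpy as np

# ||beta|| = sup_{alpha(y)=1} b^T y equals sqrt(b^T A^{-1} b); compare the
# closed form with a crude Monte Carlo maximization over {alpha(y) = 1}.
rng = np.random.default_rng(3)
M = rng.standard_normal((3, 3))
A = M @ M.T + 2 * np.eye(3)
b = np.array([0.4, -0.3, 0.2])

closed_form = np.sqrt(b @ np.linalg.solve(A, b))

Y = rng.standard_normal((200_000, 3))
Y /= np.sqrt(np.einsum('ij,jk,ik->i', Y, A, Y))[:, None]   # rescale to alpha(y) = 1
print(closed_form, (Y @ b).max())    # sampled maximum approaches the closed form
```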

On the one hand, if $\|β\| < 1$, then $1 + \dfrac{β}{α} = 1 + \dfrac{\β(y)}{\α(y)} \geqslant 1 - \|\β\| > 0$ in (2). Cauchy's inequality shows that
$$ (y\T Ax)^2 \leqslant (y\T Ay)(x\T Ax) = α^2\, x\T Ax, $$
thus
$$ (2) \stackrel{(*)}{\geqslant} \frac{α + β}{α} \paren{ x\T Ax - \frac{1}{α^2} (y\T Ax)^2 } \stackrel{(**)}{\geqslant} 0. $$
Now suppose $x ≠ 0$ and both (*) and (**) are equalities. Equality in (*) gives $\dfrac{1}{α} y\T Ax + b\T x = 0$, and equality in (**) (i.e. in Cauchy's inequality) gives $x = cy$ for some $c \in \R$ (since $y ≠ 0$); moreover $c ≠ 0$ because $x ≠ 0$. Substituting $x = cy$ and dividing by $c$ yields $\dfrac{1}{α} y\T Ay + b\T y = 0$, i.e.
$$ \β(y) = b\T y = -\frac{1}{α} y\T Ay = -\α(y), $$
contradicting (3): since $\|\β\| < 1$, (3) gives $\β(y) \geqslant -\|\β\|\, \α(y) > -\α(y)$. Therefore $x\T Gx > 0$ for every $x ≠ 0$, i.e. $G > 0$.
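
An empirical companion to this direction (again my own test, with arbitrary data satisfying $\|\β\| < 1$): the smallest eigenvalue of $G$ stays strictly positive over many random base points $y$.

```python
import numpy as np

# With ||beta|| < 1, G should be positive definite for every y != 0.
rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3))
A = M @ M.T + 2 * np.eye(3)
b = np.array([0.3, -0.2, 0.1])
assert np.sqrt(b @ np.linalg.solve(A, b)) < 1        # i.e. ||beta|| < 1

def G_of(y):
    alpha = np.sqrt(y @ A @ y)
    yt = A @ y
    F = alpha + b @ y
    return (F / alpha) * (A - np.outer(yt, yt) / alpha**2) \
           + np.outer(yt / alpha + b, yt / alpha + b)

min_eig = min(np.linalg.eigvalsh(G_of(rng.standard_normal(3))).min()
              for _ in range(1000))
print(min_eig)                                       # strictly positive in these trials
```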

On the other hand, suppose $G > 0$ for every $y ≠ 0$, and take $y^* \in \R^n \setminus \{0\}$ with $\β(y^*) = -\|\β\| \α(y^*)$. For $n \geqslant 3$, there exists $x^* \in \R^n \setminus \{0\}$ with
$$ \begin{pmatrix} \dfrac{1}{\α(y^*)} (y^*)\T A \\ b\T \end{pmatrix} x^* = 0, $$
i.e. $\dfrac{1}{\α(y^*)} (y^*)\T Ax^* = b\T x^* = 0$ (a homogeneous system of two equations in $n \geqslant 3$ unknowns always has a nontrivial solution). Plugging into (2) with $y = y^*$ yields
$$ (x^*)\T Gx^* = \paren{ 1 + \frac{\β(y^*)}{\α(y^*)} } (x^*)\T Ax^* = (1 - \|\β\|) (x^*)\T Ax^*. $$
Since $(x^*)\T Ax^* > 0$ and, by assumption, $(x^*)\T Gx^* > 0$, it follows that $\|β\| < 1$.
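
Finally, a concrete instance of the failure when $\|\β\| \geqslant 1$ (my own example, with $A = I$ and $b = (1.2, 0, 0)\T$, so $\|\β\| = 1.2$): the vector $x^*$ constructed as above gives $(x^*)\T G x^* = (1 - \|\β\|)\,(x^*)\T Ax^* < 0$.

```python
import numpy as np

# Example with ||beta|| = 1.2 >= 1: G is not positive definite at y = y*.
A = np.eye(3)
b = np.array([1.2, 0.0, 0.0])
y_star = -b / np.linalg.norm(b)      # direction where beta(y)/alpha(y) = -||beta||
x_star = np.array([0.0, 1.0, 0.0])   # satisfies (y*)^T A x* = b^T x* = 0

alpha = np.sqrt(y_star @ A @ y_star)
F = alpha + b @ y_star
yt = A @ y_star
G = (F / alpha) * (A - np.outer(yt, yt) / alpha**2) \
    + np.outer(yt / alpha + b, yt / alpha + b)

print(x_star @ G @ x_star)           # (1 - ||beta||) * x*^T A x* = -0.2 < 0
```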