Help with the gradient in different co-ordinate systems

multivariable-calculus, tensors, vector-analysis

Let $L(x,y)$ be the linear Taylor expansion of some function $f(x,y)$ about the point $(x_0,y_0)$. This can be written as $$L(x,y)=f(x_0,y_0)+f_x(x-x_0)+f_y(y-y_0)$$ (with the partial derivatives evaluated at $(x_0,y_0)$), or in more compact form as $$L(x,y)=f(x_0,y_0)+\nabla f ^T \vec{p}$$
where
$$
\vec{p}=
\begin{bmatrix}
x-x_0\\
y-y_0 \\
\end{bmatrix}
$$

and $$\nabla f(x,y)=f_x \vec{i}+f_y\vec{j}$$
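For concreteness, here is a minimal numerical sketch of this linearization, assuming NumPy and an arbitrary test function $f(x,y)=\sin x\cos y$ (any smooth $f$ would do):

```python
import numpy as np

# Assumed test function and its partial derivatives (chosen only for illustration)
f  = lambda x, y: np.sin(x) * np.cos(y)
fx = lambda x, y: np.cos(x) * np.cos(y)
fy = lambda x, y: -np.sin(x) * np.sin(y)

x0, y0 = 0.5, 0.3
grad = np.array([fx(x0, y0), fy(x0, y0)])   # nabla f evaluated at (x0, y0)

def L(x, y):
    p = np.array([x - x0, y - y0])          # displacement vector p
    return f(x0, y0) + grad @ p             # f(x0, y0) + grad^T p

# Close to (x0, y0) the linearization agrees with f to first order
print(f(0.51, 0.31), L(0.51, 0.31))
```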
Now suppose that $x'= \alpha_1 x$ and $y'= \alpha_2 y$, i.e. the original co-ordinates undergo a linear transformation $A$, and we want the Taylor expansion in terms of $x'$ and $y'$, the new co-ordinate system. Writing the composite function $$g(x',y')=f(x(x'),y(y'))$$
$$L(x',y')=g(x'_0,y'_0)+\frac{\partial f}{\partial x}\frac{\partial x}{\partial x'}(x'-x'_0)+\frac{\partial f}{\partial y}\frac{\partial y}{\partial y'}(y'-y'_0)$$
$\implies$
$$L(x',y')=f(x_0,y_0)+\frac{1}{\alpha_1}\frac{\partial f}{\partial x}(x'-x'_0)+\frac{1}{\alpha_2}\frac{\partial f}{\partial y}(y'-y'_0)$$
$\implies$
$$L(x',y')=f(x_0,y_0)+\vec{p}'^TA^{-T}\nabla f(x_0,y_0)$$
This describes exactly the same function, but in the new stretched Cartesian co-ordinate system. By inspection, the gradient in terms of the new co-ordinate system is $$\nabla g(x_0',y_0') = A^{-T}\nabla f(x_0,y_0)=g_{x'} \vec{i}'+g_{y'} \vec{j}'= \frac{1}{\alpha_1}f_x \vec{i}' + \frac{1}{\alpha_2}f_y \vec{j}'$$
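A quick sanity check of this identity, assuming NumPy and arbitrary stretch factors and gradient components:

```python
import numpy as np

a1, a2 = 2.0, 3.0                      # assumed stretch factors alpha_1, alpha_2
A = np.diag([a1, a2])                  # x' = a1 * x, y' = a2 * y

fx, fy = 2.0, 2.0                      # components of grad f at (x0, y0); any values work
grad_f = np.array([fx, fy])

# Chain rule: dg/dx' = (df/dx)(dx/dx') = fx / a1, similarly for y'
grad_g_components = np.array([fx / a1, fy / a2])

# Same thing written as A^{-T} grad f
print(np.allclose(grad_g_components, np.linalg.inv(A).T @ grad_f))   # True
```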
Now because the Cartesian co-ordinate system has been stretched the basis vectors have changed, such that now $$\vec{i}'=\frac{\vec{i}}{\alpha_1}$$ and $$\vec{j}'=\frac{\vec{j}}{\alpha_2}$$
Substitution of these into $\nabla g(x_0',y_0')$ reveals that $$\nabla g(x_0',y_0')= \frac{1}{\alpha_1^2}f_x \vec{i} + \frac{1}{\alpha_2^2}f_y \vec{j}$$ which is different from the direction in the standard Cartesian co-ordinate system. On my travels to understand this I discovered Pavel Grinfeld's video series on tensor algebra, where he shows that to make the gradient invariant under transformation I should use the definition $$\nabla f(x,y)=\frac{f_x}{\vec{i}\cdot\vec{i}}\vec{i}+\frac{f_y}{\vec{j}\cdot\vec{j}}\vec{j}$$
My background is in EEE, and I frequently use steepest descent to solve numerical problems; I had never considered the transformation until I looked into conditioning the problem. I'm having a hard time understanding why the gradient derived from the transformed system is different from the original one in the first place. If anyone can shed any light on using the basis vectors $\frac{\vec{i}}{\vec{i}\cdot\vec{i}}$ and $\frac{\vec{j}}{\vec{j}\cdot\vec{j}}$ I would be very grateful. There seems to be a contradiction in here, or a subtle detail I am missing. I have seen several posts where people simply say the gradient is variant under transformation and to use the direction in the transformed co-ordinate system. My issue is that there shouldn't be any variation in the gradient: per the geometric definition it points in the direction of greatest increase, and this doesn't change simply because the co-ordinates are transformed.


Example
$$f(x,y)=2x+2y$$
$$\nabla f(x,y)=2\vec{i}+2\vec{j}$$
let $x'=2x$ and $y'=2y$
$$g(x',y')=x'+y'$$
$$\nabla g(x',y')=1\vec{i'}+1\vec{j'}$$
Since $\vec{i}=2 \vec{i}'$ and $\vec{j}=2 \vec{j}'$,
this gradient expressed in terms of the old basis vectors (but computed in the new co-ordinate system) is
$$\nabla g(x',y')=0.5\vec{i}+0.5\vec{j}$$

so $\nabla g(x',y') \neq \nabla f(x,y)$ but $g(x',y') = f(x(x'),y(y'))$.
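A small numerical confirmation of this, assuming NumPy: the function values agree at matched points, but the naive gradients do not.

```python
import numpy as np

f = lambda x, y: 2*x + 2*y
g = lambda xp, yp: xp + yp                # same function written in x' = 2x, y' = 2y

x, y = 1.3, -0.7
xp, yp = 2*x, 2*y
print(np.isclose(g(xp, yp), f(x, y)))     # True: same function values

grad_f = np.array([2.0, 2.0])             # in the i, j basis
grad_g_old_basis = np.array([0.5, 0.5])   # 1*i' + 1*j' rewritten with i = 2 i', j = 2 j'
print(np.allclose(grad_f, grad_g_old_basis))   # False: the two "gradients" disagree
```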

[Gradient in two co-ordinate systems][1]

I don't understand how the following would help; can anyone help me reconcile this?
$$\nabla g(x',y')=\frac{1\vec{i'}}{\vec{i'} \cdot \vec{i'}}+\frac{1\vec{j'}}{\vec{j'} \cdot \vec{j'}}$$
[1]: https://i.sstatic.net/PArGk.png


EDIT:

Just to add: I found this thread on the same subject, Definition of the gradient for non-Cartesian coordinates. There the gradient is defined geometrically as the directional derivative in the unit direction of steepest ascent, multiplied by that unit direction. That is: $$\nabla f = d(\vec{v}_{max})\vec{v}_{max}$$ Why am I getting two different gradients in two different co-ordinate systems? Why do these lecture notes, http://www.stat.cmu.edu/~ryantibs/convexopt-S15/scribes/14-newton-scribed.pdf, describe the gradient descent method as not being affine invariant? I understand that the sensitivities will be different because different co-ordinate systems are used, but the basis vectors should compensate for this so that the gradient points in the same direction regardless of the system used.
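To illustrate why this matters for steepest descent, here is a rough sketch, assuming NumPy, an illustrative ill-conditioned quadratic $f(x,y)=x^2+10y^2$ and a fixed step size (none of which come from the lecture notes): rescaling the co-ordinates changes the iterates, i.e. plain gradient descent is not affine invariant.

```python
import numpy as np

# Illustrative quadratic: f(x, y) = x^2 + 10 y^2
grad_f = lambda z: np.array([2*z[0], 20*z[1]])

# Rescaled coordinates x' = x, y' = sqrt(10) * y turn f into the well-conditioned
# g(x', y') = x'^2 + y'^2; naive gradient of g in the primed components:
grad_g = lambda z: np.array([2*z[0], 2*z[1]])

def descend(grad, z0, step=0.05, iters=50):
    z = np.array(z0, dtype=float)
    for _ in range(iters):
        z = z - step * grad(z)
    return z

z_orig = descend(grad_f, [1.0, 1.0])
z_scaled = descend(grad_g, [1.0, np.sqrt(10)])    # same start point, primed coordinates

# Map the primed result back to (x, y): y = y' / sqrt(10)
z_back = np.array([z_scaled[0], z_scaled[1] / np.sqrt(10)])
print(z_orig, z_back)   # different iterates: gradient descent is not affine invariant
```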

Best Answer

The discrepancy in your argument:

Your mistake was writing $\nabla g = g_{x'} i' + g_{y'} j'$. It is not true that the gradient has components $\left( \frac{\partial f}{\partial u}, \; \frac{\partial f}{\partial v} \right)$ in every coordinate system $(u,v)$. The gradient is only defined this way for the standard $(x,y)$ coordinates.

The differential:

On the other hand, the more well-behaved object is the differential $df$ (which is a $1$-form, not a vector field): $$ df = \frac{\partial f}{\partial x} \, dx + \frac{\partial f}{\partial y} \, dy $$ We can change to new coordinates $(u,v)$, and replace $dx = x_u du + x_v dv$, and similarly for $y$: $$ df = (f_x x_u + f_y y_u) \, du + (f_x x_v + f_y y_v) \, dv $$ By the chain rule, $f_u = f_x x_u + f_y y_u$, so actually this is the same as $ df = f_u \, du + f_v \, dv$. And so $df$ does look the same in every coordinate system (unlike the gradient).
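As a numerical sanity check of this invariance, assuming NumPy and the scaling from the question ($x'=\alpha_1 x$, $y'=\alpha_2 y$ with illustrative values): $df$ applied to the same geometric vector gives the same number in either coordinate system.

```python
import numpy as np

a1, a2 = 2.0, 3.0                 # assumed stretch factors alpha_1, alpha_2
fx, fy = 2.0, 2.0                 # f_x, f_y at some point; any values work

v = np.array([0.4, -1.1])         # a vector written in the i, j basis

# df applied to v in (x, y) coordinates
df_v_orig = fx * v[0] + fy * v[1]

# Same vector and same 1-form expressed in the primed system:
# components of v in the i', j' basis (since i = a1 i', j = a2 j')
v_primed = np.array([a1 * v[0], a2 * v[1]])
# chain rule: f_{x'} = f_x / a1, f_{y'} = f_y / a2
df_primed = np.array([fx / a1, fy / a2])

print(np.isclose(df_v_orig, df_primed @ v_primed))   # True: df is coordinate-independent
```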

What is the relation?

Differential $1$-forms "eat" vectors: for a vector $v = (a,b)$, in ordinary $(x,y)$ coordinates, $$ df(v) = a f_x + b f_y $$ In more abstract differential geometry and Riemannian geometry, the gradient is defined as the vector $\nabla f$ with the property that for any vector $v$: $$ df(v) = \left< \nabla f, \; v \right> $$ where $\left<-,-\right>$ is the usual dot product. In usual $(x,y)$ coordinates, the dot product of vectors looks like $$ \left< (a,b), (c,d) \right> = ac + bd $$ The discrepancy that you are seeing is because the dot product does not look like this in every coordinate system. In a generic coordinate system $(u,v)$, the dot product of vectors (written in the $i', j'$ basis) looks like $$ \left< (a,b), \, (c,d)\right> = (x_u^2+y_u^2) ac + (x_ux_v +y_uy_v)(bc + ad) + (x_v^2+y_v^2)bd $$ So although $df$ always looks the same, $\nabla f$ will not.
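Here is a sketch of the general recipe this suggests, assuming NumPy and, for concreteness, the scaling transformation from the question: build the metric $G = J^T J$ from the Jacobian $J = \partial(x,y)/\partial(u,v)$, solve $G\,\nabla f = (f_u, f_v)$, and push the result back to the Cartesian basis.

```python
import numpy as np

a1, a2 = 2.0, 3.0                 # assumed stretch factors: u = a1 * x, v = a2 * y
fx, fy = 2.0, 2.0                 # Cartesian gradient components at the point

# Jacobian of (x, y) with respect to (u, v): x = u / a1, y = v / a2
J = np.diag([1/a1, 1/a2])

# Metric in (u, v) coordinates (components of the dot product in the i', j' basis)
G = J.T @ J

# Components of df in (u, v): f_u = f_x x_u + f_y y_u, f_v = f_x x_v + f_y y_v
df_uv = J.T @ np.array([fx, fy])

# Gradient in (u, v) coordinates: solve G * grad = (f_u, f_v)
grad_uv = np.linalg.solve(G, df_uv)       # equals (a1^2 f_u, a2^2 f_v) here

# Push back to the Cartesian i, j basis: the columns of J are i', j' written in i, j
grad_xy = J @ grad_uv
print(grad_uv, np.allclose(grad_xy, [fx, fy]))   # same geometric vector as (f_x, f_y)
```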

Your Example:

In your example, where $u= x' = \alpha_1 x$ and $v = y' = \alpha_2 y$, the dot product of vectors in the $(x',y')$ coordinates (in your notation, using the $i',j'$ basis) looks like $$ \left< (a,b), \, (c,d) \right> = \frac{ac}{\alpha_1^2} + \frac{bd}{\alpha_2^2} $$ So in these $(x',y')$ coordinates, the gradient $\nabla f = (p,q)$ is defined by the equation $df(v) = \left< \nabla f, \, v \right>$, which translates to $$ a f_{x'} + b f_{y'} = \frac{ap}{\alpha_1^2} + \frac{bq}{\alpha_2^2} $$ Plug in $v=(1,0)$ and $v=(0,1)$ to quickly see that $$ \nabla f = (\alpha_1^2 f_{x'}, \; \alpha_2^2 f_{y'}) $$
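As a check, using the relations $f_{x'} = f_x/\alpha_1$, $f_{y'} = f_y/\alpha_2$ and $\vec{i}'=\vec{i}/\alpha_1$, $\vec{j}'=\vec{j}/\alpha_2$ from the question,
$$ \nabla f = \alpha_1^2 f_{x'}\,\vec{i}' + \alpha_2^2 f_{y'}\,\vec{j}' = \alpha_1^2 \frac{f_x}{\alpha_1}\frac{\vec{i}}{\alpha_1} + \alpha_2^2 \frac{f_y}{\alpha_2}\frac{\vec{j}}{\alpha_2} = f_x\,\vec{i} + f_y\,\vec{j} $$
so the gradient computed in the stretched coordinates is the same geometric vector as in the original Cartesian ones. This also matches the $\frac{\vec{i}'}{\vec{i}'\cdot\vec{i}'}$, $\frac{\vec{j}'}{\vec{j}'\cdot\vec{j}'}$ form in the question, since $\vec{i}'\cdot\vec{i}' = 1/\alpha_1^2$ and $\vec{j}'\cdot\vec{j}' = 1/\alpha_2^2$.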