[Math] Prove that Newton’s Method is invariant under invertible linear transformations

affine-geometry, newton-raphson, nonlinear-optimization, numerical-methods, optimization

As part of an assignment for a course in nonlinear optimization, I have to prove that Newton's Method is invariant under a non-singular (invertible) linear transformation, $x=Uy$.

Specifically, I was told that if we set $g(y)=f(Uy)$ for a twice-differentiable function $f:\mathbb{R}^{n}\to \mathbb{R}$, then the sequence of points $\{ x^{k}\}$ generated by Newton's Method for $f$, started from $x^{0}$ with $x^{0}=Uy^{0}$, is exactly $\{ Uy^{k}\}$, where $\{y^{k}\}$ is the sequence of Newton iterates for the function $g(\cdot)$.

I attempted a proof, and I was wondering if somebody could tell me if it's correct, or if there is anything else that needs to be said. Here's the proof:

Suppose that $U$ is a non-singular linear transformation such that $x=Uy$.

Define $g(y)=f(Uy)$, where $f$ is twice continuously differentiable. Then, by the chain rule, $$\nabla g(y) = U^{T}\nabla f(Uy), \\ \nabla ^{2}g(y) = U^{T}\nabla^{2}f(Uy)U. $$
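In case it helps, those chain-rule formulas are easy to sanity-check numerically against finite differences. Here is a short sketch in Python/NumPy; the quartic test function $f(x)=\tfrac14\|x\|^{4}$ and the random $U$ are my own choices for illustration, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Hypothetical test function f(x) = ||x||^4 / 4 with closed-form derivatives.
f      = lambda x: 0.25 * (x @ x) ** 2
grad_f = lambda x: (x @ x) * x
hess_f = lambda x: (x @ x) * np.eye(n) + 2.0 * np.outer(x, x)

U = rng.standard_normal((n, n))  # a random matrix is invertible almost surely
y = rng.standard_normal(n)
g = lambda y: f(U @ y)

# The claimed chain-rule formulas:
grad_g = U.T @ grad_f(U @ y)
hess_g = U.T @ hess_f(U @ y) @ U

# Independent check by central finite differences on g itself.
eps, I = 1e-4, np.eye(n)
fd_grad = np.array([(g(y + eps * I[i]) - g(y - eps * I[i])) / (2 * eps)
                    for i in range(n)])
fd_hess = np.array([[(g(y + eps * (I[i] + I[j])) - g(y + eps * (I[i] - I[j]))
                      - g(y - eps * (I[i] - I[j])) + g(y - eps * (I[i] + I[j])))
                     / (4 * eps ** 2)
                     for j in range(n)] for i in range(n)])

print(np.allclose(grad_g, fd_grad, atol=1e-4),
      np.allclose(hess_g, fd_hess, atol=1e-3))
```

Both comparisons should agree to within finite-difference accuracy.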

Next, suppose the starting point for our Newton's Method on $f$ is $x^{0}$, with the updates being $x^{k}$, so that $$x^{0}=Uy^{0} $$

We proceed by induction on $k$. The base case $k=0$ holds by the choice of starting point, since $x^{0}=Uy^{0}$.

Suppose the result holds for some $k \ge 0$, i.e., that $$x^{k} = Uy^{k}.$$

Then, since $x^{k+1}=x^{k}-\tau_{k}\left[\nabla ^{2}f(x^{k}) \right]^{-1}\nabla f(x^{k})$, we have that $$x^{k+1}=x^{k}-\tau_{k}\left[\nabla ^{2}f(x^{k}) \right]^{-1}\nabla f(x^{k}) \\ = Uy^{k}-\tau_{k}\left[\nabla^{2}f(Uy^{k}) \right]^{-1}\nabla f(Uy^{k}) \\ = Uy^{k}-\tau_{k}UU^{-1}\left[\nabla^{2}f(Uy^{k}) \right]^{-1}\nabla f(Uy^{k}) \\ = Uy^{k}-\tau_{k}UU^{-1}U^{T}(U^{T})^{-1}\left[\nabla^{2}f(Uy^{k}) \right]^{-1}\nabla f(Uy^{k}) \\ =U\left[y^{k}-\tau_{k}\left[U^{T} \nabla^{2}f(Uy^{k})U \right]^{-1} U^{T}\nabla f(Uy^{k})\right] \\ = U\left[y^{k}-\tau_{k}\left[\nabla^{2}g(y^{k}) \right]^{-1}\nabla g(y^{k}) \right] \\ = Uy^{k+1}$$

So, by induction, $x^{k}=Uy^{k}$ for all nonnegative integers $k$: the Newton iterates for $f$ are the image under $U$ of the Newton iterates for $g$, which is the desired result.
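The claimed invariance can also be checked numerically by running the two Newton iterations side by side. This is only a sketch under my own assumptions (the quartic test function $f(x)=\tfrac14\|x\|^{4}$, a random $U$, and a fixed step size $\tau_{k}=0.5$ are illustrative choices, not from the assignment):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Hypothetical test function f(x) = ||x||^4 / 4, with its derivatives.
grad_f = lambda x: (x @ x) * x
hess_f = lambda x: (x @ x) * np.eye(n) + 2.0 * np.outer(x, x)

U = rng.standard_normal((n, n))   # generic, hence invertible, U
y = rng.standard_normal(n)        # y^0
x = U @ y                         # x^0 = U y^0
tau = 0.5                         # a fixed step size tau_k for simplicity

for k in range(10):
    # Newton step on f in the x-coordinates.
    x = x - tau * np.linalg.solve(hess_f(x), grad_f(x))
    # Newton step on g(y) = f(Uy) via the chain-rule gradient and Hessian.
    grad_g = U.T @ grad_f(U @ y)
    hess_g = U.T @ hess_f(U @ y) @ U
    y = y - tau * np.linalg.solve(hess_g, grad_g)
    # The iterates stay linked by x^k = U y^k at every step.
    assert np.allclose(x, U @ y, atol=1e-8)

print("x^k = U y^k held for all 10 iterations")
```

Note that the $x$-iteration never sees $U$, yet it tracks $Uy^{k}$ exactly (up to floating-point error), which is the content of the proof.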

Is this what I was supposed to show here? If not, could somebody please help me improve this?

Thank you

(By the way, $U^{T}$ means the transpose of $U$, and $\tau_{k}$ is the step size at iteration $k$.)

Best Answer

Almost: in the fourth line of your chain you inserted $U^{T}(U^{T})^{-1}$ to the left of the inverse Hessian $\left[\nabla^{2}f(Uy^{k})\right]^{-1}$, but it should go on its right, i.e. $$x^{k+1}=Uy^{k}-\tau_{k}UU^{-1}\left[\nabla^{2}f(Uy^{k}) \right]^{-1}(U^{T})^{-1}U^{T}\nabla f(Uy^{k}).$$ Only then do the factors regroup as $U^{-1}\left[\nabla^{2}f(Uy^{k})\right]^{-1}(U^{T})^{-1} = \left[U^{T}\nabla^{2}f(Uy^{k})U\right]^{-1}$, which is exactly what the fifth line needs.
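The regrouping here is just the reverse-order rule for inverses, $\left[U^{T}AU\right]^{-1}=U^{-1}A^{-1}(U^{T})^{-1}$. A quick numerical check of that identity, with a random $U$ and a hypothetical matrix $A$ standing in for the Hessian:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
U = rng.standard_normal((n, n))                  # invertible almost surely
A = rng.standard_normal((n, n)) + n * np.eye(n)  # stands in for the Hessian

# inv(U^T A U) = U^{-1} A^{-1} (U^T)^{-1}: inverses multiply in reverse order.
lhs = np.linalg.inv(U.T @ A @ U)
rhs = np.linalg.inv(U) @ np.linalg.inv(A) @ np.linalg.inv(U.T)
print(np.allclose(lhs, rhs))
```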
