Finding the gradient of the restricted function in terms of the gradient of the original function

Tags: derivatives, multivariable-calculus, partial-derivative, scalar-fields

The following question showed up as part of a proof that I am doing for my research thesis.

If we have a differentiable function $f: \mathbb{R}^n \to \mathbb{R}$ and set $n-d$ of its coordinates to zero, we get a new differentiable function $g: \mathbb{R}^d \to \mathbb{R}$. Given the gradient $\nabla_x f(x)$, how can one obtain $\nabla_y g(y)$?

My try

Let $x \in \mathbb{R}^n$ and let $S \subset \{1,\dots,n\}$ with $|S|=d$, where $|\cdot|$ denotes the cardinality of a set. Let $U_S \in \mathbb{R}^{n\times n}$ be the restricted identity matrix, i.e. the diagonal matrix whose $j$-th diagonal entry is $1$ if $j \in S$ and $0$ otherwise. Also, let $I_S \in \mathbb{R}^{n\times d}$ be obtained from $U_S$ by keeping its nonzero columns and removing its zero columns. Hence,

$$
g(y)=f(U_Sx)
$$
where $y=I_S^{\top}x$.
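To make the definitions concrete, here is a minimal plain-Python sketch (no external libraries) that builds $U_S$ and $I_S$ for a small example. The helper names `identity_restrictions`, `transpose`, `matvec` and `matmul` are hypothetical, and indices are 0-based in the code, whereas $S$ is 1-based in the text. The example also shows that $I_S^\top I_S = I_d$ and $I_S I_S^\top = U_S$, so that $I_S y = U_S x$ when $y = I_S^\top x$.

```python
def identity_restrictions(n, S):
    """Build U_S (n x n) and I_S (n x d) as nested lists.

    Indices are 0-based here, while the question uses 1-based indices.
    """
    cols = sorted(S)
    # U_S: identity matrix with the diagonal entries outside S set to zero.
    U = [[1.0 if (i == j and i in S) else 0.0 for j in range(n)] for i in range(n)]
    # I_S: the nonzero columns of U_S, i.e. the standard basis vectors e_j for j in S.
    I = [[1.0 if i == j else 0.0 for j in cols] for i in range(n)]
    return U, I

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

n, S = 4, {0, 2}              # S = {1, 3} in the question's 1-based indexing
U, I = identity_restrictions(n, S)
x = [3.0, -1.0, 5.0, 2.0]
y = matvec(transpose(I), x)   # y = I_S^T x
print(y)                      # [3.0, 5.0]
print(matvec(U, x))           # [3.0, 0.0, 5.0, 0.0]
print(matvec(I, y))           # I_S y, equal to U_S x: [3.0, 0.0, 5.0, 0.0]
```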

This is the statement above written in terms of the functions $f$ and $g$.

From this point on, things are unclear to me. I think the answer should be $\nabla_y g(y)=I_S^{\top} \nabla_x f(x)$, but I do not know how to derive it.

I also know, by the chain rule, that $J_x f(U_S x)=J_{W} f(W)\,J_x W= J_{W} f(W)\,U_S$, where $J$ denotes the Jacobian and $W=U_S x$. In addition, $\nabla^{\top}_x f(U_Sx) = J_x f(U_S x)=J_{W} f(W)\,U_S$. However, I do not see how to put these pieces together.
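The chain-rule identity above can be sanity-checked numerically. Transposing $J_x f(U_S x) = J_W f(W)\,U_S$ and using $U_S^\top = U_S$ gives $\nabla_x [f(U_S x)] = U_S\,\nabla_W f(W)$. The sketch below compares this with central finite differences; the example $f(x) = x_1x_2 + x_1x_3^2 + \sin x_4$, its gradient `grad_f`, and the helpers `apply_U` and `numerical_grad` are hypothetical and not part of the question.

```python
import math

def f(x):
    # Hypothetical example of a differentiable f: R^4 -> R (0-based indices).
    return x[0] * x[1] + x[0] * x[2] ** 2 + math.sin(x[3])

def grad_f(x):
    # Analytic gradient of the example f above.
    return [x[1] + x[2] ** 2, x[0], 2.0 * x[0] * x[2], math.cos(x[3])]

def apply_U(x, S):
    # U_S is diagonal, so U_S x keeps x_j for j in S and zeroes the other entries.
    return [xj if j in S else 0.0 for j, xj in enumerate(x)]

def numerical_grad(func, x, h=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2.0 * h))
    return grad

S = {0, 2}
x = [3.0, -1.0, 5.0, 2.0]
W = apply_U(x, S)
lhs = numerical_grad(lambda z: f(apply_U(z, S)), x)  # gradient of x -> f(U_S x)
rhs = apply_U(grad_f(W), S)                          # U_S^T grad f(W) = U_S grad f(W)
print(lhs)
print(rhs)  # [25.0, 0.0, 30.0, 0.0]
```

The finite-difference result `lhs` agrees with `rhs` up to rounding error; the entries outside $S$ vanish because $f(U_S x)$ does not depend on them.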

Best Answer

Since no one has posted an answer yet, and I obtain the same result you suggest, I thought I would post my solution for you to judge:

We have $$U_S x = I_S y,$$ because $I_S y = I_S I_S^T x$ and $I_S I_S^T = U_S$. Viewing matrices as linear transformations, $$g(y) = f(U_S x) = f(I_S y) = (f\circ I_S)(y).$$ Just as $J_{U_S}(x) = U_S$ in your computation, we have $J_{I_S}(y) = I_S$. Applying the chain rule $J_{h_1 \circ h_2}(a) = J_{h_1}(h_2(a))\,J_{h_2}(a)$ then gives
$$\begin{aligned} (\nabla_y g(y))^T = J_g(y) &= J_{f\circ I_S}(y) \\ &= J_f(I_S y)\,J_{I_S}(y) \\ &= J_f(U_S x)\,I_S \\ &= (\nabla_x f(U_S x))^T I_S, \end{aligned}$$
and therefore
$$\nabla_y g(y) = \left[(\nabla_x f(U_S x))^T I_S\right]^T = I_S^T\nabla_x f(U_S x).$$
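A minimal numerical check of this formula, assuming a hypothetical example $f(x) = x_1x_2 + x_1x_3^2 + \sin x_4$ on $\mathbb{R}^4$ and $S = \{1, 3\}$ (written 0-based as `{0, 2}` in the code). The helpers `embed` (applies $I_S$), `extract` (applies $I_S^T$) and `numerical_grad` are illustrative names; the finite-difference gradient of $g = f\circ I_S$ is compared with $I_S^T\nabla_x f(I_S y)$.

```python
import math

def f(x):
    # Hypothetical example f: R^4 -> R (0-based indices).
    return x[0] * x[1] + x[0] * x[2] ** 2 + math.sin(x[3])

def grad_f(x):
    # Analytic gradient of the example f.
    return [x[1] + x[2] ** 2, x[0], 2.0 * x[0] * x[2], math.cos(x[3])]

def embed(y, S, n):
    # I_S y: place the entries of y at positions sorted(S), zeros elsewhere.
    z = [0.0] * n
    for k, j in enumerate(sorted(S)):
        z[j] = y[k]
    return z

def extract(v, S):
    # I_S^T v: keep the entries of v at positions sorted(S).
    return [v[j] for j in sorted(S)]

def numerical_grad(func, x, h=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2.0 * h))
    return grad

n, S = 4, {0, 2}
y = [3.0, 5.0]

def g(t):
    # g = f o I_S
    return f(embed(t, S, n))

lhs = numerical_grad(g, y)                 # grad_y g(y) by finite differences
rhs = extract(grad_f(embed(y, S, n)), S)   # I_S^T grad_x f(I_S y)
print(rhs)  # [25.0, 30.0]
```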

Due to the definitions of $U_S$ and $I_S$, the zero columns of $I_S^T$ exactly match the rows where $\nabla_x f(U_S x)$ and $\nabla_x f(x)$ might differ, so we finally obtain $$ \nabla_y g(y) = I_S^T\nabla_x f(U_S x) = I_S^T \nabla_x f(x) $$

Edit:
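As pointed out in the comments, it would be more correct to write $$ \nabla_y g(y) = I_S^T\nabla_x f(I_S y), $$ because the last equality above holds in general only when $x = U_S x$, i.e. when the coordinates of $x$ outside $S$ are already zero. Otherwise $\nabla_x f$ is evaluated at two different points, $U_S x = I_S y$ and $x$, and the two gradients can differ even in the coordinates indexed by $S$.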
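The following sketch illustrates why the corrected formula matters, using a hypothetical $f(x) = x_1x_2 + x_1x_3^2 + \sin x_4$ and $S=\{1,3\}$ (0-based `{0, 2}` in the code). Then $g(y) = f(y_1, 0, y_2, 0) = y_1y_2^2$, so $\nabla_y g(y) = (y_2^2,\ 2y_1y_2)$. For a point $x$ with nonzero entries outside $S$, $I_S^T\nabla_x f(x)$ differs from $\nabla_y g(y)$ even in a coordinate indexed by $S$, while $I_S^T\nabla_x f(I_S y)$ matches it. The helpers `grad_g`, `embed` and `extract` are illustrative names.

```python
import math

def grad_f(x):
    # Analytic gradient of the hypothetical f(x) = x1*x2 + x1*x3^2 + sin(x4) (0-based below).
    return [x[1] + x[2] ** 2, x[0], 2.0 * x[0] * x[2], math.cos(x[3])]

def grad_g(y):
    # g(y) = f(y1, 0, y2, 0) = y1 * y2^2, differentiated by hand.
    return [y[1] ** 2, 2.0 * y[0] * y[1]]

def embed(y, S, n):
    # I_S y: place the entries of y at positions sorted(S), zeros elsewhere.
    z = [0.0] * n
    for k, j in enumerate(sorted(S)):
        z[j] = y[k]
    return z

def extract(v, S):
    # I_S^T v: keep the entries of v at positions sorted(S).
    return [v[j] for j in sorted(S)]

n, S = 4, {0, 2}
x = [3.0, -1.0, 5.0, 2.0]                      # has nonzero entries outside S
y = extract(x, S)                              # y = I_S^T x = [3.0, 5.0]
naive = extract(grad_f(x), S)                  # I_S^T grad f(x)
correct = extract(grad_f(embed(y, S, n)), S)   # I_S^T grad f(I_S y)
print(grad_g(y))  # [25.0, 30.0]
print(naive)      # [24.0, 30.0]
print(correct)    # [25.0, 30.0]
```

The first coordinate differs because $\partial f/\partial x_1 = x_2 + x_3^2$ depends on $x_2$, which is set to zero in $g$ but not at the point $x$.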
