Neural Networks – Derivatives of Output w.r.t Input in Standardized Data Trained Neural Networks

I'm using a neural network to model an unknown function for which I would also like to know the derivatives. The network has four inputs and four outputs, and the training data is preprocessed with scikit-learn's StandardScaler, i.e.,

$$
\hat{\mathbf{x}} = (\mathbf{x} - \mu_{x})/\sigma_{x} \quad \text{and} \quad \hat{\mathbf{y}} = (\mathbf{y} - \mu_{y})/\sigma_{y}
$$

As a result, I get the standardized data along with the mean and standard deviation of each variable:

$$
\mu_{x_{1}},\mu_{x_{2}},\mu_{x_{3}},\mu_{x_{4}} \quad \text{and} \quad \mu_{y_{1}},\mu_{y_{2}},\mu_{y_{3}},\mu_{y_{4}}
$$
and
$$
\sigma_{x_{1}},\sigma_{x_{2}},\sigma_{x_{3}},\sigma_{x_{4}} \quad \text{and} \quad \sigma_{y_{1}},\sigma_{y_{2}},\sigma_{y_{3}},\sigma_{y_{4}}
$$

After training, I can use $\mu$ and $\sigma$ to transform the output back to the original scale of the data.

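Now, I want to compute the derivatives of the network output with respect to the inputs. However, the derivatives I get from the network are taken in terms of the standardized variables, so they seem to be standardized as well.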

$$
\frac{\partial y_{1}}{\partial x_{1}},\frac{\partial y_{1}}{\partial x_{2}},\cdots,\frac{\partial y_{4}}{\partial x_{3}},\frac{\partial y_{4}}{\partial x_{4}}
$$

How do I transform (unscale) the derivatives to the original data scale?

Thanks!

Best Answer

If only the inputs are standardized, the derivatives you have are $\frac{\partial y_i}{\partial \hat x_j}$, with $\hat x_j=(x_j-\mu_{x_j})/\sigma_{x_j}$. So you just apply the chain rule:

$$
\frac{\partial y_i}{\partial x_j}=\frac{\partial y_i}{\partial \hat x_j}\frac{\partial \hat x_j}{\partial x_j}=\frac{\partial y_i}{\partial \hat x_j}\frac{1}{\sigma_{x_j}}
$$

If the training data for the outputs is also standardized, the network actually gives you $\frac{\partial \hat y_i}{\partial \hat x_j}$. Inverting the output scaling gives

$$
y_i=\hat y_i\sigma_{y_i} + \mu_{y_i}
$$

so $\frac{\partial y_i}{\partial \hat y_i}=\sigma_{y_i}$, and applying the chain rule twice gives

$$
\frac{\partial y_i}{\partial x_j}=\frac{\partial y_i}{\partial \hat y_i}\frac{\partial \hat y_i}{\partial \hat x_j}\frac{\partial \hat x_j}{\partial x_j}=\frac{\partial \hat y_i}{\partial \hat x_j}\frac{\sigma_{y_i}}{\sigma_{x_j}}
$$

The means drop out because they are constants. However, I'm not sure if this derivative is of any use to you.
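As a sanity check on the rescaling, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the scaler statistics are made-up numbers, and the "network" is a hypothetical one-layer tanh model standing in for your trained model. In practice the standardized Jacobian would come from your framework's automatic differentiation, and the means and standard deviations from the fitted scalers. The sketch rescales the standardized Jacobian by $\sigma_{y_i}/\sigma_{x_j}$ and compares it against central finite differences of the full scale, evaluate, unscale pipeline.

```python
import math

# Hypothetical scaler statistics (illustrative values, not from the question).
mu_x = [1.0, -2.0, 0.5, 10.0]
sigma_x = [2.0, 0.5, 1.5, 4.0]
mu_y = [0.0, 3.0, -1.0, 5.0]
sigma_y = [1.0, 2.0, 0.25, 3.0]

# Hypothetical stand-in for the trained network: it maps standardized
# inputs to standardized outputs via y_hat = tanh(W @ x_hat).
W = [
    [0.3, -0.2, 0.1, 0.5],
    [-0.4, 0.1, 0.2, 0.0],
    [0.2, 0.2, -0.3, 0.1],
    [0.0, -0.5, 0.4, 0.3],
]


def net_standardized(x_hat):
    return [math.tanh(sum(w * v for w, v in zip(row, x_hat))) for row in W]


def jacobian_standardized(x_hat):
    """d y_hat_i / d x_hat_j for the toy network (analytic)."""
    y_hat = net_standardized(x_hat)
    return [[(1.0 - y_hat[i] ** 2) * W[i][j] for j in range(4)] for i in range(4)]


def standardize(x):
    return [(v - m) / s for v, m, s in zip(x, mu_x, sigma_x)]


def predict(x):
    """Full pipeline in original units: scale, evaluate, unscale."""
    y_hat = net_standardized(standardize(x))
    return [v * s + m for v, s, m in zip(y_hat, sigma_y, mu_y)]


def jacobian_original(x):
    """d y_i / d x_j = (d y_hat_i / d x_hat_j) * sigma_y[i] / sigma_x[j]."""
    j_hat = jacobian_standardized(standardize(x))
    return [[j_hat[i][j] * sigma_y[i] / sigma_x[j] for j in range(4)] for i in range(4)]


def jacobian_finite_difference(x, eps=1e-6):
    """Central differences on the full pipeline, used only as a check."""
    n = len(x)
    columns = []
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        y_plus, y_minus = predict(x_plus), predict(x_minus)
        columns.append([(a - b) / (2 * eps) for a, b in zip(y_plus, y_minus)])
    # columns[j][i] holds d y_i / d x_j; transpose to row i, column j.
    return [[columns[j][i] for j in range(n)] for i in range(len(columns[0]))]


x0 = [1.5, -1.8, 0.0, 12.0]  # a point in original (unscaled) units
J = jacobian_original(x0)
J_fd = jacobian_finite_difference(x0)
max_err = max(abs(J[i][j] - J_fd[i][j]) for i in range(4) for j in range(4))
```

In matrix form the same rescaling is $J = \operatorname{diag}(\sigma_y)\,\hat J\,\operatorname{diag}(\sigma_x)^{-1}$, so with arrays it reduces to one broadcasted multiply and divide. If `max_err` is not small for your own model, the usual culprit is mixing up which standard deviation goes with rows (outputs) and which with columns (inputs).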
