Difference between a derivative with respect to a slot and a derivative with respect to a variable

A professor of mine was talking about the difference between a derivative with respect to a slot and a derivative with respect to a variable. Apparently there is a notational issue there. I didn't really get what he meant, and at the moment I am not able to ask him.

Let $x=(x^1, x^2, \dots, x^d)$. Notationally, we have

$$
\partial_{x^{j}} f(x) := (\partial_{j} f)(x)
$$

where the left side is the derivative with respect to the variable and the right side is the derivative with respect to the slot.

Now the argument was that this might lead to confusion in the case of the chain rule, for example:

$$
\partial_{t} f(tx) = \sum_{j=1}^{d} x^{j}(\partial_{j}f)(tx)
$$

(Here the evaluation point depends on a real variable $t$, so we no longer evaluate at $x$, which somehow confuses me.) He also said that the fundamental theorem of calculus does not apply to slot derivatives.

Could anyone elaborate on that? An example showing the difference, and what the notational problems are in these cases, would be awesome. I especially don't understand the chain rule example. Thank you!

Best Answer

Here's how you'd write everything precisely using the chain rule. There are actually two functions involved here. The first is $f:\Bbb{R}^n\to\Bbb{R}$ and the second is the map $\mu:\Bbb{R}\times\Bbb{R}^n\to\Bbb{R}^n$ defined as
\begin{align}
\mu(t,x)&:=tx.
\end{align}
So, $\mu$ is the "scalar-multiplication map". Now we can consider the composite function $f\circ\mu:\Bbb{R}\times\Bbb{R}^n\to\Bbb{R}$ and ask what its first partial derivative is. The answer is given by the chain rule: for all $(t,x)\in\Bbb{R}\times\Bbb{R}^n$,
\begin{align}
[\partial_1(f\circ \mu)](t,x)&=\sum_{j=1}^n(\partial_jf)(\mu(t,x))\cdot (\partial_1\mu^j)(t,x)\\
&=\sum_{j=1}^n(\partial_jf)(tx)\cdot x^j,
\end{align}
where the last line holds simply because $\mu^j(t,x)=tx^j$, so $(\partial_1\mu^j)(t,x)=x^j$. This is exactly your $\partial_t f(tx)$: the left side is a derivative with respect to the first slot of $f\circ\mu$, while each $\partial_jf$ on the right is a slot derivative of $f$, which is then evaluated at the point $tx$.
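
If you want to convince yourself numerically, here is a minimal Python sketch of the computation above. The function `f`, its hand-computed gradient `grad_f`, and the test point are hypothetical choices made only for illustration, and the derivative of $f\circ\mu$ in its first slot is approximated by a central difference. Note that Python indices start at 0, so slot $j$ corresponds to index `j-1`.

```python
import math

# Hypothetical example function f: R^3 -> R, chosen only for illustration.
def f(x):
    return math.sin(x[0]) * x[1] ** 2 + x[0] * x[2]

# Slot derivatives of this particular f, computed by hand.
# grad_f(x)[j-1] is (partial_j f)(x).
def grad_f(x):
    return [
        math.cos(x[0]) * x[1] ** 2 + x[2],  # (partial_1 f)(x)
        2.0 * math.sin(x[0]) * x[1],        # (partial_2 f)(x)
        x[0],                               # (partial_3 f)(x)
    ]

def mu(t, x):
    """Scalar-multiplication map mu(t, x) = t x."""
    return [t * xj for xj in x]

def d1_f_mu_chain_rule(t, x):
    """[partial_1 (f o mu)](t, x) = sum_j (partial_j f)(t x) * x^j."""
    g = grad_f(mu(t, x))  # slot derivatives evaluated at t x, not at x
    return sum(gj * xj for gj, xj in zip(g, x))

def d1_f_mu_numeric(t, x, h=1e-6):
    """Central-difference approximation in the first slot of f o mu."""
    return (f(mu(t + h, x)) - f(mu(t - h, x))) / (2 * h)

t0, x0 = 0.7, [1.2, -0.5, 2.0]
print(d1_f_mu_chain_rule(t0, x0), d1_f_mu_numeric(t0, x0))
```

The two printed values should agree up to the finite-difference error, which is a quick sanity check that the slot derivatives really have to be evaluated at $tx$.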

Note that here, you shouldn't pay attention to the individual letters $t$ and $x$. They are just arbitrary curved symbols. Sure, it may be tradition to use these letters, but mathematically, one is not obligated to use them. I could just as well write: for all $(\ddot{\smile},@)\in\Bbb{R}\times\Bbb{R}^n$,

\begin{align} [\partial_1(f\circ \mu)](\ddot{\smile},@)&=\sum_{\#=1}^n(\partial_{\#}f)(\ddot{\smile}@)\cdot @^{\#}. \end{align}

As this crazy rewriting shows, logically speaking, one should not use notation such as $\partial_{x^j}f$ or $\frac{\partial f}{\partial x^j}$ in place of $\partial_jf$, simply because the symbol $x$ has no inherent mathematical meaning. What does carry meaning is the slot index $j$, i.e. which argument of $f$ is being varied.
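
To see concretely how variable-based notation can mislead, here is a small Python sketch. The function $f(u,v)=uv^2$, the point $(x,y)=(3,2)$ and the helper `slot_partial` are hypothetical choices of mine, and derivatives are approximated by central differences. It compares $(\partial_1 f)(y,x)$, the first-slot derivative evaluated at the point $(y,x)$, with the derivative of the expression $f(y,x)$ with respect to the variable $x$; a careless "$\frac{\partial f}{\partial x}(y,x)$" could be read as either.

```python
# Hypothetical example: f(u, v) = u * v**2. The names u, v are irrelevant;
# only the slot positions (1st, 2nd) matter.
def f(u, v):
    return u * v ** 2

def slot_partial(func, slot, h=1e-6):
    """Hypothetical helper: return (partial_slot func), slots counted from 1.

    It differentiates with respect to an argument *position* using a central
    difference; the argument names never enter.
    """
    def d(*args):
        plus = list(args)
        minus = list(args)
        plus[slot - 1] += h
        minus[slot - 1] -= h
        return (func(*plus) - func(*minus)) / (2 * h)
    return d

x, y = 3.0, 2.0

# Slot reading: first-slot derivative of f, evaluated at the point (y, x).
# (partial_1 f)(u, v) = v**2, so (partial_1 f)(y, x) = x**2 = 9.
slot_value = slot_partial(f, 1)(y, x)

# Variable reading: differentiate the expression f(y, x) = y * x**2 in x.
# By the chain rule this is (partial_2 f)(y, x) = 2 * y * x = 12.
def g(x_, y_):
    return f(y_, x_)

variable_value = slot_partial(g, 1)(x, y)

print(round(slot_value, 6), round(variable_value, 6))
```

The two results are approximately 9 and 12, so the two readings genuinely differ: differentiating with respect to the variable $x$ here picks out the second slot of $f$, not the first.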

As for the remark about the FTC not being applicable, I'm not sure what your professor meant, because the FTC certainly applies (assuming $f$ is $C^1$, for example). Apply it to the one-variable function $t\mapsto f(tx)$ on $[0,1]$, whose derivative is given by the chain rule above: for all $x\in\Bbb{R}^n$,
\begin{align}
f(x)&=f(0)+\int_0^1\sum_{j=1}^n(\partial_jf)(tx)\cdot x^j\,dt.
\end{align}
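
Here is a rough numerical check of that identity in Python, as a sketch only. The $C^1$ function `f` on $\Bbb{R}^2$, its hand-computed gradient, and the point `x` are hypothetical choices for illustration, and the integral over $t\in[0,1]$ is approximated with a composite Simpson rule (any reasonable quadrature would do).

```python
import math

# Hypothetical C^1 example f: R^2 -> R, chosen only for illustration.
def f(x):
    return math.exp(x[0]) * math.cos(x[1]) + x[0] * x[1]

def grad_f(x):
    # Slot derivatives [(partial_1 f)(x), (partial_2 f)(x)], computed by hand.
    return [
        math.exp(x[0]) * math.cos(x[1]) + x[1],
        -math.exp(x[0]) * math.sin(x[1]) + x[0],
    ]

def integrand(t, x):
    # sum_j (partial_j f)(t x) * x^j, which is d/dt f(t x) by the chain rule.
    tx = [t * xj for xj in x]
    return sum(gj * xj for gj, xj in zip(grad_f(tx), x))

def simpson(func, a, b, n=200):
    # Composite Simpson rule on [a, b]; n must be even.
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

x = [0.8, -1.3]
lhs = f(x)
rhs = f([0.0, 0.0]) + simpson(lambda t: integrand(t, x), 0.0, 1.0)
print(lhs, rhs)
```

Both sides should agree to many digits, consistent with the FTC applying to $t\mapsto f(tx)$ with the slot derivatives evaluated along the segment from $0$ to $x$.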
