Using the dot product or element-wise multiplication for vectorized multivariable functions in the chain rule

chain rule, gradient descent, jacobian, matrix-calculus, multivariable-calculus

Should we use the dot product or the Hadamard (element-wise) product for vectorized multivariable functions in the chain rule?

I'm struggling to find the correct operation between the gradient and the Jacobian in the chain rule. I have the following expression:

for $
x = \left[ \begin{matrix}
x_1 \\
x_2 \\
x_3 \\
\end{matrix} \right] \in \mathbf{R}^3
$
, and for $a, b, c \in \mathbf{R}$, $w = \left[ \begin{matrix}
a \\
b \\
c \\
\end{matrix} \right] \in \mathbf{R}^3
$

$$
v_4 = g(w, x) \in \mathbf{R}^3
$$

$$
v_5 = f(w, v_4) \in \mathbf{R}^3
$$

$$
v_6 = L(v_5) \in \mathbf{R}
$$

$$
v_7 = x \times v_6 = x \times L(f(w, g(w, x))) \in \mathbf{R}^3
$$

Here I consider vectorized functions defined as follows: for example, if $f: (w \in \mathbf{R}^3, x \in \mathbf{R}) \mapsto a \times_{\mathbf{R}} x + b \times_{\mathbf{R}} x^2 + c$, the vectorized function is $(w \in \mathbf{R}^3, x \in \mathbf{R}^3) \mapsto \left[ \begin{matrix}
f'(x_1) \\
f'(x_2) \\
f'(x_3) \\
\end{matrix} \right]$
with $f'$ being the scalar function above (the prime does not denote a derivative here). For the differentiation we can then use the Jacobian with respect to $x$, which is a diagonal matrix because each output component depends only on the matching input component.
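As a quick sanity check of the diagonal-Jacobian claim, here is a minimal sketch in plain Python. The helper names (`f_scalar`, `f_vec`, `jacobian_fd`) and the numeric values of `w` and `x` are assumptions chosen for illustration, and the Jacobian is approximated by central finite differences rather than derived symbolically.

```python
def f_scalar(w, x):
    """Scalar function from the example: a*x + b*x**2 + c."""
    a, b, c = w
    return a * x + b * x ** 2 + c


def f_vec(w, xs):
    """Vectorized version: apply f_scalar to each component of xs."""
    return [f_scalar(w, x) for x in xs]


def jacobian_fd(func, xs, eps=1e-6):
    """Central finite-difference Jacobian, J[i][j] = d func_i / d x_j."""
    n = len(xs)
    m = len(func(xs))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(xs)
        plus[j] += eps
        minus = list(xs)
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        for i in range(m):
            jac[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return jac


w = (2.0, -1.0, 0.5)   # illustrative values for a, b, c
x = [1.0, 2.0, 3.0]    # illustrative input vector

J = jacobian_fd(lambda xs: f_vec(w, xs), x)
for row in J:
    print([round(v, 4) for v in row])
```

The off-diagonal entries come out as zero, and each diagonal entry approximates $a + 2 b x_i$, the derivative of the scalar function at $x_i$.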

Here:
$$
g(w, x) : \mathbf{R}^3 \times \mathbf{R} \rightarrow \mathbf{R}
$$

$$
f(w, x) : \mathbf{R}^3 \times \mathbf{R} \rightarrow \mathbf{R}
$$

$$
L(x) : \mathbf{R}^3 \rightarrow \mathbf{R}
$$

The chain rule gives us:

$$
\frac{\partial{v_7}}{\partial{a}} = x \left( \frac{\partial{L(v_5)}}{\partial{v_5}} \left( \frac{\partial{f(w, v_4)}}{\partial{v_4}} \frac{\partial{g(w, x)}}{\partial{a}} + \frac{\partial{f(w, v_4)}}{\partial{a}} \right) \right)
$$

which, in terms of the shapes of the objects involved, gives us:
$$
\left[ \begin{matrix} \cdot \\ \cdot \\ \cdot \end{matrix} \right] \times_{1} \left( \left[ \begin{matrix} \cdot \\ \cdot \\ \cdot \end{matrix} \right] \times_{2} \left(
\left[ \begin{matrix}
\cdot & 0 & 0 \\
0 & \cdot & 0 \\
0 & 0 & \cdot \\
\end{matrix} \right] \times \left[ \begin{matrix} \cdot \\ \cdot \\ \cdot \end{matrix} \right] + \left[ \begin{matrix} \cdot \\ \cdot \\ \cdot \end{matrix} \right] \right) \right)
$$

The question here is:

Should I use the dot product for $\times_{2}$, or matrix multiplication with the transpose of $\frac{\partial{L(v_5)}}{\partial{v_5}}$? (Alternatively, I could use the row convention for the gradient, which may also affect the form of $\frac{\partial{g(w, x)}}{\partial{a}} = \left[ \begin{matrix} \frac{\partial{g_1(w, x_1)}}{\partial{a}} \\ \frac{\partial{g_2(w, x_2)}}{\partial{a}} \\ \frac{\partial{g_3(w, x_3)}}{\partial{a}} \end{matrix} \right]$.) Or should I use the element-wise product, which may result in having $\left[ \begin{matrix} x_1 & x_1 & x_1 \\ x_2 & x_2 & x_2 \\ x_3 & x_3 & x_3 \end{matrix} \right]$ instead of $x \times_{1} …$, by defining, for any constant $e \in \mathbf{R}$ and vector $v \in \mathbf{R}^3$, $v \times e = v \times \left[ \begin{matrix} e \\ e \\ e \end{matrix} \right]$?

Best Answer

As pointed out in the comments by @greg, we need to follow the same convention for the gradient as for the Jacobian, which by default is the row vector (because the gradient of a scalar function is a single row of a Jacobian).

Then, $\frac{\partial{L(v_5)}}{\partial{v_5}} = \left[\begin{matrix} \cdot & \cdot & \cdot \end{matrix}\right] \in \mathbf{M}^{(1, 3)}$ is a row matrix. Thus $\times_{1}$ is a multiplication between a column matrix and a row matrix (an outer product), which gives us a $3 \times 3$ matrix:

$$ x \times_{1} \frac{\partial{L(v_5)}}{\partial{v_5}} = \left[\begin{matrix} \cdot \\ \cdot \\ \cdot \end{matrix}\right] \left[\begin{matrix} \cdot & \cdot & \cdot \end{matrix}\right] = \left[ \begin{matrix} \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \end{matrix}\right] \in \mathbf{M}^{(3, 3)} $$

The chain rule can then be expanded (for example, if needed for a computation graph) as follows:

$$ \frac{\partial{v_7}}{\partial{a}} = \left( \left( x \frac{\partial{L(v_5)}}{\partial{v_5}} \right) \frac{\partial{f(w, v_4)}}{\partial{v_4}} \right) \frac{\partial{g(w, x)}}{\partial{a}} + \left( x \frac{\partial{L(v_5)}}{\partial{v_5}} \right)\frac{\partial{f(w, v_4)}}{\partial{a}} \in \mathbf{R}^3 $$
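To check that this grouping of products is consistent, here is a minimal sketch in plain Python that compares the expanded formula with a central finite difference in $a$. The concrete choices are assumptions made only for illustration: both $g$ and $f$ are taken to be the element-wise polynomial $a t + b t^2 + c$, $L$ is taken to be a sum of squares, and the helper names (`poly`, `loss`, `v7`, `dv7_da`) are hypothetical.

```python
def poly(w, v):
    """Element-wise a*t + b*t**2 + c; assumed form for both g and f."""
    a, b, c = w
    return [a * t + b * t ** 2 + c for t in v]


def loss(v):
    """Hypothetical scalar loss L(v) = sum of squares."""
    return sum(t * t for t in v)


def v7(w, x):
    """v7 = x * L(f(w, g(w, x)))."""
    s = loss(poly(w, poly(w, x)))
    return [xi * s for xi in x]


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dv7_da(w, x):
    """Expanded chain rule with the gradient of L treated as a row vector."""
    a, b, _ = w
    v4 = poly(w, x)
    v5 = poly(w, v4)
    dL = [2 * t for t in v5]                       # row gradient, shape (1, 3)
    outer = [[xi * dj for dj in dL] for xi in x]   # (3, 1) times (1, 3) -> (3, 3)
    jf = [a + 2 * b * t for t in v4]               # diagonal of df/dv4
    # (x dL) times the diagonal Jacobian: scale column j by jf[j].
    outer_jf = [[oij * jf[j] for j, oij in enumerate(row)] for row in outer]
    dg_da = list(x)                                # dg/da = x for this g
    df_da = list(v4)                               # explicit df/da = v4 for this f
    through_g = matvec(outer_jf, dg_da)
    direct = matvec(outer, df_da)
    return [p + q for p, q in zip(through_g, direct)]


w = (0.3, -0.2, 0.1)   # illustrative values for a, b, c
x = [1.0, 2.0, 3.0]    # illustrative input vector
eps = 1e-6

w_plus = (w[0] + eps, w[1], w[2])
w_minus = (w[0] - eps, w[1], w[2])
numeric = [(p - m) / (2 * eps) for p, m in zip(v7(w_plus, x), v7(w_minus, x))]
analytic = dv7_da(w, x)

print("analytic:", analytic)
print("numeric: ", numeric)
```

Under these assumptions the two vectors agree up to finite-difference error, which supports reading $\times_{1}$ as an outer product and the remaining products as ordinary matrix multiplications.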
