[Math] Gradient of dot product of two vectors

multivariable-calculus

I am taking a class in which knowledge of gradients is a prerequisite. I am familiar with gradients but don't have much experience with them, so I am having trouble understanding the following example.

Let $\theta, x \in \mathbb R^d$.

Define $\nabla J(\theta) = \begin{pmatrix} \frac{\partial}{\partial \theta_1} J(\theta) \\ \frac{\partial}{\partial \theta_2} J(\theta) \\ \vdots \\ \frac{\partial}{\partial \theta_d} J(\theta) \end{pmatrix}$ .

Then, I'm having trouble understanding why the following is true:
$\nabla (\theta \cdot x) = \nabla \left( \sum_{i=1}^{d} \theta_i x_i \right) = \begin{pmatrix} \frac{\partial}{\partial \theta_1} (\theta \cdot x) \\ \frac{\partial}{\partial \theta_2} (\theta \cdot x) \\ \vdots \\ \frac{\partial}{\partial \theta_d} (\theta \cdot x) \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix} = x$

My two questions are:

  • Why is the third equality true? That is, why is $\frac{\partial}{\partial \theta_i} (\theta \cdot x) = x_i$?
  • The first equality expresses the result as the gradient of a scalar: $\nabla \left( \sum_{i=1}^{d} \theta_i x_i \right)$. How can this equal the vector $x$ in the last equality?

Best Answer

There are a few alternative ways to think of it, which may make it simpler to understand. One which is quite similar to the above calculation is
$$\nabla(\theta\cdot x)=\nabla\left(\sum_{i=1}^{d}\theta_ix_i\right)=\begin{pmatrix}\frac{\partial}{\partial\theta_1}\sum_{i=1}^{d}\theta_ix_i\\\vdots\\\frac{\partial}{\partial\theta_d}\sum_{i=1}^{d}\theta_ix_i\end{pmatrix}=\begin{pmatrix}x_1\\\vdots\\x_d\end{pmatrix}=x.$$
The key step is the third equality: for a fixed $j$, every term $\theta_i x_i$ with $i\neq j$ is constant as a function of $\theta_j$, so
$$\frac{\partial}{\partial\theta_j}\sum_{i=1}^{d}\theta_ix_i=\frac{\partial}{\partial\theta_j}\left(\theta_jx_j\right)=x_j.$$
For instance, with $d=2$, $\frac{\partial}{\partial\theta_1}(\theta_1x_1+\theta_2x_2)=x_1$, since the term $\theta_2x_2$ does not involve $\theta_1$.

Regarding the second question: the vector $x$ is constant, and the product $\theta\cdot x$ is a scalar-valued function of $\theta$. The gradient is a machine that eats a scalar-valued function and returns a vector, as is the case here: it eats $\theta\mapsto\theta\cdot x$ and returns the constant vector $x$.
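If it helps, here is a quick numerical sanity check of the identity $\nabla(\theta\cdot x)=x$. This is not from the original answer; it is a minimal sketch using NumPy, with illustrative names `J`, `theta`, and `x` chosen to match the notation above. It approximates each component of the gradient by a central finite difference and compares the result to $x$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
theta = rng.standard_normal(d)
x = rng.standard_normal(d)  # held constant; J is a function of theta only

def J(theta):
    # J(theta) = theta . x, a scalar-valued function of theta
    return theta @ x

# Approximate each gradient component with a central finite difference
eps = 1e-6
grad = np.zeros(d)
for i in range(d):
    e_i = np.zeros(d)
    e_i[i] = eps
    grad[i] = (J(theta + e_i) - J(theta - e_i)) / (2 * eps)

print(np.allclose(grad, x))  # True: the gradient of theta . x is x
```

Because $J$ is linear in $\theta$, the central difference here is exact up to floating-point rounding, so the computed gradient matches $x$ to well within the default `allclose` tolerance.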
