Help with determining the dimensionality of the gradient

vector analysis

When covering vector calculus and dealing with gradients of matrices, is there an intuitive way of thinking about the dimensionality of the gradient?

For example, when considering the dimensionality of $\frac{\partial L}{\partial \theta}$, I thought about it the following way: $L$ is a scalar because it is the norm of the error, and $\theta$ has $D$ components. The gradient should therefore be a $1 \times D$ matrix, since we take the partial derivative of $L$ with respect to each of the $D$ components of $\theta$. Is this way of thinking correct?
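To sanity-check this, I also computed the gradient numerically on a toy example (the squared-error loss and the finite-difference scheme below are just my own choices for illustration):

```python
import numpy as np

D = 4
theta = np.random.randn(D)   # parameter vector with D components
x = np.random.randn(D)       # one input vector
y = 1.7                      # a scalar target

def L(t):
    # scalar loss: squared error of a linear hypothesis t . x
    return (t @ x - y) ** 2

# central finite differences: one partial derivative per component of theta
eps = 1e-6
grad = np.array([
    (L(theta + eps * np.eye(D)[i]) - L(theta - eps * np.eye(D)[i])) / (2 * eps)
    for i in range(D)
])

print(grad.shape)  # (4,): D partial derivatives, i.e. a 1 x D row when written as a matrix
```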

Is there any easier way to think of it?


Best Answer

Yes! I see nothing wrong. The loss function is a function of the true function we aim to learn, $f^\star$, and the hypothesis $f$: $L = L(f, f^\star)$. For the simple example of linear hypotheses, $f$ can be parametrised by $\theta \in \mathbb{R}^{1 \times D}$, so one can write $L = L(\theta x, f^\star)$, where $x \in \mathbb{R}^{D \times 1}$ denotes the input (column) vector. In a stochastic learning algorithm, one updates the parameters using the gradient of the loss function w.r.t. the parameters: $$\theta_{\text{new}} = \theta_{\text{old}} - \alpha \frac{\partial L}{\partial \theta}$$

Since $\theta \in \mathbb{R}^{1 \times D}$, and the update subtracts $\alpha \frac{\partial L}{\partial \theta}$ from it, the gradient must have the same dimensionality, $1 \times D$.
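As a concrete illustration (using a squared-error loss, which is just one possible choice of $L$), a single gradient step for a linear hypothesis can be sketched as follows; the gradient comes out with the same $1 \times D$ shape as $\theta$, so the subtraction in the update is well defined:

```python
import numpy as np

D = 5
theta = np.random.randn(1, D)   # theta in R^{1 x D}, as above
x = np.random.randn(D, 1)       # input as a D x 1 column vector
y_star = 0.3                    # value of f* at x (a scalar)
alpha = 0.1                     # learning rate

prediction = theta @ x                    # 1 x 1, effectively a scalar
loss = (prediction - y_star) ** 2         # scalar squared-error loss

# analytic gradient of the squared error: dL/dtheta = 2 (theta x - y*) x^T, shape 1 x D
grad = 2 * (prediction - y_star) * x.T

theta_new = theta - alpha * grad          # subtraction works because the shapes match

print(theta.shape, grad.shape, theta_new.shape)  # (1, 5) (1, 5) (1, 5)
```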
