Transpose of a partial derivative

linear-algebra, partial-derivative

I would really appreciate it if you could help with the following equations.

My professor writes down the first equation below, followed by the second.

$$g(\dot{\mathbf{q}})=\frac{1}{2} \dot{\mathbf{q}}^{\top} W \dot{\mathbf{q}}+\boldsymbol{\lambda}^{\top}(\dot{\mathbf{x}}-J \dot{\mathbf{q}})$$

$$\left(\frac{\partial g}{\partial \dot{\mathbf{q}}}\right)^{\top}=W \dot{\mathbf{q}}+J^{\top} \boldsymbol{\lambda}=\mathbf{0}$$

I know that we are taking the transpose of the partial derivative. Shouldn't the answer be $$\left(\frac{\partial g}{\partial \dot{\mathbf{q}}}\right)^{\top} = (W\dot{\mathbf{q}} - \boldsymbol{\lambda}^{\top}J)^{\top}?$$

I would really appreciate it if anyone could guide me on this.

Thanks in advance.

Best Answer

The notation $\frac{\partial f(\mathbf x)}{\partial \mathbf x}$ stands for the vector with components $\left(\frac{\partial f(\mathbf x)}{\partial x_1}, \dots, \frac{\partial f(\mathbf x)}{\partial x_n}\right)$. It is convenient to treat it as a row vector (think of it as the Jacobian matrix of a vector function with a single component).

Let's see how $\frac{\partial}{\partial \mathbf x}$ acts on simple functions. $$ \frac{\partial}{\partial \mathbf x} \left(\mathbf a^\top \mathbf x\right) = \left( \frac{\partial \sum_i a_i x_i}{\partial x_1}, \dots, \frac{\partial \sum_i a_i x_i}{\partial x_n} \right) = (a_1, \dots, a_n) = \mathbf a^\top. $$ Here $\mathbf a$ is some constant column vector. The result is written as $\mathbf a^\top$ because, under this convention, the answer should be a row vector.

Note that $\mathbf a^\top \mathbf x$ is the same as $\mathbf x^\top \mathbf a$ (assuming that we're dealing with real-valued vectors). Thus $$ \frac{\partial}{\partial \mathbf x} \left(\mathbf x^\top \mathbf a\right) = \mathbf a^\top. $$
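If you want to verify this convention numerically, here is a minimal sketch using JAX's autodiff; the vector `a`, the point `x`, and the dimension `n` below are made-up test values, not anything from the question.

```python
# Numerical check of d(a^T x)/dx = a^T using JAX autodiff.
# All values here are arbitrary test data.
import jax
import jax.numpy as jnp

n = 4
a = jax.random.normal(jax.random.PRNGKey(0), (n,))
x = jax.random.normal(jax.random.PRNGKey(1), (n,))

f = lambda x: jnp.dot(a, x)   # f(x) = a^T x, a scalar
grad_f = jax.grad(f)(x)       # components (df/dx_1, ..., df/dx_n)

print(jnp.allclose(grad_f, a))  # True: the gradient's entries are exactly a
```

(`jax.grad` returns a flat array of components; reading it as a row or a column is purely the layout convention discussed above.)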

The same holds when $\mathbf x$ is multiplied by a matrix: $$\frac{\partial}{\partial \mathbf x} \left(A \mathbf x\right) = A$$ $$\frac{\partial}{\partial \mathbf x} \left(\mathbf x^\top B\right) = B^\top$$

The rule of thumb is $$ \frac{\partial}{\partial \mathbf x} \left(\text{something} \cdot \mathbf x\right) = \text{something}\\ \frac{\partial}{\partial \mathbf x} \left(\mathbf x^\top \cdot \text{something}\right) = (\text{something})^\top $$ where it is assumed that "$\text{something}$" does not depend on $\mathbf x$.
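The rule of thumb can be checked the same way with `jax.jacobian`; the matrices `A`, `B` and the shapes below are arbitrary test data.

```python
# Check that the Jacobian of x -> A x is A, and that the Jacobian of
# x -> x^T B (read as a vector of length m) is B^T. Test data is arbitrary.
import jax
import jax.numpy as jnp

m, n = 3, 4
A = jax.random.normal(jax.random.PRNGKey(0), (m, n))
B = jax.random.normal(jax.random.PRNGKey(1), (n, m))
x = jax.random.normal(jax.random.PRNGKey(2), (n,))

jac_Ax = jax.jacobian(lambda x: A @ x)(x)  # shape (m, n)
jac_xB = jax.jacobian(lambda x: x @ B)(x)  # x^T B, shape (m, n)

print(jnp.allclose(jac_Ax, A))    # True
print(jnp.allclose(jac_xB, B.T))  # True
```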

Let's now differentiate your function $$ g(\dot{\mathbf q}) = \frac{1}{2} \dot{\mathbf q}^\top W \dot{\mathbf q} + \boldsymbol \lambda^\top (\dot{\mathbf x} - J\dot{\mathbf q}) = \frac{1}{2} \dot{\mathbf q}^\top W \dot{\mathbf q} - \boldsymbol \lambda^\top J\dot{\mathbf q} + \boldsymbol \lambda^\top \dot{\mathbf x} $$

The quadratic term, by the product rule: $$ \frac{\partial}{\partial \dot{\mathbf q}}\left( \dot{\mathbf q}^\top W \dot{\mathbf q} \right) = \frac{\partial}{\partial \dot{\mathbf q}}\left( \dot{\mathbf q}^\top \underbrace{W \dot{\mathbf q}}_\text{treat as constant} \right) + \frac{\partial}{\partial \dot{\mathbf q}}\left( \underbrace{\dot{\mathbf q}^\top W}_\text{treat as constant} \dot{\mathbf q} \right) = \\ = (W \dot{\mathbf q})^\top + \dot{\mathbf q}^\top W = \dot{\mathbf q}^\top (W^\top + W). $$ I assume that $W$ is a symmetric matrix, so this derivative simplifies to $2\dot{\mathbf q}^\top W$.
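Here is the corresponding check for the quadratic term; `W` and the point are arbitrary test data, and `W` is deliberately left non-symmetric so the general $(W^\top + W)$ formula is visible.

```python
# Check that for f(q) = q^T W q the gradient (as a column) is (W + W^T) q,
# reducing to 2 W q when W is symmetric. W and q are arbitrary test data.
import jax
import jax.numpy as jnp

n = 4
W = jax.random.normal(jax.random.PRNGKey(0), (n, n))  # not symmetric
q = jax.random.normal(jax.random.PRNGKey(1), (n,))

f = lambda q: q @ W @ q
print(jnp.allclose(jax.grad(f)(q), (W + W.T) @ q))  # True in general

W_sym = 0.5 * (W + W.T)
g = lambda q: q @ W_sym @ q
print(jnp.allclose(jax.grad(g)(q), 2 * W_sym @ q))  # True for symmetric W
```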

The next term: $$ \frac{\partial}{\partial \dot{\mathbf q}}\left( \boldsymbol \lambda^\top J\dot{\mathbf q} \right) = \boldsymbol \lambda^\top J. $$ The last term, $\boldsymbol \lambda^\top \dot{\mathbf x}$, does not depend on $\dot{\mathbf q}$, so its derivative is zero.

Collecting it all together (the factor $\frac{1}{2}$ cancels the $2$ from the quadratic term) gives $$ \frac{\partial g(\dot{\mathbf q})}{\partial \dot{\mathbf q}} = \dot{\mathbf q}^\top W - \boldsymbol \lambda^\top J. $$ Transposing both sides gives the answer $$ \left(\frac{\partial g(\dot{\mathbf q})}{\partial \dot{\mathbf q}}\right)^\top = W \dot{\mathbf q} - J^\top \boldsymbol \lambda. $$ Note the sign in front of $J^\top \boldsymbol \lambda$: your professor has the opposite.
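Finally, here is a sketch that differentiates the full $g$ with autodiff and confirms the sign; all dimensions and values are invented for the test, and `W` is symmetrized as assumed above.

```python
# Autodiff of g(qdot) = 1/2 qdot^T W qdot + lam^T (xdot - J qdot)
# should reproduce W qdot - J^T lam (note the minus sign).
import jax
import jax.numpy as jnp

n, m = 5, 3
W0 = jax.random.normal(jax.random.PRNGKey(0), (n, n))
W = 0.5 * (W0 + W0.T)                                  # symmetric, as assumed
J = jax.random.normal(jax.random.PRNGKey(1), (m, n))
lam = jax.random.normal(jax.random.PRNGKey(2), (m,))   # the multiplier lambda
xdot = jax.random.normal(jax.random.PRNGKey(3), (m,))
qdot = jax.random.normal(jax.random.PRNGKey(4), (n,))

g = lambda q: 0.5 * q @ W @ q + lam @ (xdot - J @ q)
print(jnp.allclose(jax.grad(g)(qdot), W @ qdot - J.T @ lam))  # True
```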
