[Math] Why transpose in gradient descent

Tags: gradient descent, machine learning

I've been following Andrew Ng's Machine Learning course.
According to the gradient descent formula in the linked image, there is no transpose sign.

However, in the implementation, the answer key shows this:

theta = theta - sum((h - y) .* X)'

Why is there a transpose sign in the answer key?

Best Answer

In general, if you call

$$ {\bf x} = \pmatrix{x_1 \\ x_2 \\ \vdots \\ x_n} ~~~\mbox{and}~~~ {\bf y} = \pmatrix{y_1 \\ y_2 \\ \vdots \\ y_n} $$

then

$$ {\bf x}^T{\bf y} = \pmatrix{x_1 & x_2 &\cdots & x_n}\pmatrix{y_1 \\ y_2 \\ \vdots \\ y_n} = x_1y_1 + x_2y_2 + \cdots + x_n y_n = \sum_{k=1}^nx_k y_k $$
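
For a concrete instance (the numbers are chosen purely for illustration), take ${\bf x} = (1, 2, 3)^T$ and ${\bf y} = (4, 5, 6)^T$; then

$$ {\bf x}^T{\bf y} = 1\cdot 4 + 2\cdot 5 + 3\cdot 6 = 32, $$

which is exactly the component-wise sum on the right.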

The difference is then only in whether you are working with components (rightmost expression) or with vectors (leftmost expression); both are the same.
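
To connect this back to the answer-key line, here is a minimal Octave/MATLAB sketch (the names m, n, X, h, y, grad1, grad2 and the random data are illustrative, not taken from the course; the .* line relies on Octave's, or recent MATLAB's, broadcasting of the column vector across the columns of X). It shows that the element-wise form with the trailing transpose and the matrix product X' * (h - y) give the same column vector:

% Toy sizes: 5 training examples, 3 parameters
m = 5; n = 3;
X = rand(m, n);               % design matrix, one example per row
y = rand(m, 1);               % targets
h = rand(m, 1);               % stand-in for the hypothesis values h = X*theta

grad1 = sum((h - y) .* X)';   % element-wise product, column sums, then transpose
grad2 = X' * (h - y);         % the same gradient written as a matrix product

disp(max(abs(grad1 - grad2))) % differs only by floating-point round-off

Without the trailing ', sum((h - y) .* X) returns a 1-by-n row vector, which does not match the n-by-1 column vector theta; the transpose is what puts the gradient into the same shape as theta so the update can be applied.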
