Solved – Loss Function for Multinomial Logistic Regression – Cannot find its derivative

gradient-descent, logistic, loss-functions, multinomial-distribution

For multinomial logistic regression, we can define the loss function as follows:

$J(\theta)=\frac{-1}{m}\sum\limits_{i=1}^m\sum\limits_{j=1}^k 1(y^{(i)}=j)\log(\frac{\exp(\theta_j^{T}x^{(i)})}{\sum\limits_{l=1}^k\exp(\theta_l^{T}x^{(i)})})$
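Here $1(\cdot)$ is the indicator function, and the fraction inside the logarithm is the softmax probability, which I will write as $P(y^{(i)}=j\mid x^{(i)};\theta)=\frac{\exp(\theta_j^{T}x^{(i)})}{\sum_{l=1}^k\exp(\theta_l^{T}x^{(i)})}$.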

When I try to find the derivative of this expression with respect to $\theta_j$, I get:

$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum\limits_{i=1}^m\sum\limits_{j=1}^k 1(y^{(i)}=j)\left(x^{(i)}-x^{(i)}\frac{\sum\limits_{l=1}^k\exp(\theta_l^{T}x^{(i)})}{\sum\limits_{l=1}^k\exp(\theta_l^{T}x^{(i)})}\right)$

However, this expression must be incorrect, because according to the UFLDL tutorial (http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression) I should get:

$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum\limits_{i=1}^m\left[x^{(i)}\left(1(y^{(i)}=j)-P(y^{(i)}=j\mid x^{(i)};\theta)\right)\right]$

Nevertheless, I do not understand how this result was obtained. Do you know which steps the author followed to arrive at it?

Thank you so much for your help.

Best Answer

  1. Transform the log of the fraction into a difference of logs: $\log\frac{a}{b}=\log a-\log b$.
  2. Note that the sum of the indicator function over all classes equals 1: $\sum\limits_{j=1}^k 1(y^{(i)}=j)=1$, so the log-sum-exp term appears exactly once per training example.
  3. Differentiate with respect to $\theta_j$; the result coincides with what is written in the material you are referring to (worked out below).
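
Spelling these steps out (a sketch of the derivation, using the notation from the question):

Step 1. $\log\frac{\exp(\theta_j^{T}x^{(i)})}{\sum_{l=1}^k\exp(\theta_l^{T}x^{(i)})}=\theta_j^{T}x^{(i)}-\log\sum_{l=1}^k\exp(\theta_l^{T}x^{(i)})$, so

$J(\theta)=-\frac{1}{m}\sum\limits_{i=1}^m\left[\sum\limits_{j=1}^k 1(y^{(i)}=j)\,\theta_j^{T}x^{(i)}-\sum\limits_{j=1}^k 1(y^{(i)}=j)\log\sum\limits_{l=1}^k\exp(\theta_l^{T}x^{(i)})\right]$

Step 2. Since $\sum_{j=1}^k 1(y^{(i)}=j)=1$, the second inner sum collapses:

$J(\theta)=-\frac{1}{m}\sum\limits_{i=1}^m\left[\sum\limits_{j=1}^k 1(y^{(i)}=j)\,\theta_j^{T}x^{(i)}-\log\sum\limits_{l=1}^k\exp(\theta_l^{T}x^{(i)})\right]$

Step 3. Differentiate with respect to $\theta_j$; only the $j$-th term of the first sum and the log-sum-exp depend on $\theta_j$:

$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum\limits_{i=1}^m\left[1(y^{(i)}=j)\,x^{(i)}-\frac{\exp(\theta_j^{T}x^{(i)})}{\sum\limits_{l=1}^k\exp(\theta_l^{T}x^{(i)})}\,x^{(i)}\right]=-\frac{1}{m}\sum\limits_{i=1}^m x^{(i)}\left(1(y^{(i)}=j)-P(y^{(i)}=j\mid x^{(i)};\theta)\right)$

which is exactly the expression from the UFLDL page. The mistake in your attempt is in differentiating $\log\sum_l\exp(\theta_l^{T}x^{(i)})$: only the $l=j$ term depends on $\theta_j$, so the numerator of the resulting fraction is $\exp(\theta_j^{T}x^{(i)})$, not $\sum_l\exp(\theta_l^{T}x^{(i)})$.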
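
As a sanity check, here is a minimal NumPy sketch (all names, e.g. `loss_J` and `grad_J`, are illustrative) that compares this analytic gradient to a central finite-difference approximation of the loss:

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def loss_J(Theta, X, y):
    """J(theta) = -(1/m) * sum_i log P(y^(i) | x^(i); theta)."""
    m = X.shape[0]
    P = softmax(X @ Theta.T)              # (m, k) softmax probabilities
    return -np.log(P[np.arange(m), y]).mean()

def grad_J(Theta, X, y):
    """Gradient wrt theta_j: -(1/m) sum_i x^(i) (1{y^(i)=j} - P(j | x^(i)))."""
    m, k = X.shape[0], Theta.shape[0]
    P = softmax(X @ Theta.T)              # (m, k)
    Y = np.eye(k)[y]                      # (m, k) one-hot indicators 1{y^(i)=j}
    return -(Y - P).T @ X / m             # (k, n): row j is the gradient wrt theta_j

rng = np.random.default_rng(0)
m, n, k = 50, 4, 3
X = rng.normal(size=(m, n))
y = rng.integers(0, k, size=m)
Theta = rng.normal(size=(k, n))

# Central finite-difference check of one gradient coordinate
eps = 1e-6
E = np.zeros_like(Theta)
E[1, 2] = eps
numeric = (loss_J(Theta + E, X, y) - loss_J(Theta - E, X, y)) / (2 * eps)
print(grad_J(Theta, X, y)[1, 2], numeric)  # the two values should agree closely
```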