Solved – Python implementation of indicator function in Softmax gradient

gradient descent, machine learning, python, softmax

I hope this is the right place for this question. I am following the Stanford Deep Learning tutorial http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/ and trying to implement gradient descent with softmax. For the indicator function in the equation below,

\begin{align}
\nabla_{\theta^{(k)}} J(\theta) = - \sum_{i=1}^{m}{ \left[ x^{(i)} \left( 1\{ y^{(i)} = k\} - P(y^{(i)} = k \mid x^{(i)}; \theta) \right) \right] }
\end{align}

I am thinking of creating a numpy array that holds the indicator values for all the elements of the input X, which I can then plug into the gradient computation.

First of all, I'm not sure that creating an array to hold the indicators is the right way to go, but here is my implementation so far:

indicator = [[1 if X[i,j]==y[i] else 0 for j in range(X.shape[1])] for i in range(X.shape[0])]

where X is the input and y holds the labels.

This implementation is erroneous, in addition to being quite slow. I wonder if someone could point me in the right direction. Thanks!

Best Answer

The array of indicators for a single sample is just the one-hot representation of its label.

For instance, if there are 3 categories in total (labelled 1 to 3, as in the tutorial) and $x^{(i)}$ has the label $y^{(i)}=2$, then its one-hot representation is $[0,1,0]$.

In terms of code it should be something like

[[1 if y[i] == k else 0 for k in range(1, category_num + 1)] for i in range(sample_num)]
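
If you want to avoid the Python loops, here is a vectorized numpy sketch of the same idea. It assumes y is a 1-D integer array of labels 1..K (as in the UFLDL tutorial) and X has one sample per row; the names and shapes below (one_hot, softmax_gradient, theta with one column per class) are illustrative assumptions rather than the tutorial's own code, so orient them to match your setup.

import numpy as np

def one_hot(y, num_classes):
    # (m, K) matrix whose row i is the one-hot encoding of y[i];
    # assumes integer labels 1..K, hence the "y - 1" column shift.
    y = np.asarray(y)
    indicator = np.zeros((y.shape[0], num_classes))
    indicator[np.arange(y.shape[0]), y - 1] = 1.0
    return indicator

def softmax_gradient(theta, X, y, num_classes):
    # theta: (n, K) weights, X: (m, n) samples as rows, y: (m,) labels 1..K.
    scores = X @ theta                            # (m, K) class scores
    scores -= scores.max(axis=1, keepdims=True)   # shift for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)     # (m, K) P(y = k | x; theta)
    indicator = one_hot(y, num_classes)           # (m, K) 1{y == k}
    return -X.T @ (indicator - probs)             # (n, K), matches the formula above

Each column k of the returned matrix is the gradient with respect to $\theta^{(k)}$; the one_hot call replaces the nested list comprehension and handles all samples at once without Python-level loops.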