Calculating t-SNE gradient (a mistake in the original t-SNE paper)

Tags: calculus, gradient, tsne

This question concerns the way the gradient of the KL divergence loss function is derived in the original paper, Visualizing Data using t-SNE.

In Appendix A (page 21), where they derive the gradient, Equation (27) is given as
$$
\frac{\partial C}{\partial d_{ij}} = - \sum_{k \neq l} p_{kl} \bigg ( \frac{1}{q_{kl}Z} \frac{\partial ((1+d_{kl}^2)^{-1})}{\partial d_{ij}} -
\frac{1}{Z} \frac{\partial Z}{\partial d_{ij}} \bigg )
$$
Evaluating the partial derivatives (only the terms that involve $d_{ij}$ itself are nonzero), we get
$$
\frac{\partial C}{\partial d_{ij}} = 2 \frac{p_{ij}}{q_{ij}Z} (1 + d_{ij}^2)^{-2}d_{ij} - 2 \sum_{k \neq l} p_{kl} \frac{(1 + d_{ij}^2)^{-2}d_{ij}}{Z}
$$

But in their equation (28), there is no extra $d_{ij}$ term. What am I missing here?

Best Answer

I just signed up for this forum due to your question :)

Nice question! It shows someone is indeed trying to follow and derive the nitty-gritty. Your question is totally valid: (28) is indeed missing the $d_{ij}$ factor. But (24) is in turn missing a $d_{ij}^{-1}$. You can see that from (21) by computing $\frac{\partial d_{ij}}{\partial y_i}$ and taking into account that $$\frac{\partial \lVert\mathbf x\rVert}{\partial x_i} = \frac{x_i}{\lVert\mathbf x\rVert},$$ which gives $\frac{\partial d_{ij}}{\partial y_i} = \frac{y_i - y_j}{d_{ij}}$ rather than just $y_i - y_j$.
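To make the cancellation explicit, here is a sketch of the remaining steps. It assumes the paper's convention that the contributions of $d_{ij}$ and $d_{ji}$ are combined into a leading factor of 2, and it uses $q_{ij} Z = (1 + d_{ij}^2)^{-1}$ and $\sum_{k \neq l} p_{kl} = 1$. Keeping the $d_{ij}$ factor from your calculation:
$$
\frac{\partial C}{\partial d_{ij}} = 2 (p_{ij} - q_{ij}) (1 + d_{ij}^2)^{-1} d_{ij}
$$
Then, with $\frac{\partial d_{ij}}{\partial y_i} = \frac{y_i - y_j}{d_{ij}}$, the $d_{ij}$ cancels:
$$
\frac{\partial C}{\partial y_i} = 2 \sum_j \frac{\partial C}{\partial d_{ij}} \frac{y_i - y_j}{d_{ij}} = 4 \sum_j (p_{ij} - q_{ij}) (1 + d_{ij}^2)^{-1} (y_i - y_j)
$$
which is the final gradient stated in the paper.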

So the two omissions cancel, and in the end everything is correct again! :P
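If you want to convince yourself numerically, the following is a minimal sketch (not code from the paper) that compares the final gradient formula, $4 \sum_j (p_{ij} - q_{ij})(1 + d_{ij}^2)^{-1}(y_i - y_j)$, against a central finite-difference approximation of the KL cost. The function names, the random symmetric $P$, and the problem size are all assumptions made for illustration.

```python
import numpy as np

def q_matrix(Y):
    """Return (Q, W): Student-t similarities Q and unnormalized weights W."""
    diff = Y[:, None, :] - Y[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)      # squared distances d_ij^2
    W = 1.0 / (1.0 + d2)                 # (1 + d_ij^2)^{-1}
    np.fill_diagonal(W, 0.0)             # exclude k == l
    return W / W.sum(), W                # Z = sum over k != l of W_kl

def kl_cost(P, Y):
    """KL(P || Q) summed over off-diagonal pairs."""
    Q, _ = q_matrix(Y)
    mask = ~np.eye(len(Y), dtype=bool)
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

def analytic_grad(P, Y):
    """Gradient 4 * sum_j (p_ij - q_ij) (1 + d_ij^2)^{-1} (y_i - y_j)."""
    Q, W = q_matrix(Y)
    diff = Y[:, None, :] - Y[None, :, :]
    coeff = (P - Q) * W
    return 4.0 * np.sum(coeff[:, :, None] * diff, axis=1)

def numeric_grad(P, Y, eps=1e-6):
    """Central finite differences of kl_cost with respect to each entry of Y."""
    G = np.zeros_like(Y)
    for idx in np.ndindex(*Y.shape):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[idx] += eps
        Ym[idx] -= eps
        G[idx] = (kl_cost(P, Yp) - kl_cost(P, Ym)) / (2 * eps)
    return G

# Illustrative data: a random symmetric P that sums to 1, and random 2-D points.
rng = np.random.default_rng(0)
n = 6
A = rng.random((n, n))
P = A + A.T
np.fill_diagonal(P, 0.0)
P /= P.sum()
Y = rng.normal(size=(n, 2))

max_err = np.max(np.abs(analytic_grad(P, Y) - numeric_grad(P, Y)))
print("max abs difference:", max_err)
```

The two gradients should agree up to finite-difference error, which is consistent with the missing $d_{ij}$ and the missing $d_{ij}^{-1}$ cancelling in the final result.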
