The reason is the following. We use the notation:
$$\theta x^i:=\theta_0+\theta_1 x^i_1+\dots+\theta_p x^i_p.$$
Then
$$\log h_\theta(x^i)=\log\frac{1}{1+e^{-\theta x^i}}=-\log(1+e^{-\theta x^i}),$$
$$\log(1-h_\theta(x^i))=\log\left(1-\frac{1}{1+e^{-\theta x^i}}\right)=\log(e^{-\theta x^i})-\log(1+e^{-\theta x^i})=-\theta x^i-\log(1+e^{-\theta x^i}).$$
[This used $1=\frac{1+e^{-\theta x^i}}{1+e^{-\theta x^i}}$: the 1's in the numerator cancel, and then $\log(x/y)=\log(x)-\log(y)$.]
Our original cost function has the form
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right].$$
Plugging in the two simplified expressions above, we obtain
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[-y^i\log(1+e^{-\theta x^i}) + (1-y^i)\left(-\theta x^i-\log(1+e^{-\theta x^i})\right)\right],$$
which can be simplified to:
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[y^i\theta x^i-\theta x^i-\log(1+e^{-\theta x^i})\right]=-\frac{1}{m}\sum_{i=1}^m \left[y^i\theta x^i-\log(1+e^{\theta x^i})\right],~~(*)$$
where the second equality follows from
$$-\theta x^i-\log(1+e^{-\theta x^i})=
-\left[ \log e^{\theta x^i}+
\log(1+e^{-\theta x^i} )
\right]=-\log(1+e^{\theta x^i}). $$ [we used $\log(x) + \log(y) = \log(xy)$]
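If you want to convince yourself that $(*)$ really is the same function as the original cost, a quick numerical comparison helps. The following is only a sketch: the toy data `X`, `y`, the parameter vector `theta`, and all function names are made up for illustration and are not part of the derivation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear(theta, x):
    # theta[0] is the intercept theta_0; x holds the features x_1..x_p
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

def cost_original(theta, X, y):
    # -1/m * sum[ y*log(h) + (1-y)*log(1-h) ]
    total = 0.0
    for x, yi in zip(X, y):
        h = sigmoid(linear(theta, x))
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / len(X)

def cost_simplified(theta, X, y):
    # -1/m * sum[ y*(theta x) - log(1 + e^{theta x}) ], i.e. the form (*)
    total = 0.0
    for x, yi in zip(X, y):
        z = linear(theta, x)
        total += yi * z - math.log(1.0 + math.exp(z))
    return -total / len(X)

# Hypothetical toy data: four samples with two features each
X = [[0.5, -1.2], [1.5, 0.3], [-0.7, 2.0], [2.2, -0.4]]
y = [0, 1, 0, 1]
theta = [0.1, -0.3, 0.8]

gap = abs(cost_original(theta, X, y) - cost_simplified(theta, X, y))
print(gap)  # should be at the level of floating-point rounding
```

For large $|\theta x^i|$ the naive `math.exp` call can overflow, so this is meant as a check of the algebra, not as a numerically robust implementation.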
All you need now is to compute the partial derivatives of $(*)$ with respect to $\theta_j$ (for $j=0$ read $x^i_0:=1$). Since
$$\frac{\partial}{\partial \theta_j}y^i\theta x^i=y^ix^i_j, $$
$$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i})=\frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}}=x^i_jh_\theta(x^i),$$
the claim follows:
$$\frac{\partial J}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)x^i_j.$$
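Combining the two partial derivatives above gives $\frac{\partial J}{\partial\theta_j}=\frac{1}{m}\sum_i (h_\theta(x^i)-y^i)x^i_j$, and this can be checked against a finite-difference approximation of $(*)$. This is a minimal sketch under some assumptions of mine: each row of `X` starts with a constant 1 so that `theta[0]` plays the role of $\theta_0$, and the data and function names are invented for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    # J(theta) in the simplified form (*)
    total = 0.0
    for x, yi in zip(X, y):
        z = sum(t * xi for t, xi in zip(theta, x))
        total += yi * z - math.log(1.0 + math.exp(z))
    return -total / len(X)

def gradient(theta, X, y):
    # Analytic gradient: 1/m * sum (h(x^i) - y^i) * x^i_j
    m = len(X)
    grad = [0.0] * len(theta)
    for x, yi in zip(X, y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for j, xj in enumerate(x):
            grad[j] += (h - yi) * xj / m
    return grad

def numeric_gradient(theta, X, y, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2 * eps))
    return grad

# Hypothetical toy data; the leading 1 in each row is the intercept feature
X = [[1.0, 0.5, -1.2], [1.0, 1.5, 0.3], [1.0, -0.7, 2.0], [1.0, 2.2, -0.4]]
y = [0, 1, 0, 1]
theta = [0.1, -0.3, 0.8]

analytic = gradient(theta, X, y)
numeric = numeric_gradient(theta, X, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff)  # small if the derivation is right
```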
Found it. I was missing the fact that $\log(\frac{A}{B})=\log(A)-\log(B)$. From there, treating $Q(s,\theta)$ as fixed, the computation is straightforward:
\begin{align}
& \!\!\!\!\!\!\!\!\frac{\partial^2 \textrm{KL}(p(s, \theta)||\mu(s) q(\theta|s))}{\partial p(s,\theta)^2} = \\
& = \frac{\partial^2 \left[ p(s, \theta) \log \frac{p(s,\theta)}{Q(s,\theta)} \right]}{\partial p(s,\theta)^2} = \nonumber\\
& = \frac{\partial^2 \left[ p(s, \theta) \log p(s,\theta) - p(s, \theta) \log Q(s,\theta) \right]}{\partial p(s,\theta)^2} = \nonumber\\
& = \frac{\partial^2 \left[ p(s, \theta) \log p(s,\theta) \right]}{\partial p(s,\theta)^2} - \frac{\partial^2 \left[ p(s, \theta) \log Q(s,\theta) \right]}{\partial p(s,\theta)^2} = \nonumber\\
& = \frac{\partial^2 p(s, \theta)}{\partial p(s,\theta)^2} \log p(s,\theta) + 2 \frac{\partial p(s, \theta)}{\partial p(s,\theta)} \frac{\partial \log p(s,\theta)}{\partial p(s,\theta)} + p(s, \theta) \frac{\partial^2 \log p(s,\theta)}{\partial p(s,\theta)^2} - 0 = \nonumber\\
& = 0 + \frac{2}{p(s,\theta)} + p(s, \theta) \frac{\partial \frac{1}{p(s,\theta)}}{\partial p(s,\theta)} = \nonumber\\
& = \frac{2}{p(s,\theta)} - \frac{1}{p(s,\theta)} = \nonumber\\
& = \frac{1}{p(s,\theta)} \geq 0 \nonumber\\
\end{align}
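As a sanity check on the result $\frac{1}{p}$, one can approximate the second derivative of the single term $p\log\frac{p}{Q}$ numerically, holding $Q$ fixed. The sketch below is illustrative only: the values of `p` and `q` and the helper names are my own choices, and the check covers one pointwise term, not a full sum over $(s,\theta)$.

```python
import math

def kl_term(p, q):
    # One pointwise term of the KL divergence: p * log(p / q)
    return p * math.log(p / q)

def second_derivative(f, p, eps=1e-4):
    # Central second-order finite difference
    return (f(p + eps) - 2.0 * f(p) + f(p - eps)) / eps**2

q = 0.3  # held fixed, as in the derivation
results = []
for p in (0.1, 0.25, 0.6):
    approx = second_derivative(lambda t: kl_term(t, q), p)
    results.append((p, approx, 1.0 / p))

for p, approx, exact in results:
    print(p, approx, exact)  # approx should be close to 1/p
```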
Best Answer
It's a "trick" in the sense that you use it to calculate $\nabla_\theta p(X,\theta)$ via the (hopefully, sometimes) easier expression $\log p(X,\theta)$. So the use is to write it as $$ \nabla_\theta p(X,\theta)=p(X,\theta)\,\nabla_\theta\log p(X,\theta), $$ in cases where the right-hand side is easier to compute than the left-hand side. That is typically the case when $p$ consists of many products and exponents.
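To make this concrete, here is a small sketch with a Bernoulli likelihood, which is a product of factors $\theta$ and $1-\theta$. The model, the data `xs`, and the function names are my own choice of example, not something from the question. The log turns the product into a sum, whose derivative is easy to write down, and multiplying by $p$ recovers $\nabla_\theta p$.

```python
import math

def likelihood(xs, theta):
    # p(X, theta) = product over samples of theta^x * (1 - theta)^(1 - x)
    return math.prod(theta if x == 1 else 1.0 - theta for x in xs)

def grad_log_likelihood(xs, theta):
    # d/dtheta log p = sum over samples of x/theta - (1 - x)/(1 - theta)
    return sum(1.0 / theta if x == 1 else -1.0 / (1.0 - theta) for x in xs)

def grad_likelihood_via_log(xs, theta):
    # The trick: grad p = p * grad log p
    return likelihood(xs, theta) * grad_log_likelihood(xs, theta)

# Hypothetical observations and parameter value
xs = [1, 0, 1, 1, 0, 1]
theta = 0.6

# Direct finite-difference derivative of p, for comparison
eps = 1e-6
finite_diff = (likelihood(xs, theta + eps) - likelihood(xs, theta - eps)) / (2 * eps)
via_log = grad_likelihood_via_log(xs, theta)
print(finite_diff, via_log)  # the two values should agree closely
```

Differentiating the product directly would need the product rule across all six factors; the log form needs only one term per sample.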