The reason is the following. We use the notation:
$$\theta x^i:=\theta_0+\theta_1 x^i_1+\dots+\theta_p x^i_p.$$
Then
$$\log h_\theta(x^i)=\log\frac{1}{1+e^{-\theta x^i}}=-\log(1+e^{-\theta x^i}),$$
$$\log(1-h_\theta(x^i))=\log\left(1-\frac{1}{1+e^{-\theta x^i}}\right)=\log(e^{-\theta x^i})-\log(1+e^{-\theta x^i})=-\theta x^i-\log(1+e^{-\theta x^i}).$$
[In the second line we wrote $1=\frac{1+e^{-\theta x^i}}{1+e^{-\theta x^i}}$, so the 1's in the numerator cancel and leave $\frac{e^{-\theta x^i}}{1+e^{-\theta x^i}}$; then we used $\log(x/y)=\log(x)-\log(y)$.]
Our original cost function has the form
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right].$$
Plugging in the two simplified expressions above, we obtain
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[-y^i(\log ( 1+e^{-\theta x^i})) + (1-y^i)(-\theta x^i-\log ( 1+e^{-\theta x^i} ))\right]$$, which can be simplified to:
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[y^i\theta x^i-\theta x^i-\log(1+e^{-\theta x^i})\right]=-\frac{1}{m}\sum_{i=1}^m \left[y^i\theta x^i-\log(1+e^{\theta x^i})\right],~~(*)$$
where the second equality follows from
$$-\theta x^i-\log(1+e^{-\theta x^i})=
-\left[ \log e^{\theta x^i}+
\log(1+e^{-\theta x^i} )
\right]=-\log(1+e^{\theta x^i}). $$ [ we used $ \log(x) + \log(y) = \log(x y) $ ]
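If you want to convince yourself numerically that the simplified form agrees with the original cost, here is a minimal NumPy sketch. The data are random and purely illustrative, the function names are my own, and I assume a design matrix with one row per sample and a leading column of ones for $\theta_0$. `np.logaddexp(0, z)` computes $\log(1+e^{z})$ without overflow.

```python
import numpy as np

def cost_original(theta, X, y):
    """Cost written with h = 1 / (1 + exp(-theta x))."""
    z = X @ theta
    h = 1.0 / (1.0 + np.exp(-z))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def cost_simplified(theta, X, y):
    """Simplified form: -1/m * sum(y * z - log(1 + exp(z)))."""
    z = X @ theta
    return -np.mean(y * z - np.logaddexp(0.0, z))

# Illustrative random data (an assumption, not from the question).
rng = np.random.default_rng(0)
m, p = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, p))])  # column of 1s for theta_0
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=p + 1)

j_original = cost_original(theta, X, y)
j_simplified = cost_simplified(theta, X, y)
print(j_original, j_simplified)  # the two values should agree up to rounding
```

The simplified form is also the one I would prefer in practice, since it never evaluates $\log$ of a number that has been rounded to $0$.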
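All that remains is to compute the partial derivatives of $(*)$ with respect to $\theta_j$. Since
$$\frac{\partial}{\partial \theta_j}y^i\theta x^i=y^ix^i_j, $$
$$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i})=\frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}}=x^i_jh_\theta(x^i),$$
the claim follows:
$$\frac{\partial J}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)x^i_j,$$
with the convention $x^i_0:=1$ for $j=0$.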
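As a sanity check on the componentwise derivative, one can compare the formula $\frac{1}{m}\sum_i (h_\theta(x^i)-y^i)x^i_j$ against central finite differences of the cost. This is only a sketch on made-up data; the helper names and the row-per-sample layout of `X` are my assumptions.

```python
import numpy as np

def cost(theta, X, y):
    """Simplified cost: -1/m * sum(y * z - log(1 + exp(z)))."""
    z = X @ theta
    return -np.mean(y * z - np.logaddexp(0.0, z))

def gradient(theta, X, y):
    """Analytic gradient: 1/m * sum_i (h_i - y_i) * x_i."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (h - y) / len(y)

def numerical_gradient(f, theta, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Illustrative random data (an assumption, not from the question).
rng = np.random.default_rng(1)
m, p = 40, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, p))])  # x_0 = 1 for theta_0
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=p + 1)

analytic = gradient(theta, X, y)
numeric = numerical_gradient(lambda t: cost(t, X, y), theta)
max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error)  # should be tiny compared with the gradient entries
```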
For convenience, define some variables and their differentials
$$\eqalign{
z &= X^T\theta, &\,\,\,\, dz = X^Td\theta \cr
p &= \exp(z), &\,\,\,\, dp = p\odot dz \cr
h &= \frac{p}{1+p}, &\,\,\,\, dh = (h-h\odot h)\odot dz \,= (H-H^2)\,dz \cr
}$$
where
$\,\,\,\,H={\rm Diag}(h)$
$\,\,\,\,\odot$ represents the Hadamard elementwise product
$\,\,\,\,\exp$ is applied elementwise
$\,\,\,\frac{p}{1+p}$ represents elementwise division
The cost function can be written in terms of these variables and the Frobenius inner product (represented by a colon)
$$\eqalign{
J &= -\frac{1}{m}\Big[y:\log(h) + (1-y):\log(1-h)\Big] \cr
}$$
The differential of the cost is
$$\eqalign{
dJ &= -\frac{1}{m}\Big[y:d\log(h) + (1-y):d\log(1-h)\Big] \cr
&= -\frac{1}{m}\Big[y:H^{-1}dh - (1-y):(I-H)^{-1}dh\Big] \cr
&= -\frac{1}{m}\Big[H^{-1}y - (I-H)^{-1}(1-y)\Big]:dh \cr
&= -\frac{1}{m}\Big[H^{-1}y - (I-H)^{-1}(1-y)\Big]:H(I-H)dz \cr
&= -\frac{1}{m}\Big[(I-H)y - H(1-y)\Big]:dz \cr
&= -\frac{1}{m}(y-h):X^Td\theta \cr
&= \frac{1}{m}X(h-y):d\theta \cr
}$$
The gradient
$$\eqalign{
G =\frac{\partial J}{\partial\theta} &= \frac{1}{m}X(h-y) \cr
}$$
(NB: Your gradient is missing the $\frac{1}{m}$ factor)
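To make the matrix convention concrete: since $z = X^T\theta$, the samples are the columns of $X$. The sketch below uses that layout on random data (the sizes and names are my own choices) and checks that the first-order change in $J$ matches $G : d\theta$ for a small perturbation.

```python
import numpy as np

def cost(theta, X, y):
    """J = -1/m [ y : log(h) + (1 - y) : log(1 - h) ], with z = X^T theta."""
    z = X.T @ theta
    p = np.exp(z)
    h = p / (1.0 + p)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / y.size

def gradient(theta, X, y):
    """G = 1/m X (h - y)."""
    h = 1.0 / (1.0 + np.exp(-(X.T @ theta)))
    return X @ (h - y) / y.size

# Illustrative random data; columns of X are samples (n features, m samples).
rng = np.random.default_rng(2)
n, m = 4, 30
X = rng.normal(size=(n, m))
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=n)
d_theta = 1e-6 * rng.normal(size=n)  # small perturbation

actual_change = cost(theta + d_theta, X, y) - cost(theta, X, y)
predicted_change = gradient(theta, X, y) @ d_theta  # G : d_theta
print(actual_change, predicted_change)
```

If your data are stored with one sample per row instead, transpose before using these formulas.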
The differential of the gradient
$$\eqalign{
dG &= \frac{1}{m}X\,dh \cr
&= \frac{1}{m}X(H-H^2)\,dz \cr
&= \frac{1}{m}X(H-H^2)X^T\,d\theta \cr\cr
}$$
And finally, the gradient of the gradient (aka the Hessian)
$$\eqalign{
\frac{\partial^2J}{\partial\theta\,\partial\theta^T} &=
\frac{\partial G}{\partial\theta} =
\frac{1}{m}X(H-H^2)X^T \cr
}$$
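The Hessian formula can be checked the same way, by differencing the gradient. A rough sketch on random data follows (again with samples as columns of $X$; the names are mine). Because the diagonal entries of $H-H^2$ are $h_i(1-h_i)>0$, the Hessian is positive semidefinite, which the smallest eigenvalue should reflect.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """G = 1/m X (h - y), with z = X^T theta."""
    h = sigmoid(X.T @ theta)
    return X @ (h - y) / y.size

def hessian(theta, X):
    """1/m X (H - H^2) X^T, where H = Diag(h)."""
    h = sigmoid(X.T @ theta)
    weights = h * (1.0 - h)             # diagonal of H - H^2
    return (X * weights) @ X.T / h.size  # scales column i of X by weights[i]

# Illustrative random data; columns of X are samples.
rng = np.random.default_rng(3)
n, m = 4, 30
X = rng.normal(size=(n, m))
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=n)

analytic = hessian(theta, X)

# Column j of the Hessian is the derivative of G with respect to theta_j.
eps = 1e-6
numeric = np.zeros((n, n))
for j in range(n):
    step = np.zeros(n)
    step[j] = eps
    numeric[:, j] = (gradient(theta + step, X, y) - gradient(theta - step, X, y)) / (2 * eps)

max_abs_error = np.max(np.abs(analytic - numeric))
min_eigenvalue = np.linalg.eigvalsh(analytic).min()
print(max_abs_error, min_eigenvalue)
```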
Here are my calculations.
Let's say that $x\in\mathbb{R}^n$ and $\theta\in\mathbb{R}^n$. Then, by the chain rule,
$$\frac{\partial}{\partial\theta_j}\log (1+e^{\theta x'}) = \frac{1}{1+e^{\theta x'}}\frac{\partial}{\partial\theta_j}(1+e^{\theta x'}).$$
The derivative of the constant term is zero, and by the chain rule the derivative of the second term is
$$\frac{\partial}{\partial\theta_j}(e^{\theta x'}) = e^{\theta x'}\frac{\partial}{\partial\theta_j}(\theta x') = e^{\theta x'}x_j.$$
Therefore the solution is
$$\frac{\partial}{\partial\theta_j}\log (1+e^{\theta x'}) = \frac{e^{\theta x'}x_j}{1+e^{\theta x'}}.$$
EDIT: About your calculations, two points. First, people sometimes write $\log$ when they mean $\ln$; I do not know whether that is the case here, but you should check. Second, when you differentiate $e^{\theta x'}$ you must apply the chain rule.
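To illustrate why the base of the logarithm matters, here is a small NumPy sketch with arbitrary numbers of my own choosing. In NumPy, `np.log` is the natural logarithm and `np.log10` is base 10. The formula above matches the finite-difference derivative only for the natural logarithm; with base 10 the result picks up a factor $1/\ln 10$.

```python
import numpy as np

# Arbitrary illustrative values (assumptions, not from the question).
theta = np.array([0.3, -1.2, 0.8])
x = np.array([1.0, 0.5, -2.0])
j = 1        # differentiate with respect to theta_j
eps = 1e-6

step = np.zeros_like(theta)
step[j] = eps

def f_ln(t):
    """log(1 + exp(theta x)) with the natural logarithm."""
    return np.log(1.0 + np.exp(t @ x))

def f_log10(t):
    """Same expression, but with a base-10 logarithm."""
    return np.log10(1.0 + np.exp(t @ x))

# Closed-form derivative from the chain rule (natural log).
formula = np.exp(theta @ x) * x[j] / (1.0 + np.exp(theta @ x))

# Central finite differences for both choices of logarithm.
numeric_ln = (f_ln(theta + step) - f_ln(theta - step)) / (2 * eps)
numeric_log10 = (f_log10(theta + step) - f_log10(theta - step)) / (2 * eps)

print(formula, numeric_ln, numeric_log10)
```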