Here I derive all the necessary properties and identities so that the solution is self-contained; apart from that, the derivation itself is clean and easy. Let us formalize our notation and write the loss function a little more compactly. Consider $m$ samples $\{x_i,y_i\}$ such that $x_i\in\mathbb{R}^d$ and $y_i\in\mathbb{R}$. Recall that in binary logistic regression we typically take the hypothesis function $h_\theta$ to be the logistic function. Formally
$$h_\theta(x_i)=\sigma(\omega^Tx_i)=\sigma(z_i)=\frac{1}{1+e^{-z_i}},$$
where $\omega\in\mathbb{R}^d$ and $z_i=\omega^Tx_i$. The loss function (which I believe is missing a negative sign in the OP's post) is then defined as:
$$l(\omega)=\sum_{i=1}^m -\Big( y_i\log\sigma(z_i)+(1-y_i)\log(1-\sigma(z_i))\Big)$$
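As a quick numerical sanity check (a NumPy sketch, not part of the derivation; the helper names are mine), the loss above can be evaluated directly. With $\omega=0$ every $\sigma(z_i)=1/2$, so the loss is $m\log 2$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    """Negative log-likelihood l(w); columns of X are the samples x_i."""
    z = X.T @ w                      # z_i = w^T x_i
    return -np.sum(y * np.log(sigmoid(z)) + (1 - y) * np.log(1 - sigmoid(z)))

X = np.array([[1.0, 2.0],
              [0.5, -1.0]])         # d = 2 features, m = 2 samples
y = np.array([1.0, 0.0])
w = np.zeros(2)

# sigma(0) = 1/2 for both samples, so l(0) = -2 log(1/2) = 2 log 2
assert abs(loss(w, X, y) - 2 * np.log(2)) < 1e-12
```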
There are two important properties of the logistic function which I derive here for future reference. First, note that $1-\sigma(z)=1-1/(1+e^{-z})=e^{-z}/(1+e^{-z})=1/(1+e^z)=\sigma(-z)$.
Also note that
\begin{equation}
\begin{aligned}
\frac{\partial}{\partial z}\sigma(z)=\frac{\partial}{\partial z}(1+e^{-z})^{-1}=e^{-z}(1+e^{-z})^{-2}&=\frac{1}{1+e^{-z}}\frac{e^{-z}}{1+e^{-z}}
=\sigma(z)(1-\sigma(z))
\end{aligned}
\end{equation}
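Both identities are easy to verify numerically (again a small sketch, not part of the original answer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7
# Identity 1: 1 - sigma(z) = sigma(-z)
assert abs((1 - sigmoid(z)) - sigmoid(-z)) < 1e-12

# Identity 2: d sigma/dz = sigma(z)(1 - sigma(z)), checked by central difference
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid(z) * (1 - sigmoid(z))
assert abs(numeric - analytic) < 1e-8
```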
Instead of taking derivatives with respect to individual components, here we will work directly with vectors. The Hessian of the loss function $l(\omega)$ is given by $\vec{\nabla}^2l(\omega)$, but first recall that $\frac{\partial z}{\partial \omega} = \frac{\partial x^T\omega}{\partial \omega}=x^T$ and $\frac{\partial z}{\partial \omega^T}=\frac{\partial \omega^Tx}{\partial \omega ^T} = x$.
Let $l_i(\omega)=-y_i\log\sigma(z_i)-(1-y_i)\log(1-\sigma(z_i))$. Using the properties we derived above and the chain rule
\begin{equation}
\begin{aligned}
\frac{\partial \log\sigma(z_i)}{\partial \omega^T} &=
\frac{1}{\sigma(z_i)}\frac{\partial\sigma(z_i)}{\partial \omega^T} =
\frac{1}{\sigma(z_i)}\frac{\partial\sigma(z_i)}{\partial z_i}\frac{\partial z_i}{\partial \omega^T}=(1-\sigma(z_i))x_i\\
\frac{\partial \log(1-\sigma(z_i))}{\partial \omega^T}&=
\frac{1}{1-\sigma(z_i)}\frac{\partial(1-\sigma(z_i))}{\partial \omega^T}
=-\frac{\sigma(z_i)(1-\sigma(z_i))}{1-\sigma(z_i)}x_i
=-\sigma(z_i)x_i
\end{aligned}
\end{equation}
It's now trivial to show that
$$\vec{\nabla}l_i(\omega)=\frac{\partial l_i(\omega)}{\partial \omega^T}
=-y_ix_i(1-\sigma(z_i))+(1-y_i)x_i\sigma(z_i)=x_i(\sigma(z_i)-y_i)$$
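The gradient $x_i(\sigma(z_i)-y_i)$ can be checked against finite differences (a sketch with arbitrary sample values; the variable names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d = 3
x_i = rng.normal(size=d)
y_i = 1.0
w = rng.normal(size=d)

def loss_i(w):
    z = x_i @ w
    return -(y_i * np.log(sigmoid(z)) + (1 - y_i) * np.log(1 - sigmoid(z)))

grad = x_i * (sigmoid(x_i @ w) - y_i)   # analytic: x_i (sigma(z_i) - y_i)

# Central-difference approximation of the gradient, one coordinate at a time
eps = 1e-6
num = np.array([(loss_i(w + eps * e) - loss_i(w - eps * e)) / (2 * eps)
                for e in np.eye(d)])
assert np.allclose(grad, num, atol=1e-6)
```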
whew!
Our last step is to compute the Hessian
$$\vec{\nabla}^2l_i(\omega)=\frac{\partial^2 l_i(\omega)}{\partial \omega\,\partial \omega^T}=x_ix_i^T\sigma(z_i)(1-\sigma(z_i))$$
For $m$ samples we have $\vec{\nabla}^2l(\omega)=\sum_{i=1}^m x_ix_i^T\sigma(z_i)(1-\sigma(z_i))$. This is equivalent to concatenating column vectors $x_i\in\mathbb{R}^d$ into a matrix $X$ of size $d\times m$ such that $\sum_{i=1}^m x_ix_i^T=XX^T$. The scalar terms are combined in a diagonal matrix $D$ such that $D_{ii}=\sigma(z_i)(1-\sigma(z_i))$. Finally, we conclude that
$$ \vec{H}(\omega)=\vec{\nabla}^2l(\omega)=XDX^T$$
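To confirm that stacking the samples as columns of $X$ and collecting the scalars in $D$ reproduces the sum of outer products, here is a small numerical check (names and random data are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 3, 5
X = rng.normal(size=(d, m))              # columns are the samples x_i
w = rng.normal(size=d)

sig = 1.0 / (1.0 + np.exp(-(X.T @ w)))   # sigma(z_i) for each sample
D = np.diag(sig * (1 - sig))             # D_ii = sigma(z_i)(1 - sigma(z_i))

H = X @ D @ X.T                          # Hessian in the form X D X^T
H_sum = sum(np.outer(X[:, i], X[:, i]) * sig[i] * (1 - sig[i])
            for i in range(m))
assert np.allclose(H, H_sum)
```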
A faster approach can be derived by considering all samples at once from the beginning and working with matrix derivatives instead. As an extra note, with this formulation it's trivial to show that $l(\omega)$ is convex. Let $\delta$ be any vector in $\mathbb{R}^d$. Then
$$\delta^T\vec{H}(\omega)\delta = \delta^T\vec{\nabla}^2l(\omega)\delta = \delta^TXDX^T\delta = (X^T\delta)^TD(X^T\delta) = \|D^{1/2}X^T\delta\|^2\geq 0$$
since $D_{ii}=\sigma(z_i)(1-\sigma(z_i))>0$ for all $i$, so $D^{1/2}$ is well defined. This implies $H$ is positive semidefinite and therefore $l$ is convex (but not strongly convex).
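The positive-semidefiniteness claim can also be verified directly by checking the eigenvalues of $H$ (a sketch with random data, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 4, 10
X = rng.normal(size=(d, m))
w = rng.normal(size=d)

sig = 1.0 / (1.0 + np.exp(-(X.T @ w)))
H = X @ np.diag(sig * (1 - sig)) @ X.T

# All eigenvalues of the symmetric Hessian should be nonnegative (PSD)
eigvals = np.linalg.eigvalsh(H)
assert eigvals.min() >= -1e-10
```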
Expanding on Frank Harrell's answer: to derive the likelihood function, you first need to define the probabilistic model of the problem. In the case of logistic regression, we are talking about a model for a binary target variable (e.g. male vs. female, survived vs. died, sold vs. not sold, etc.). For such data, the Bernoulli distribution is the distribution of choice. Notice that using $\{0, 1\}$ or $\{-1, +1\}$ coding is not part of the definition of the problem; it is just a way of encoding your data, and the labels are arbitrary and can be changed. We use the $\{0, 1\}$ encoding because it has some nice properties and because the model is defined in terms of the Bernoulli distribution, which uses those labels; the main problem in logistic regression is estimating the probability of "success".
If you insisted on defining the likelihood function in terms of a distribution that assigns probability $1-p$ to $-1$ and probability $p$ to $+1$, then you would need to use that distribution in your likelihood function. It would have the following probability mass function
$$ g(x) = p^{(x+1)/2} (1-p)^{1-(x+1)/2} $$
which reduces to the Bernoulli distribution, since $(x+1)/2 \in \{0, 1\}$.
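One can check that this pmf assigns $p$ to $+1$ and $1-p$ to $-1$, matching the Bernoulli case after relabeling (a minimal sketch with an arbitrary $p$):

```python
p = 0.3

def g(x):
    # pmf for x in {-1, +1}: p^((x+1)/2) * (1-p)^(1-(x+1)/2)
    return p ** ((x + 1) / 2) * (1 - p) ** (1 - (x + 1) / 2)

assert abs(g(+1) - p) < 1e-12          # P(X = +1) = p
assert abs(g(-1) - (1 - p)) < 1e-12    # P(X = -1) = 1 - p
assert abs(g(+1) + g(-1) - 1.0) < 1e-12
```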