We have
$\frac{d}{d\beta} (y - X \beta)' (y - X\beta) = -2 X' (y - X \beta)$.
This can be shown by writing the equation out explicitly in components: write $(\beta_{1}, \ldots, \beta_{p})'$ in place of $\beta$, take the derivative with respect to each of $\beta_{1}, \beta_{2}, \ldots, \beta_{p}$, and stack the results to obtain the answer. For a quick and easy illustration, start with $p = 2$.
With experience one develops general rules, some of which are given, e.g., in that document.
Edit, addressing the added part of the question:
With $p = 2$, we have
$(y - X \beta)'(y - X \beta) = (y_1 - x_{11} \beta_1 - x_{12} \beta_2)^2 +
(y_2 - x_{21}\beta_1 - x_{22} \beta_2)^2$
The derivative with respect to $\beta_1$ is
$-2x_{11}(y_1 - x_{11} \beta_1 - x_{12} \beta_2)-2x_{21}(y_2 - x_{21}\beta_1 - x_{22} \beta_2)$
Similarly, the derivative with respect to $\beta_2$ is
$-2x_{12}(y_1 - x_{11} \beta_1 - x_{12} \beta_2)-2x_{22}(y_2 - x_{21}\beta_1 - x_{22} \beta_2)$
Hence, the derivative with respect to $\beta = (\beta_1, \beta_2)'$ is
$
\left(
\begin{array}{c}
-2x_{11}(y_1 - x_{11} \beta_1 - x_{12} \beta_2)-2x_{21}(y_2 - x_{21}\beta_1 - x_{22} \beta_2) \\
-2x_{12}(y_1 - x_{11} \beta_1 - x_{12} \beta_2)-2x_{22}(y_2 - x_{21}\beta_1 - x_{22} \beta_2)
\end{array}
\right)
$
Now observe that you can rewrite the last expression as
$-2\left(
\begin{array}{cc}
x_{11} & x_{21} \\
x_{12} & x_{22}
\end{array}
\right)\left(
\begin{array}{c}
y_{1} - x_{11}\beta_{1} - x_{12}\beta_2 \\
y_{2} - x_{21}\beta_{1} - x_{22}\beta_2
\end{array}
\right) = -2 X' (y - X \beta)$
Of course, everything is done in the same way for a larger $p$.
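The identity is easy to check numerically. The sketch below (an illustration, not part of the original derivation) compares the analytic gradient $-2X'(y - X\beta)$ against a central finite-difference approximation of $\frac{d}{d\beta}(y - X\beta)'(y - X\beta)$ on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)

def rss(b):
    """Residual sum of squares (y - Xb)'(y - Xb)."""
    r = y - X @ b
    return r @ r

# Analytic gradient from the derivation above: -2 X'(y - X beta)
grad = -2 * X.T @ (y - X @ beta)

# Central finite-difference approximation, one coordinate at a time
eps = 1e-6
I = np.eye(p)
fd = np.array([(rss(beta + eps * I[j]) - rss(beta - eps * I[j])) / (2 * eps)
               for j in range(p)])

print(np.allclose(grad, fd, atol=1e-5))  # the two gradients agree
```

The same check works for any $n$ and $p$, which is a quick way to convince yourself the matrix formula generalizes.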
Best Answer
Unfortunately, unlike linear regression, there is no closed-form formula for the maximum likelihood estimate in logistic regression. You have to use an iterative optimization algorithm, such as gradient descent or iteratively reweighted least squares.
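To make this concrete, here is a minimal sketch of iteratively reweighted least squares (equivalently, Newton-Raphson on the log-likelihood) for logistic regression; the data, iteration count, and function names are illustrative assumptions, not from the original answer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_irls(X, y, n_iter=25):
    """Fit logistic regression by iteratively reweighted least squares.

    Each iteration solves the weighted least-squares system
    (X' W X) delta = X'(y - p), i.e. a Newton step on the log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1 - p)  # diagonal of the weight matrix W
        beta = beta + np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    return beta

# Toy data drawn from a logistic model (illustrative values)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
true_beta = np.array([-0.5, 2.0])
y = (rng.uniform(size=200) < sigmoid(X @ true_beta)).astype(float)

beta_hat = logistic_irls(X, y)
# At the MLE the score X'(y - p) is (numerically) zero
print(np.abs(X.T @ (y - sigmoid(X @ beta_hat))).max())
```

The stopping criterion here is a fixed number of iterations for brevity; a practical implementation would instead iterate until the score $X'(y - p)$ or the step size falls below a tolerance.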