TL;DR, we have that
$$\mu^*_k = \frac{\sum_{i=1}^n W_{ik}x_i}{\sum_{i=1}^n W_{ik}}$$
$$\Sigma^*_k = \frac{\sum_{i=1}^{n} W_{ik}(x_i -\mu^*_k)(x_i - \mu^*_k)'}{\sum_{i=1}^n W_{ik}}$$
In particular, these are the same as the MLE for a Gaussian random variable, except that for each component $k$ we weight each observation by $W_{ik}$.
See below for the derivation, which closely parallels the MLE for a multivariate Gaussian.
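For concreteness, the two updates take only a few lines of NumPy. This is an illustrative sketch only: the array names and shapes (`X` for the $n \times d$ data, `W` for the $n \times K$ responsibilities) are my assumptions, not anything from the question.

```python
import numpy as np

# Illustrative setup (not from the question): X holds n points in d
# dimensions, W holds the E-step responsibilities, one column per component.
rng = np.random.default_rng(0)
n, d, K = 200, 2, 3
X = rng.normal(size=(n, d))
W = rng.random((n, K))
W /= W.sum(axis=1, keepdims=True)       # rows of W sum to 1

Nk = W.sum(axis=0)                      # effective sample size per component
mu = (W.T @ X) / Nk[:, None]            # mu*_k: weighted means, shape (K, d)

Sigma = np.empty((K, d, d))
for k in range(K):
    Xc = X - mu[k]                      # center the data at mu*_k
    Sigma[k] = (W[:, k, None] * Xc).T @ Xc / Nk[k]   # Sigma*_k
```

Note that `mu[k]` is exactly a weighted average of the rows of `X`, which is why `np.average` with `weights` gives the same result.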
It may help to approach the E step a bit differently. In your second equation for the E step, you correctly have that you want to maximize
$$\sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \operatorname{exp}\left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right)\right)$$
but we can more simply write that as
$$\sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right)\left(log(\pi_j) + log\left(\mathcal{N}(x_i;\mu_j,\Sigma_j)\right)\right)$$
where $\mathcal{N}$ denotes the Gaussian density. Following your notation, let $W_{ij} = P\left(Z_i=j|X,\theta\right)$. As pointed out in the comments, we want to take derivatives with respect to $\mu_k$ and $\Sigma_k$ for each $k=1,\dots,K$, set them to $0$, and solve to find the maximum. The first step is to note that, for a given $k$, any term in the summation with $j\neq k$ involves neither $\mu_k$ nor $\Sigma_k$, so its derivative is zero. So maximizing the above is the same as maximizing
$$\sum_{i=1}^{n} W_{ik}\left(log(\pi_k) + log\left(\mathcal{N}(x_i;\mu_k,\Sigma_k)\right)\right)$$
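For reference, the weights in this sum come from Bayes' rule in the E step: $W_{ij} \propto \pi_j \mathcal{N}(x_i;\mu_j,\Sigma_j)$, normalized over $j$. A minimal NumPy sketch, with all function and variable names illustrative, and with the normalization done in log space for stability:

```python
import numpy as np

def log_gauss(X, mu, Sigma):
    """Log-density of N(mu, Sigma) at each row of X (illustrative helper)."""
    d = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    sol = np.linalg.solve(L, (X - mu).T).T       # L^{-1}(x - mu)
    quad = np.sum(sol ** 2, axis=1)              # (x-mu)' Sigma^{-1} (x-mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))    # log det(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

def e_step(X, pi, mus, Sigmas):
    """W[i, j] = P(Z_i = j | x_i, theta), computed row by row."""
    logW = np.stack([np.log(pi[j]) + log_gauss(X, mus[j], Sigmas[j])
                     for j in range(len(pi))], axis=1)
    logW -= logW.max(axis=1, keepdims=True)      # guard against underflow
    W = np.exp(logW)
    return W / W.sum(axis=1, keepdims=True)
```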
A key point of the EM algorithm is precisely that $W_{ik}$ is estimated in the E step, so we can treat it as a constant here. And while we're at it, since
$$W_{ik}\left(log(\pi_k) + log\left(\mathcal{N}(x_i;\mu_k,\Sigma_k)\right)\right) = W_{ik}log(\pi_k) + W_{ik}log\left(\mathcal{N}(x_i;\mu_k,\Sigma_k)\right)$$
for any $i$, we can also drop the first term, since its derivative with respect to either parameter is zero. So maximizing the objective with respect to the $k$-th parameters is the same as maximizing
$$\sum_{i=1}^{n} W_{ik} log\left(\mathcal{N}(x_i;\mu_k,\Sigma_k)\right)$$
Suppose that $\Sigma_k \in \mathbb{R}^{d\times d}$. Then we know that the PDF of the multivariate Gaussian is
$$\frac{1}{(2\pi)^{d/2}\det(\Sigma_k)^{1/2}} \exp\left(-\frac{1}{2}(x_i-\mu_k)'\Sigma_k^{-1}(x_i-\mu_k)\right)$$
and taking the log and using the properties of log (in particular, $log(xz/y) = log(x) + log(z) - log(y)$ and $log(\exp(x)) = x$), we have
$$log\left(\mathcal{N}(x_i;\mu_k,\Sigma_k)\right) = -\frac{d}{2}log(2\pi) - \frac{1}{2}log(\det(\Sigma_k)) - \frac{1}{2}(x_i-\mu_k)'\Sigma_k^{-1}(x_i-\mu_k)$$
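As a sanity check on this algebra, we can evaluate both sides numerically. The sketch below (with arbitrary made-up inputs) compares the direct log of the density to the expanded, term-by-term form:

```python
import numpy as np

# Arbitrary made-up inputs for the check.
rng = np.random.default_rng(1)
d = 3
x = rng.normal(size=d)
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)          # a valid (SPD) covariance matrix

diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)

# Direct: take the log of the density as written.
direct = np.log(np.exp(-0.5 * quad)
                / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))))

# Expanded: the term-by-term form above.
expanded = (-0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * quad)
```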
and again, since we are taking derivatives, all the terms that involve neither $\mu_k$ nor $\Sigma_k$ drop out, so maximizing
$$\sum_{i=1}^{n} W_{ik} log\left(\mathcal{N}(x_i;\mu_k,\Sigma_k)\right)$$
is the same as maximizing
$$\sum_{i=1}^{n} W_{ik}\left(-\frac{1}{2}log(\det(\Sigma_k)) - \frac{1}{2}(x_i-\mu_k)'\Sigma_k^{-1}(x_i-\mu_k)\right)$$
which simplifies to
$$-\frac{1}{2}\sum_{i=1}^{n} W_{ik}log(\det(\Sigma_k)) - \frac{1}{2}\sum_{i=1}^{n} W_{ik}(x_i-\mu_k)'\Sigma_k^{-1}(x_i-\mu_k)$$
Okay, we are finally ready to take derivatives, but we will need some vector and matrix derivative identities, so let's draw from the lovely Matrix Cookbook. From it, we know that $\frac{\partial x'Ax}{\partial x} = 2Ax$ if $x$ does not depend on $A$ and $A$ is symmetric. Since $\Sigma_k$ is symmetric, so is $\Sigma_k^{-1}$. So taking the derivative with respect to $\mu_k$, the first part vanishes, and for the second part we apply the chain rule: differentiate with respect to $(x_i-\mu_k)$ using the identity above, then multiply by the derivative of $(x_i-\mu_k)$ with respect to $\mu_k$, which is $-I$. We get
$$\frac{\partial \frac{-1}{2}\sum_{i=1}^{n} W_{ik}(x_i-\mu_k)'\Sigma_k^{-1}(x_i-\mu_k)}{\partial \mu_k} = \sum_{i=1}^n W_{ik}\Sigma_k^{-1}(\mu_k - x_i) = 0 $$
which implies that
$$\sum_{i=1}^n W_{ik}\Sigma_k^{-1}\mu_k = \sum_{i=1}^n W_{ik}\Sigma_k^{-1}x_i \implies \mu_k\sum_{i=1}^n W_{ik} = \sum_{i=1}^n W_{ik}x_i$$
and so $\mu_k = \frac{\sum_{i=1}^n W_{ik}x_i}{\sum_{i=1}^n W_{ik}}$. Yay!
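We can check this numerically: for a fixed covariance, the weighted mean both matches `np.average` with weights and scores at least as well as any perturbed mean. The data and weights below are made up for illustration:

```python
import numpy as np

# Made-up data and weights for one fixed component k.
rng = np.random.default_rng(2)
n, d = 100, 2
X = rng.normal(size=(n, d))
w = rng.random(n)                        # plays the role of W_{ik}

mu_star = (w[:, None] * X).sum(axis=0) / w.sum()

def weighted_term(mu):
    # The mu-dependent part of the objective, with Sigma_k fixed at I.
    diff = X - mu
    return -0.5 * np.sum(w * np.sum(diff ** 2, axis=1))
```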
Now let's do $\Sigma_k$. This one is trickier, but the key facts you need to know are that $\frac{\partial{x'Ax}}{\partial A} = xx'$, and that $\frac{\partial log(\det(A))}{\partial A} = A^{-T}$. Again check out the Matrix Cookbook to see why. We will also use the fact that
$$-\frac{1}{2}\sum_{i=1}^{n} W_{ik}log(\det(\Sigma_k)) = \frac{1}{2}\sum_{i=1}^{n} W_{ik}log(\det(\Sigma_k^{-1}))$$
which follows from pushing the $-1$ into the log and using the fact that $det(A^{-1}) = det(A)^{-1}$. Then we can re-write
$$-\frac{1}{2}\sum_{i=1}^{n} W_{ik}log(\det(\Sigma_k)) - \frac{1}{2}\sum_{i=1}^{n} W_{ik}(x_i-\mu_k)'\Sigma_k^{-1}(x_i-\mu_k) = \frac{1}{2}\sum_{i=1}^{n} W_{ik}log(\det(\Sigma_k^{-1})) - \frac{1}{2}\sum_{i=1}^{n} W_{ik}(x_i-\mu_k)'\Sigma_k^{-1}(x_i-\mu_k)$$
Taking derivative with respect to $\Sigma_k^{-1}$, we have
$$\frac{\partial \frac{1}{2}\sum_{i=1}^{n} W_{ik}log(\det(\Sigma_k^{-1})) - \frac{1}{2}\sum_{i=1}^{n} W_{ik}(x_i-\mu_k)'\Sigma_k^{-1}(x_i-\mu_k)}{\partial \Sigma_k^{-1}} = \frac{1}{2}\sum_{i=1}^n W_{ik}\Sigma_k - \frac{1}{2}\sum_{i=1}^{n} W_{ik}(x_i -\mu_k)(x_i - \mu_k)'$$
And setting this to zero and solving for $\Sigma_k$ gives us that
$$0 = \sum_{i=1}^n W_{ik}\Sigma_k -\sum_{i=1}^{n} W_{ik}(x_i -\mu_k)(x_i - \mu_k)'$$
which simplifies to
$$\Sigma_k = \frac{\sum_{i=1}^{n} W_{ik}(x_i -\mu_k)(x_i - \mu_k)'}{\sum_{i=1}^n W_{ik}}$$
using the previously maximized $\mu_k$ here, and we are done!
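Again, a quick numerical sanity check with made-up data: the formula produces a symmetric positive semi-definite matrix, and rescaling it only lowers the objective we maximized:

```python
import numpy as np

# Made-up data and weights for one fixed component k.
rng = np.random.default_rng(3)
n, d = 200, 2
X = rng.normal(size=(n, d))
w = rng.random(n)

mu = (w[:, None] * X).sum(axis=0) / w.sum()
Xc = X - mu
Sigma = (w[:, None] * Xc).T @ Xc / w.sum()      # the Sigma*_k formula

def objective(S):
    # sum_i W_ik [ -1/2 log det(S) - 1/2 (x_i-mu)' S^{-1} (x_i-mu) ]
    quad = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)
    return np.sum(w * (-0.5 * np.log(np.linalg.det(S)) - 0.5 * quad))
```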
Best Answer
The EM algorithm is really just an iterative approximation to true Maximum Likelihood Estimation. Even if implemented correctly, it is only guaranteed to converge to a local maximum of the likelihood.
In practice, one often runs the EM procedure several times, with different starting conditions, to avoid mistaking a local optimum for a global one.
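To illustrate the restart strategy, here is a self-contained 1-D, two-component EM in NumPy. Everything here, data and names alike, is illustrative: run the fit from several initializations and keep the one with the highest log-likelihood.

```python
import numpy as np

# Made-up bimodal data: two clear clusters at -3 and +3.
rng = np.random.default_rng(4)
X = np.concatenate([rng.normal(-3, 1, 150), rng.normal(3, 1, 150)])

def em_1d(X, mu0, n_iter=100):
    """1-D, two-component EM from initial means mu0 (illustrative)."""
    pi = np.array([0.5, 0.5])
    mu = np.array(mu0, dtype=float)
    var = np.array([1.0, 1.0])
    for _ in range(n_iter):
        # E step: responsibilities via Bayes' rule.
        dens = (pi * np.exp(-0.5 * (X[:, None] - mu) ** 2 / var)
                / np.sqrt(2 * np.pi * var))
        W = dens / dens.sum(axis=1, keepdims=True)
        # M step: the weighted updates derived in the other answer.
        Nk = W.sum(axis=0)
        pi = Nk / len(X)
        mu = (W * X[:, None]).sum(axis=0) / Nk
        var = (W * (X[:, None] - mu) ** 2).sum(axis=0) / Nk
    dens = (pi * np.exp(-0.5 * (X[:, None] - mu) ** 2 / var)
            / np.sqrt(2 * np.pi * var))
    return np.sum(np.log(dens.sum(axis=1))), mu

# Restart from several initial means and keep the best log-likelihood.
fits = [em_1d(X, mu0) for mu0 in ([-1.0, 1.0], [0.0, 0.1], [-5.0, 5.0])]
best_loglik, best_mu = max(fits, key=lambda f: f[0])
```

A poor initialization can leave both means stuck near the overall center; comparing log-likelihoods across restarts is what catches that.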
I think this is a good opportunity to highlight a passage by the late, great David MacKay, who uses this particular corner case to attack the principle of MLE in general:
(Note that what he calls "soft k-means" is just EM.)