In applications of the expectation-maximization algorithm, the likelihood of the data should monotonically increase.
As discussed by R. Neal and G. Hinton, the EM algorithm can be seen as (effectively) a gradient ascent algorithm in which the objective function is the data likelihood, expressed as a function of the model parameters.
The main case I can envision where this type of problem would arise (aside from implementation bugs) is when one uses an approximation technique in the expectation and/or maximization steps. For example, if the structure of the distributions in question forces one to approximate the parameter values that maximize the likelihood, then these approximations may allow one to "jump across" a local maximum, just as in any other application of gradient-following algorithms.
For the specific case of applying the EM algorithm to estimate a two-component discrete Markov model, you should be able to evaluate the expectations and perform the maximizations exactly, so you shouldn't be having this problem.
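If you want a quick sanity check on your own implementation, a minimal sketch along these lines can help. Here `log_likelihood` and `em_update` are hypothetical stand-ins for your model-specific routines, not any standard API:

```python
import numpy as np

def run_em_with_check(x, theta, log_likelihood, em_update,
                      n_iter=100, tol=1e-10):
    """Run EM and assert the observed-data log-likelihood never decreases.

    `log_likelihood(x, theta)` and `em_update(x, theta)` are hypothetical
    placeholders for your own E-step/M-step code.
    """
    ll_old = log_likelihood(x, theta)
    for t in range(n_iter):
        theta = em_update(x, theta)      # one full E-step + M-step
        ll_new = log_likelihood(x, theta)
        # A drop beyond numerical noise signals a bug or an approximate
        # E-/M-step, not valid EM behaviour.
        assert ll_new >= ll_old - tol, f"log-likelihood dropped at iteration {t}"
        if abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new
    return theta
```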
The objective of EM is to maximize the observed data log-likelihood,
$$l(\theta) = \sum_i \ln \left[ \sum_{z} p(x_i, z| \theta) \right] $$
Unfortunately, this tends to be difficult to optimize with respect to $\theta$. Instead, EM repeatedly forms and maximizes the auxiliary function
$$Q(\theta, \theta^t) = \mathbb{E}_{z \mid x, \theta^t} \left[ \sum_i \ln p(x_i, z_i \mid \theta) \right]$$
If $\theta^{t+1}$ maximizes $Q(\theta, \theta^t)$, EM guarantees that
$$l(\theta^{t+1}) \geq Q(\theta^{t+1}, \theta^t) \geq Q(\theta^t, \theta^t) = l(\theta^t)$$
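To make the guarantee concrete, here is a minimal sketch of exact EM for a two-component univariate Gaussian mixture (chosen for brevity; the same check applies to your Markov model), asserting $l(\theta^{t+1}) \geq l(\theta^t)$ at every iteration:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

# theta = (mixing weights pi, means mu, standard deviations sigma)
pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def loglik(pi, mu, sigma):
    # l(theta) = sum_i ln sum_z p(x_i, z | theta), computed stably
    log_joint = np.log(pi) + norm.logpdf(x[:, None], mu, sigma)  # shape (n, 2)
    return logsumexp(log_joint, axis=1).sum()

ll_old = loglik(pi, mu, sigma)
for t in range(200):
    # E-step: responsibilities r[i, j] = p(z_i = j | x_i, theta^t)
    log_joint = np.log(pi) + norm.logpdf(x[:, None], mu, sigma)
    r = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))
    # M-step: maximize Q(theta, theta^t) in closed form
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    ll_new = loglik(pi, mu, sigma)
    assert ll_new >= ll_old - 1e-9   # the EM monotonicity guarantee
    ll_old = ll_new
```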
If you'd like to know exactly why this is the case, Section 11.4.7 of Murphy's Machine Learning: A Probabilistic Perspective gives a good explanation. If your implementation doesn't satisfy these inequalities, you've made a mistake somewhere. Saying things like
I have a close to perfect fit, indicating there are no programming errors
is dangerous. With a lot of optimization and learning algorithms, it's very easy to make mistakes yet still get correct-looking answers most of the time. An intuition I'm fond of is that these algorithms are intended to deal with messy data, so it's not surprising that they also deal well with bugs!
On to the other half of your question,
is there a conventional search heuristic or likewise to increase the likelihood of finding the global minimum (or maximum)
Random restarts are the easiest approach; the next easiest is probably simulated annealing over the initial parameters. I've also heard of a variant of EM called deterministic annealing, but I haven't used it personally so can't tell you much about it.
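For random restarts, a sketch like the following is enough. Again, `run_em` and `init_sampler` are hypothetical hooks into your own code:

```python
import numpy as np

def em_random_restarts(x, run_em, init_sampler, n_restarts=20, seed=0):
    """Keep the best of several EM runs from random initializations.

    `run_em(x, theta0) -> (theta, loglik)` and `init_sampler(rng) -> theta0`
    are hypothetical stand-ins for your model-specific routines.
    """
    rng = np.random.default_rng(seed)
    best_theta, best_ll = None, -np.inf
    for _ in range(n_restarts):
        theta, ll = run_em(x, init_sampler(rng))
        if ll > best_ll:  # EM only finds a local optimum;
            best_theta, best_ll = theta, ll  # restarts hedge against bad basins
    return best_theta, best_ll
```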
Best Answer
I think there is some confusion about (a) the log-likelihood considered in the conditional expectation [which should be the complete-data likelihood] and (b) the current (fixed) parameter $\theta^{(t)}$ used in the E-step versus the (free) parameter $\theta$ to be derived by optimisation in the M-step as the new $\theta^{(t+1)}$:
If one looks at the E-step for a mixture model, the complete(d) likelihood is
$$L(\theta;\mathbf{x},\mathbf{z}) = p(\mathbf{x},\mathbf{z} \mid \theta) = \prod_{i=1}^n \prod_{j=1}^k \left[f(\mathbf{x}_i;\theta_j)\right]^{\mathbb{I}(z_i=j)}$$
where $\mathbf{z}=(z_1,\ldots,z_n)$ is the vector of latent variables [aka component indicators]. Therefore,
$$\log L(\theta;\mathbf{x},\mathbf{z}) = \sum_{i=1}^n \sum_{j=1}^k \mathbb{I}(z_i=j)\log f(\mathbf{x}_i;\theta_j)$$
and (E-step)
$$\mathbb E_{\theta^{(t)}} \left[ \log L(\theta;\mathbf{x},\mathbf{Z}) \mid \mathbf{x} \right] = \sum_{i=1}^n \sum_{j=1}^k \operatorname{P}(Z_i=j \mid X_i=\mathbf{x}_i ;\theta^{(t)}) \log f(\mathbf{x}_i;\theta_j)\tag{1}$$
(Note that there are two different $\theta$ vectors in the above expression, which is crucial for understanding the EM algorithm: the fixed $\theta^{(t)}$ in the posterior weights and the free $\theta$ in the log terms.) Hence, if the current value $\theta^{(t)}$ is such that
$$f(\mathbf{x}_i;\theta_j^{(t)})=0$$
then
$$\operatorname{P}(Z_i=j \mid X_i=\mathbf{x}_i ;\theta^{(t)})=0$$
and the term $\log f(\mathbf{x}_i;\theta_j)$ does not appear in (1). If, on the contrary,
$$\operatorname{P}(Z_i=j \mid X_i=\mathbf{x}_i ;\theta^{(t)})>0,$$
then $\log f(\mathbf{x}_i;\theta_j)$ appears in (1) and the maximisation of (1) cannot result in a value of $\theta$ such that $\log f(\mathbf{x}_i;\theta_j) = -\infty$.
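As an illustration of the two roles of $\theta$ in (1), here is a minimal Python sketch for a Gaussian mixture. The mixture weights `pi_t`, suppressed in the likelihood above, are an assumption added so the posterior is well defined:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def responsibilities(x, pi_t, mu_t, sigma_t):
    """P(Z_i = j | X_i = x_i; theta^(t)): depends only on the *fixed* theta^(t)."""
    log_joint = np.log(pi_t) + norm.logpdf(x[:, None], mu_t, sigma_t)
    return np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))

def Q(x, mu, sigma, r):
    """Expression (1): weights r are frozen at theta^(t); (mu, sigma) is the free theta."""
    # Where r[i, j] == 0, the corresponding log f(x_i; theta_j) term is absent
    # from (1), so a component vanishing under theta^(t) cannot drag Q to -inf.
    return (r * norm.logpdf(x[:, None], mu, sigma)).sum()
```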