This question was asked recently on math.SE and the following points were already explained, at least partly. Unsurprisingly, the migration to stats.SE does not change the underlying mathematical structure of the model, so here we go again.
First, there is no jump from state $0$ to state $2$ or from state $2$ to state $0$, because such a jump would require that both thermostats turn on at the same time or that they both turn off at the same time. Since the times when the thermostats switch are independent and absolutely continuous, they almost surely never coincide, hence the transitions $0\to2$ and $2\to0$ never occur.
(Relevant lemma: If $\xi$ and $\eta$ are independent random variables and, say, $P(\xi=x)=0$ for every $x$, then $P(\xi=\eta)=0$.)
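For completeness, here is a one-line proof of the lemma by conditioning on $\eta$ (a standard Fubini-type argument, not spelled out in the original):

```latex
P(\xi=\eta)
  = \int P(\xi=\eta \mid \eta=y)\,\mathrm{d}P_\eta(y)
  = \int P(\xi=y)\,\mathrm{d}P_\eta(y)   % by independence of \xi and \eta
  = \int 0\,\mathrm{d}P_\eta(y)          % since P(\xi=y)=0 for every y
  = 0.
```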
Second, except when $\mu=\lambda$, the process $X$ is not a Markov process. True, neither the time before a jump from state $0$ nor the time before a jump from state $2$ depends on the past, but the time before a jump from state $1$ very much does.
To see why, imagine that $\lambda\ll\mu$ and that $X$ jumps to state $1$ coming from state $0$. This means that both thermostats were off before one of them switched on, causing the jump to state $1$. As is well known, the probability that the thermostat which switched on is the $\lambda$-thermostat is $\lambda/(\lambda+\mu)$ and the probability that it is the $\mu$-thermostat is $\mu/(\lambda+\mu)$, hence, most probably, $X$ arrived at state $1$ because the $\mu$-thermostat switched on. This means that one is more likely to see the $\mu$-thermostat switch again (this time, turning off) before the $\lambda$-thermostat does (thus, turning on), hence $X$ is more likely to go back to state $0$ than to move to state $2$.
Thus, the path $0\to1\to0$ is more probable than the path $0\to1\to2$. A similar reasoning shows that the path $2\to1\to2$ is more probable than the path $2\to1\to0$. This is impossible if $X$ is a Markov process.
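The argument above can be checked numerically. Here is a minimal simulation sketch (not part of the original answer; the rates $\lambda=1$, $\mu=10$ are illustrative) estimating the probability of the path $0\to1\to0$, which a symmetric Markov reading of state $1$ could not explain:

```python
import random

# Two independent thermostats: a lambda-thermostat and a mu-thermostat.
# Each waits an exponential time with its own rate before switching.
# X counts how many thermostats are currently on.
lam, mu = 1.0, 10.0   # lambda << mu, as in the argument above
random.seed(0)

n_visits = 0    # visits to state 1 coming from state 0
back_to_0 = 0   # among those, how many return to state 0 next

for _ in range(100_000):
    # Start in state 0: both thermostats off, each racing to turn on.
    t_lam = random.expovariate(lam)
    t_mu = random.expovariate(mu)
    n_visits += 1
    if t_lam < t_mu:
        # The lambda-thermostat turned on first. By memorylessness, it
        # now waits a fresh Exp(lam) time to turn off, while the
        # mu-thermostat's remaining time to turn on is still Exp(mu).
        off = random.expovariate(lam)   # on thermostat turns off -> state 0
        on = random.expovariate(mu)     # other thermostat turns on -> state 2
    else:
        off = random.expovariate(mu)
        on = random.expovariate(lam)
    if off < on:
        back_to_0 += 1                  # path 0 -> 1 -> 0

# Estimate is close to (lam**2 + mu**2) / (lam + mu)**2, well above 1/2.
print(back_to_0 / n_visits)
```

Conditioning on which thermostat fired first gives the exact value $(\lambda^2+\mu^2)/(\lambda+\mu)^2$, which equals $\frac12$ only when $\lambda=\mu$; for $\lambda=1$, $\mu=10$ it is $101/121\approx0.83$.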
The case $\lambda=\mu$ is exceptional: then $X$ is a Markov process, with transition rates $2\lambda$ for $0\to1$ and $2\to1$, and $\lambda$ for $1\to0$ and $1\to2$. Thus, as was to be expected, the stationary distribution is then $(\frac14,\frac12,\frac14)$, that is, binomial $(2,\frac12)$.
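The stationary claim is easy to verify against the generator of the chain: $\pi$ is stationary iff $\pi Q=0$. A short sketch (not in the original; $\lambda=1$ is illustrative):

```python
# Generator (rate) matrix of X when lambda = mu (here lambda = 1):
# rates 2*lam for 0->1 and 2->1, and lam for 1->0 and 1->2.
lam = 1.0
Q = [
    [-2 * lam,  2 * lam,  0.0],      # out of state 0
    [     lam, -2 * lam,  lam],      # out of state 1
    [     0.0,  2 * lam, -2 * lam],  # out of state 2
]

pi = [0.25, 0.5, 0.25]  # candidate: binomial(2, 1/2)

# Stationarity means pi Q = 0 (probability flow balances in each state).
piQ = [sum(pi[i] * Q[i][j] for i in range(3)) for j in range(3)]
print(piQ)  # [0.0, 0.0, 0.0]
```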
A final remark: to start from a Markov process (here, the pair of states of the thermostats) and to lump together some of its states (here, the states on-off and off-on) is a classical way to destroy the Markov property.
Best Answer
That sounds correct. You should be able to deduce this by induction and conditioning. It has been a while since I've done anything with Markov chains, so apologies in advance for any poor notation (also, this is my first post ever :D ). The base case for the induction should look something like this.

Suppose that we know the initial probabilities as above. Then we have:
$$P_1[2] = P(X_2=1\mid X_1=1)P(X_1=1)+P(X_2=1\mid X_1=2)P(X_1=2)= S_1\epsilon_{11}[1]+S_2\epsilon_{21}[1],$$
which are all known quantities.
I think a simple induction argument should finish it from there.
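The induction is just the forward recursion $p_{t+1} = p_t P$, applying the conditioning step above at every time. A minimal sketch (the names `p0`, `P`, and the numbers are illustrative, not the notation of the question):

```python
# Forward recursion for the marginal distribution of a discrete-time
# Markov chain: p_{t+1}[j] = sum_i p_t[i] * P[i][j], i.e. the base-case
# conditioning step repeated t times.

def step(p, P):
    """One conditioning step: the distribution after one transition."""
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

def marginal(p0, P, t):
    """Distribution of X_t given the initial distribution p0."""
    p = list(p0)
    for _ in range(t):
        p = step(p, P)
    return p

# Illustrative two-state example (hypothetical numbers):
p0 = [0.6, 0.4]
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(marginal(p0, P, 1))  # [0.6*0.9 + 0.4*0.5, 0.6*0.1 + 0.4*0.5]
```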