A microstate is just a particular microscopic configuration of the system, where the state of each particle is fixed.
For example, take a three-level system with four particles. Treating the particles as indistinguishable, one particular microstate corresponds to two particles in the lowest state, which has energy $-\epsilon$, one particle in the second state with energy $0$, and one in the highest with energy $\epsilon$. The total energy of this microstate is then
$$E=2(-\epsilon) + 1(0) + 1(\epsilon) = -\epsilon$$
Now there are other ways you can get this energy. You can have three particles in the second state and one in the lowest state, for example. This is a different microstate. The equal probability of microstates says that those two microstates, corresponding to $E=-\epsilon$, are equally probable. Neither is "preferred."
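As a quick check of the counting above, here is a minimal sketch that enumerates every way of placing four particles in the three levels and keeps those with total energy $-\epsilon$. Energies are in units of $\epsilon$. The function names and the extra "labelled" count (how many arrangements there would be if the particles carried labels) are illustrative additions, not part of the argument above.

```python
from itertools import product
from math import factorial

LEVELS = (-1, 0, 1)   # level energies in units of epsilon (lowest, middle, highest)
N_PARTICLES = 4

def configurations(total_energy):
    """Occupation numbers (n_low, n_mid, n_high) that give the requested total energy."""
    result = []
    for occ in product(range(N_PARTICLES + 1), repeat=len(LEVELS)):
        if sum(occ) != N_PARTICLES:
            continue  # wrong number of particles
        energy = sum(n * e for n, e in zip(occ, LEVELS))
        if energy == total_energy:
            result.append(occ)
    return result

def labelled_arrangements(occ):
    """Multinomial count: ways to realise occ if the particles were labelled (assumption)."""
    ways = factorial(sum(occ))
    for n in occ:
        ways //= factorial(n)
    return ways

configs = configurations(-1)
for occ in configs:
    print(occ, labelled_arrangements(occ))
```

For indistinguishable particles this finds exactly the two microstates described above: `(1, 3, 0)` and `(2, 1, 1)`.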
Note that this idea of equal probability only applies to microstates with the same energy (and volume and number of particles). Microstates with different energies are not generally equally probable.
Further, if the system can exchange energy with a reservoir, the relative probability of a given energy is proportional to the number of microstates with that energy, weighted by the Boltzmann factor for that energy. This leads to the canonical ensemble, which combines the Boltzmann distribution with this degeneracy of microstates to give the relative probabilities of different energies.
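To see how degeneracy and the Boltzmann factor combine, here is a small sketch for the same three-level, four-particle example. It assumes indistinguishable particles (so a microstate is a set of occupation numbers) and takes the dimensionless combination `beta_eps` $=\epsilon/k_BT$ as input. Both the function names and that parameter are illustrative choices.

```python
from collections import Counter
from itertools import product
from math import exp

LEVELS = (-1, 0, 1)   # level energies in units of epsilon
N_PARTICLES = 4

def degeneracies():
    """Map total energy (units of epsilon) -> number of occupation-number microstates."""
    g = Counter()
    for occ in product(range(N_PARTICLES + 1), repeat=len(LEVELS)):
        if sum(occ) == N_PARTICLES:
            g[sum(n * e for n, e in zip(occ, LEVELS))] += 1
    return dict(g)

def energy_probabilities(beta_eps):
    """P(E) proportional to g(E) * exp(-beta_eps * E), normalised over all energies."""
    g = degeneracies()
    weights = {E: g_E * exp(-beta_eps * E) for E, g_E in g.items()}
    Z = sum(weights.values())  # partition function
    return {E: w / Z for E, w in weights.items()}

g = degeneracies()
probs = energy_probabilities(1.0)
for E in sorted(probs):
    print(E, g[E], round(probs[E], 4))
```

At `beta_eps = 0` (infinite temperature) the Boltzmann factor drops out and the probabilities reduce to pure microstate counting, $g(E)/15$.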
A macrostate is a set of microstates. Some macrostates are thermal, others are not.
Without the assumption of being in thermal equilibrium you can't assume anything, since any microstate is possible and many different macrostates could be picked.
Usually you want to group your macrostates according to state variables such as pressure, volume, total energy or something like that.
And when you break your 30 microstates into three groups: A, B, and C you can ask yourself if each group is classified according to a state variable such as pressure, volume, total energy or something like that.
And even if it is, then all you might know is the state variable, and even that maybe not precisely. For instance, the volume isn't precisely known, since the exact locations of all the many parts are not known.
Now, even when the system is in a particular microstate, and that microstate is assigned specifically to a particular macrostate, that doesn't tell you how that macrostate assigns probabilities to the microstates in it.
And you can assume that each microstate in the macrostate is equally likely. But that is just an assumption. If energy is conserved, then the dynamics will always keep the total system at a configuration with that fixed initial energy, so it doesn't change from any state to just any state.
Doesn't thermal equilibrium mean the macrostate having the greatest multiplicity?
It means so much more. Firstly, it requires that a macrostate is a probability distribution on microstates. Secondly, the macrostate is specified by some (macro) state variables. Thirdly, the particular distribution specified by the hypothesis requires that the space of all microstates be partitioned (partitioned by different values of the state variables) and each partition has an equal probability assigned to every microstate in that part of the partition. I always imagine different floors of a mansion, where your variables constrain you to a different floor and each room in a floor is equally likely.
Now, thermal equilibrium doesn't simply mean having the greatest multiplicity. You could have $N$ particles of some gas at a certain pressure $P_0$ and volume $V$, and that could be one macrostate. There might be macrostates with a larger multiplicity, with the same $N$ and $V$ and a larger pressure $P_+\gt P_0$, but there isn't enough energy for those $N$ particles to have that pressure: there just isn't enough kinetic energy to spread around to reach the $P_+\gt P_0$ macrostate.
To go back to the mansion example, imagine that you have two mansions and one person can go up a floor if (and only if) the other person goes down a floor. When they are both by some stairs, one can go up and the other can go down. But if that places one of them on a floor with trillions more rooms available than the other one had, then they are way, way more likely to stay in the configuration with that person on the floor with way more rooms.
So energy can be exchanged between the two people, but the additional energy spends most of its time with one of them having more energy if they can exchange it. Eventually they could get to a level where one gains as many rooms as the other one loses. And that joint collection is roughly the macrostate of the combined system.
And when that happens we say they are in thermal equilibrium. And the thing they have in common, temperature, is how many additional rooms they gain per bit of energy. Maybe one has stairs that are longer, so going up/down one flight for it is going up/down 5 flights for the other. Or maybe the number of rooms per floor grows at a different rate in each mansion. There could still be a $\mathrm{d}S/\mathrm{d}E$ in common.
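The two-mansion picture can be tried out numerically. The sketch below swaps in a standard toy model that is not part of the answer above: two Einstein solids sharing a fixed number of energy quanta, where a solid with `n_osc` oscillators and `quanta` quanta has multiplicity $\binom{\text{quanta}+n_{\text{osc}}-1}{\text{quanta}}$. The sizes used are arbitrary illustrative values. It finds the energy split with the most joint "rooms" and compares a finite-difference $\mathrm{d}S/\mathrm{d}E$ (in units of $k_B$ per quantum) for the two sides.

```python
from math import comb, log

def multiplicity(n_osc, quanta):
    """Number of microstates of an Einstein solid (toy model, an assumption here)."""
    return comb(quanta + n_osc - 1, quanta)

def most_probable_split(n_a, n_b, q_total):
    """Quanta held by solid A in the split that maximises the joint multiplicity."""
    return max(
        range(q_total + 1),
        key=lambda q_a: multiplicity(n_a, q_a) * multiplicity(n_b, q_total - q_a),
    )

def entropy_slope(n_osc, quanta):
    """Centred finite difference of S = ln(multiplicity) with respect to energy."""
    return (log(multiplicity(n_osc, quanta + 1)) - log(multiplicity(n_osc, quanta - 1))) / 2

N_A, N_B, Q_TOTAL = 300, 200, 100   # illustrative sizes
q_star = most_probable_split(N_A, N_B, Q_TOTAL)
slope_a = entropy_slope(N_A, q_star)
slope_b = entropy_slope(N_B, Q_TOTAL - q_star)
print(q_star, slope_a, slope_b)
```

At the most probable split, the two slopes come out nearly equal even though the solids differ in size and hold different amounts of energy, which is the "one gains as many rooms as the other loses" condition.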
Can thermal equilibrium have fluctuation?
The macrostate could, in principle, change away from thermal equilibrium and move to a configuration where the two subsystems are at different temperatures instead of the same temperature. But that would require bouncing around until you are by some stairs and then going down, instead of taking any of the many more options for staying on the same floor, and then continuing to do the improbable floor after floor until the temperatures of the two subsystems are very out of line.
And even if it happened, it could just go back to equilibrium. The idea is that for a large enough system, the time to wait for such a thing to happen is just really really long.
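To get a feel for how rare a large fluctuation is, here is a rough sketch using the same kind of toy model (two Einstein solids sharing energy quanta, an assumption brought in for illustration, with arbitrary sizes). It assumes every joint microstate is equally likely and computes the probability that the energy split is further from the most probable one than a given fraction of the total energy.

```python
from math import comb

def multiplicity(n_osc, quanta):
    """Number of microstates of an Einstein solid (toy model, an assumption here)."""
    return comb(quanta + n_osc - 1, quanta)

def prob_far_from_equilibrium(n_a, n_b, q_total, rel_tol):
    """Probability that solid A's share differs from the most probable share
    by more than rel_tol * q_total, with all joint microstates equally likely."""
    weights = [
        multiplicity(n_a, q) * multiplicity(n_b, q_total - q)
        for q in range(q_total + 1)
    ]
    q_star = max(range(q_total + 1), key=weights.__getitem__)
    tail = sum(w for q, w in enumerate(weights) if abs(q - q_star) > rel_tol * q_total)
    return tail / sum(weights)

# Same proportions, system 100 times larger in the second case.
p_small = prob_far_from_equilibrium(30, 20, 10, 0.2)
p_large = prob_far_from_equilibrium(3000, 2000, 1000, 0.2)
print(p_small, p_large)
```

For the small system, a deviation of this relative size happens a bit more than one time in ten. For the larger one it is already vanishingly unlikely, and real systems have vastly more particles than this.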
Best Answer
Suppose you have a box of volume $V$ filled with a mole of an ideal gas with internal energy $E$. This defines the macrostate of your system, or intuitively, how your system looks on a macroscopic scale. However, we still don't know how it looks on a microscopic scale, i.e., we don't know how the $\sim 10^{23}$ particles in there are behaving individually. There are many different possibilities, which are the microstates of the system. For example, at time $t = t_0$ they could have positions $x_i$ and velocities $v_i$, where the index runs over all the particles. This is one particular microstate. However, the macrostate would be the same if particle $i=1$ had position $x_2$ and velocity $v_2$ while particle $i=2$ had $x_1$ and $v_1$ (I'm assuming things are classical and indistinguishable for simplicity). So which is the correct microstate?
From a macroscopic point of view, we don't know. All we can do is assign a probability to the system being in each possible microstate. The principle you stated implies that both microstates I gave as examples are equally likely to be the actual microstate. We don't know which is the right microstate, and all the possible ones are equally likely.
The system moving towards the largest number of microstates is then not only a change of microstates, but also a change of macrostate. If I mix my gas with another box of gas at different temperature or something, the system will reach equilibrium at the macrostate with the most possible microstates. We still won't know what is the right microstate, being able only to attribute probabilities.
Essentially, as OP pointed out in the comments, the idea is that since we do not know which microstate is the correct one, we assign equal probabilities to all of them.
Now, this does have a bit of nuance. Is it always valid to do this? In fact, it depends on the information you have about your system. Instead of an ideal gas, let us pick a generic gas. If the energy is fixed, then all available microstates should have the very same internal energy and there is no reason to prefer one of them over the other ones. We call this the microcanonical ensemble. On the other hand, suppose temperature (which is related to the expectation value of energy) is fixed. In this situation, there could be states with more internal energy than others, as long as the temperature stays the same (for the ideal gas, this won't happen because the energy is proportional to temperature, but let us consider a more general scenario). In this situation, it can be more likely for microstates with lower internal energy to occur, so we won't pick all probabilities to be the same. Instead, they are given by a Boltzmann distribution. This is known as the canonical ensemble.
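For reference, the two assignments can be written side by side. The notation is an assumption added here, not taken from the text above: $p_i$ is the probability of microstate $i$ with energy $E_i$, $\Omega(E)$ is the number of microstates with energy $E$, $k_B$ is Boltzmann's constant, and $Z$ is the normalisation (partition function).

$$p_i^{\text{microcanonical}} = \begin{cases} \dfrac{1}{\Omega(E)} & \text{if } E_i = E \\ 0 & \text{otherwise} \end{cases}$$

$$p_i^{\text{canonical}} = \frac{e^{-E_i/k_B T}}{Z}, \qquad Z = \sum_j e^{-E_j/k_B T}$$

In the canonical case, two microstates with the same energy still get the same probability, so equal probability survives within each energy shell. Only microstates of different energy are weighted differently.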
The key point is that since we do not know what is the true microstate, we can only assign probabilities. We do this according to the information we have (or according to the experimental conditions, if you prefer). For fixed energy, all microstates should have the very same probability of being the true microstate, so they are, in this sense, equally likely.