Sir James Jeans has an excellent answer, without using the word "order" or "disorder". Consider a card game... is there anyone else on this forum besides me who still plays whist?
His example is whist. I had better use poker.
You have the same probability of being dealt four aces and the king of spades as of being dealt any other precisely specified hand, e.g., the two of clubs, the three of diamonds, the five of hearts, the eight of spades, and the seven of spades. Almost worthless.
This is the microstate: a microstate is a precisely specified hand.
A macrostate is the useful description, as in the rules: two pairs, four of a kind, royal flush, flush, straight, straight flush, worthless.
You have a much lower probability of being dealt four aces than of being dealt a worthless hand.
A macrostate is the only thing about which it makes sense to say "ordered" or "disordered". The concept of "order" is undefined for a microstate. A macrostate is highly ordered or possesses a lot of information if your knowledge that you are in that macrostate implies a very high degree of specificity about what possible microstate you might be in. "Four aces", as a description, tells you a lot. "One pair", as a description, tells you much less. So the former is a state of low entropy and the latter is a state of higher entropy.
A macrostate is a probability distribution on the set of microstates. "Four aces" says that the microstate "ace, deuce, three, four, five, all spades" has zero probability. Most microstates have zero probability; they are excluded by this description. But "four aces and king of spades" has probability 1/48, as does "four aces and king of hearts", and so on down to "four aces and deuce of clubs". The entropy formula is then $-k \log \frac1{48} = k\log 48$, where $k$ is not Boltzmann's constant. But the entropy of "one pair" is much higher: put $W$ to be the number of different precisely specified hands which fall under the description "one pair". Then its entropy is $k \log W$.
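As a sketch, these counts can be checked directly with Python's `math.comb`, taking $k=1$ for convenience (the standard 5-card poker combinatorics are assumed):

```python
from math import comb, log

# Number of precisely specified 5-card hands ("microstates")
# falling under each description ("macrostate").
W_four_aces = 48                                   # 4 aces + any 1 of the 48 remaining cards
W_one_pair = 13 * comb(4, 2) * comb(12, 3) * 4**3  # rank of the pair, its 2 suits,
                                                   # 3 other ranks, one suit for each

# Entropy S = k log W, with k = 1.
S_four_aces = log(W_four_aces)
S_one_pair = log(W_one_pair)

print(W_one_pair)                # 1098240
print(S_four_aces < S_one_pair)  # True: the more specific macrostate has lower entropy
```

The more specific the description, the fewer microstates it admits, and the lower its entropy.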
Jeans makes the analogy with putting a kettle of water on the fire. The fire is hotter than the water. Energy (heat) is transferred from the fire to the water, and also from the water to the fire, by, let us assume, molecular collisions only. When we say "cold kettle on a hot fire" we are describing a macrostate. When we say "water boils", that is another macrostate. When we say "fire gets hotter and water freezes", we are also describing a possible macrostate that might result. What are the entropies? They are proportional to the logarithm of the number of microstates that fall under these three descriptions. Now, by the Maxwell distribution of energies of the molecules, there are very many high-energy molecules in the fire that come into contact with lower-energy molecules in the water and transfer energy to the water. There are very many precisely specified patterns of interaction at the individual molecular level by which energy passes from the fire to the water, so "boils" has a large entropy.
But there are some ways of freezing the water: the Maxwell distribution says that a few of the molecules in the fire are indeed less energetic than the average molecule in the water. It is possible that only (or mostly) these molecules are the ones that collide
with the water and receive energy from the water. This is in strict analogy to the card game: there are very few aces, but it is possible you will get dealt all of them. But there are far fewer ways for this to happen, for the water to freeze, than for the previous process of boiling. Therefore this macrostate has less entropy than the "boils" macrostate.
This example shows that to use the definition of entropy you have to have defined the complete set of possible microstates you will consider as possible, and you have to
study macrostates which are sets of these microstates. These different macrostates can be compared as to their entropies.
You cannot suddenly switch to a hockey game and compare the entropy of a full house to the entropy of "Leafs win". If you wish to make comparisons such as that, you would have to begin by defining an overarching system which contained both sets of microstates and then define the macrostates, and even then comparisons of entropy would be merely formalistic. The laws of thermodynamics only apply when there is an interaction between all parts of the system such that any one component has a chance of interchanging energy with any other part, within the time allotted. We also had to assume that each hand was equally likely, i.e., that Persi Diaconis was not dealing... some dealers know how to make some precisely specified hands less likely than others. Without these two assumptions, there is no connection between thermodynamic entropy and informational entropy, and so the Second Law of thermodynamics will not apply to informational entropy.
Better than "more ordered" would be to think "more specificity".
There are fewer ways to arrange that all the slow molecules are in one body and all the fast ones in the other than there are to arrange a seemingly random mix, just as there are fewer ways to arrange that I get all the aces, than there are to arrange that each player gets one ace.
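This counting claim can be checked directly. A minimal sketch, assuming a standard 52-card deck dealt into four hands of 13 as in whist: treat the four aces' positions as a 4-subset of the 52 card slots and count the arrangements for each macrostate.

```python
from math import comb

total = comb(52, 4)      # ways to place the 4 aces among 52 card slots
all_to_me = comb(13, 4)  # all 4 aces land in one named 13-card hand
one_each = 13**4         # exactly one ace in each of the four hands

print(all_to_me)  # 715
print(one_each)   # 28561
# Far fewer arrangements concentrate the aces than spread them around:
print(one_each / all_to_me)  # roughly 40x more ways to give each player one ace
```

The "seemingly random mix" is the higher-entropy macrostate simply because it contains about forty times as many microstates.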
See also Does entropy really always increase (or stay the same)? where I put Sir James's exact words at length.
the entropy of the universe is always increasing
True. Let's call this the total entropy. (Well, almost true, since the entropy of the universe remains constant for a reversible process).
When a hot stone is dropped into cold water, its entropy decreases (it gets colder), but the water's entropy increases at the same time (it receives heat and gets slightly warmer). Whenever an entropy decrease is happening, an entropy increase is happening somewhere else. The funky thing is that this entropy increase is always numerically larger than the decrease.
You can calculate entropy change for a part of such a system with:
$$\Delta S=\int_1^2 \frac{1}{T} \mathrm{d}Q$$
When you do that for all parts and sum the entropy changes, the sum is always positive, $\sum \Delta S>0$ (again, apart from the ideal case of a purely reversible process, where $\sum \Delta S=0$).
If anyone says entropy decrease it is because they are not talking about the whole system.
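As a quick numerical check of the integral above, assuming a constant specific heat (the mass and $c$ values below are illustrative assumptions, not from the text): summing $\mathrm{d}Q/T = mc\,\mathrm{d}T/T$ in small steps reproduces the closed form $mc\ln(T_2/T_1)$.

```python
from math import log

m, c = 1.0, 4186.0     # assumed mass (kg) and specific heat of water (J/(kg K))
T1, T2 = 293.15, 323.15  # heating from 20 degC to 50 degC, in kelvin

# Numerically integrate dS = dQ/T = m c dT / T with the midpoint rule.
n = 100_000
dT = (T2 - T1) / n
S_numeric = sum(m * c * dT / (T1 + (i + 0.5) * dT) for i in range(n))

S_exact = m * c * log(T2 / T1)  # closed form for constant c
print(abs(S_numeric - S_exact) < 1e-3)  # True: the two agree
```

Note that $T$ must be the absolute temperature (kelvin) for $\mathrm{d}Q/T$ to make sense.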
An example
Mix water at $T_{a1}=20\,\mathrm{^\circ C}$ with the same amount of water at $T_{b1}=80\,\mathrm{^\circ C}$.
The water mix, now with double the mass, finds an in-between equilibrium temperature at $T_2=50\,\mathrm{^\circ C}$.
Entropy change for the cold water: $\Delta S_a=\int_1^2 \frac{1}{T} \mathrm{d}Q_a=\int_{T_{a1}}^{T_2} \frac{1}{T} (mc\,\mathrm{d}T)=mc\int_{T_{a1}}^{T_2} \frac{1}{T}\mathrm{d}T$
Entropy change for the warm water: $\Delta S_b=\int_1^2 \frac{1}{T} \mathrm{d}Q_b=\int_{T_{b1}}^{T_2} \frac{1}{T} (mc\,\mathrm{d}T)=mc\int_{T_{b1}}^{T_2} \frac{1}{T}\mathrm{d}T$
$m$ and $c$ are the same for the two, but the integrals are numerically different because $T_{a1}\neq T_{b1}$. So the two entropy changes cannot cancel. That the net change is positive is clear if we write them out and solve the integrals:
$$\Delta S_a=mc(\ln T_2-\ln T_{a1})=mc\ln \frac{T_2}{T_{a1}} \text{ and}$$
$$\Delta S_b=mc(\ln T_2-\ln T_{b1})=mc\ln \frac{T_2}{T_{b1}}$$
And the total entropy change will be:
$$\sum \Delta S=\Delta S_a+\Delta S_b=mc\left(\ln \frac{T_2}{T_{a1}}+\ln \frac{T_2}{T_{b1}}\right)=mc\ln \left(\frac{T_2^2}{T_{a1}T_{b1}}\right)=mc\ln \left(\frac{T_2^2}{(T_2-T_{diff})(T_2+T_{diff})}\right)=mc\ln \left(\frac{T_2^2}{T_2^2-T_{diff}^2}\right)$$
Here $T_{diff}=T_2-T_{a1}=T_{b1}-T_2$ is half the initial temperature difference. This extra rearranging proves that the sum is never negative, $\sum \Delta S > 0$: since $T_{diff}>0$, the denominator satisfies $T_2^2-T_{diff}^2<T_2^2$, so the fraction is greater than $1$ and its logarithm is positive (temperatures taken in kelvin, so $T_{diff}<T_2$).
So this is a small proof that heat transfer is an example of an irreversible process that will cause the total entropy to increase.
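The worked example can be evaluated numerically. A small sketch, with the temperatures converted to kelvin as the formula requires (the mass and specific heat values are illustrative assumptions):

```python
from math import log

m = 1.0     # kg of water in each portion (assumed)
c = 4186.0  # specific heat of water, J/(kg K) (approximate)

T_a1 = 293.15            # 20 degC in kelvin
T_b1 = 353.15            # 80 degC in kelvin
T_2 = (T_a1 + T_b1) / 2  # 323.15 K = 50 degC

dS_a = m * c * log(T_2 / T_a1)  # cold water gains entropy (positive)
dS_b = m * c * log(T_2 / T_b1)  # hot water loses entropy (negative)

total = dS_a + dS_b
print(total > 0)  # True: the gain outweighs the loss
```

The total comes out to a few tens of J/K: small, but strictly positive, exactly as the rearranged formula predicts.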
It's actually a nice example of why the 2nd law is useful: if you go around trying to account in microscopic detail for the balance of things like energy and entropy, you can easily go wrong. I used to have a student who would do this all the time; I don't know if he ever learnt the lesson...
In this specific case, you neglected damping --- without it, you would never come to an equilibrium and the 2nd law does not apply. With damping, you necessarily dissipate heat, and that loss will more than make up for the macroscopic ordering. This dissipation can be either mechanical friction or (as an example of why you would be wrong about this situation being "closed") electromagnetic: oscillating dipoles emit radiation!