Part of my PhD thesis was on this stuff, so I hope I can give a satisfactory answer.
Maximum entropy production and minimum entropy production are different types of principle with different domains of application. Before discussing the answer I should make clear that the maximum entropy production principle (which I'll call MaxEP) is really a collection of different hypotheses by different authors, some of which are more plausible than others, and none of which has an accepted theoretical justification. However, there is some empirical evidence in the work of Paltridge from the 70s, e.g. this paper. A very simple one-parameter version of Paltridge's model can be found in this paper by Lorenz et al., and in the discussion below I will keep as close as possible to the version of MaxEP that Lorenz et al. use.
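To give a concrete feel for what this kind of one-parameter MaxEP calculation looks like, here is a minimal two-box sketch in the spirit of the Lorenz et al. model. The Stefan-Boltzmann radiation law and the flux values below are illustrative assumptions of mine, not the numbers from their paper; the point is only the structure of the argument: each value of the transport parameter defines a steady state, and MaxEP selects the one with the greatest entropy production.

```python
# Minimal two-box sketch in the spirit of Lorenz et al. (illustrative numbers only).
# Two boxes absorb different solar fluxes, radiate to space as sigma*T^4, and
# exchange a heat flux F.  Each fixed F gives a steady state; MaxEP picks the F
# whose steady state produces entropy at the greatest rate.
import numpy as np

SIGMA = 5.67e-8                  # Stefan-Boltzmann constant, W m^-2 K^-4
I_HOT, I_COLD = 300.0, 150.0     # absorbed solar flux in each box, W m^-2 (made up)

def steady_state_temperatures(F):
    """Box temperatures when a flux F flows from the hot box to the cold box."""
    T_hot = ((I_HOT - F) / SIGMA) ** 0.25
    T_cold = ((I_COLD + F) / SIGMA) ** 0.25
    return T_hot, T_cold

def entropy_production(F):
    """Entropy production rate of the heat transport between the two boxes."""
    T_hot, T_cold = steady_state_temperatures(F)
    return F * (1.0 / T_cold - 1.0 / T_hot)

# Scan the continuum of steady states (one per F) and pick the MaxEP one.
fluxes = np.linspace(0.0, I_HOT - I_COLD, 2000)[1:-1]
F_maxep = fluxes[int(np.argmax([entropy_production(F) for F in fluxes]))]
T_hot, T_cold = steady_state_temperatures(F_maxep)
print(f"MaxEP flux: {F_maxep:.1f} W/m^2, T_hot = {T_hot:.1f} K, T_cold = {T_cold:.1f} K")
```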
As you say, Prigogine's principle of minimum entropy production (henceforth MinEP) only applies in near-equilibrium situations. It was once hypothesised to be much more widely applicable. This hypothesis has now been disproven, and one must be careful to bear this in mind when reading old material on the subject. (For the moment I've lost track of the paper that disproves this idea, but it's a pretty solid mathematical result. If I find it again I'll update this answer.)
With these caveats out of the way, the basic difference is this:
For linear, near-equilibrium systems that only admit a single steady state, MinEP says that all of the system's transient states have a higher entropy production than the steady state. A transient state is a temporary state that is not a steady state. MinEP compares steady states with non-steady states.
For some yet-to-be-determined class of non-linear, far-from-equilibrium systems that admit a continuum of possible steady states, MaxEP says that the system is most likely to be found in the steady state with the greatest entropy production. MaxEP compares steady states to other steady states, but says nothing about transient states.
So aside from the fact that the two principles apply to quite different types of system (linear versus highly non-linear), they also make quite different types of claim. One can imagine a system that admits many possible steady states, but whose transient states all have a higher entropy production than any of its steady states. For such a system, MinEP and MaxEP could apply simultaneously. If so, then starting from a non-steady initial state, its entropy production would decrease over time until it reached a steady state and would remain constant thereafter; but nevertheless the steady state it reaches is most likely to be the one with the highest entropy production.
Unfortunately there is a depressing amount of literature in which these points are not well appreciated. It seems that people often think MaxEP implies that entropy production should increase over time as the system approaches a steady state. But this isn't true for a lot of systems, and I think this mistake in reasoning might be one of the reasons why MaxEP doesn't have a great reputation as a hypothesis.
As for literature that addresses this distinction, I seem to remember there being some fairly readable discussion in this book chapter by Dewar. Another place to look is Edwin Jaynes' criticism of the minimum entropy production principle. It doesn't really mention MaxEP (because Jaynes seems not to have been aware of Paltridge's papers) but it gives some strong hints towards it, and I found it extremely helpful in understanding the nature of MinEP and why a different type of principle is needed. Finally, I suppose I could also humbly point you to my paper on MaxEP, which doesn't discuss MinEP but tries to clarify some points about how MaxEP is applied, and to resolve some serious theoretical problems with the principle. These papers deal with some of the issues I've skipped over above, such as what it means for a system to have "possible" steady states that are different from the actual one.
Edit to reply to comment
The OP has commented that maybe the above implies that systems always choose the most entropy-producing state they "could" be in, regardless of whether this is a transient or a steady state, but for the transient states the maximum possible entropy production can reduce over time as the system converges to a steady state.
There are several ways I can address this. The first possibility is to say that above I was talking only about the version applied by Paltridge and by Lorenz et al., because this is the only version with even the tiniest little sliver of empirical evidence. It's very, very important to note that this version of MaxEP doesn't say anything at all about transient states. As Paltridge has said (as the OP points out), his version of MaxEP is just an empirical observation and not a theoretical claim, and it's an observation of the atmosphere's steady state, not its transient ones.
It's also important to note that there are few if any systems other than atmospheres that have been observed to obey a principle similar to Paltridge's. (There are claims for other systems, mostly in the Earth sciences, but I don't find these very convincing. There are no laboratory-based observations of Paltridge's principle as far as I know, although this is partly because the experimental crowd have their own completely different "principle of maximum entropy production" that they like to play with, in which systems choose between a finite number of steady states instead of a continuum.) So we already know that MaxEP as an empirical principle is not broadly applicable to all non-linear systems, and it shouldn't be surprising that we get contradictions if we try to imagine it applying too broadly. It might well be that MaxEP, if it is a valid principle at all, will turn out to apply only to thermally-driven turbulent fluids in steady state with very large Reynolds numbers, and not to any other type of system.
However, in addition to considering the empirical evidence due to Paltridge, we can consider the theoretical claims that have been made about MaxEP. In my opinion the most advanced such arguments are due to Dewar (2003, 2005). Dewar does make the claim that MaxEP is broadly applicable - in fact, he says it's applicable to all systems in a steady state: all steady-state systems maximise their entropy production subject to constraints, but most systems are more heavily constrained than atmospheres, so it's difficult to use MaxEP to make predictions about them. (This sounds like circular reasoning but it isn't. It's very similar to the way equilibrium systems maximise their entropy subject to constraints such as conservation laws.) But again, Dewar's theory does not make any claims at all about transient states. Dewar's proof cannot be interpreted in the way the OP suggests, because it only compares steady states to other steady states, not to transient ones.
(As a side note, I should say that although I think Dewar's work is the closest thing we have to a theoretical explanation of Paltridge's observations, I don't think it's quite correct. My paper, linked above, attempts to resolve what I see as a serious logical contradiction in his approach. This is a different contradiction from the one we've been discussing so far, and has to do with the fact that Dewar's version of MaxEP makes different predictions depending on where you draw the system's boundary.)
I could just leave it there. However, in my paper I do make the claim that Dewar's version of MaxEP (or something like it) can be extended to transient states, in something quite similar to the way you suggest. Like Dewar, I try to extend Jaynes' MaxEnt thermodynamics to deal with non-equilibrium states. Briefly, the idea is that if we maximise the information entropy of the system's microscopic state at time $t_1$, subject to the knowledge we have about the system from measurements made at time $t_0$, then, trivially, we've maximised the rate of increase of information entropy between times $t_0$ and $t_1$. Identifying this information entropy with the thermodynamic entropy is trickier than it might seem at first, but if we can do that then we've reached a version of MaxEP that does indeed apply to all states, transient or otherwise.
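Schematically, in my own notation (and glossing over the identification of information entropy with thermodynamic entropy mentioned above):
$$
p_{t_1} \;=\; \underset{p}{\arg\max}\;\Big\{\, S_I[p] = -\sum_x p(x)\ln p(x) \;:\; p \text{ consistent with the measurements at } t_0 \Big\},
$$
and since those same measurements fix $S_I(t_0)$, maximising $S_I(t_1)$ automatically maximises $\bigl(S_I(t_1)-S_I(t_0)\bigr)/(t_1-t_0)$.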
However, I don't think it leads to a contradiction if you look at it in this way. The reason is that, given the knowledge constraints formed by the measurements at $t_0$, there is exactly one macrostate at every time $t>t_0$ that maximises the (information) entropy subject to those constraints; it cannot be any other way. This means, I think, that within this framework it is not possible for the situation you suggest to arise, and transient states with high entropy productions must always lead to steady states with high entropy productions. (But, having thought about it a bit more just now, this is all subject to an additional constraint of reproducibility that I don't think I spelt out very clearly in the paper. This needs more thought on my part.)
Important Note
For the sake of it not getting lost, there is an in-depth and (currently) on-going discussion of this answer and related issues in this chat room.
Sir James Jeans has an excellent answer, without using the word "order" or "disorder". Consider a card game...¿is there anyone else on this forum besides me who still plays whist?
His example is whist. I had better use poker.
You have the same probability of being dealt four aces and the king of spades as of being dealt any other precisely specified hand, e.g., the two of clubs, the three of diamonds, the five of hearts, the eight of spades, and the seven of spades. Almost worthless.
Such a precisely specified hand is the microstate: a microstate is the complete, card-by-card description.
A macrostate is the useful description, as in the rules: two pairs, four of a kind, royal flush, flush, straight, straight flush, worthless.
You have a much lower probability of being dealt four aces than of being dealt a worthless hand.
A macrostate is the only thing about which it makes sense to say "ordered" or "disordered". The concept of "order" is undefined for a microstate. A macrostate is highly ordered or possesses a lot of information if your knowledge that you are in that macrostate implies a very high degree of specificity about what possible microstate you might be in. "Four aces", as a description, tells you a lot. "One pair", as a description, tells you much less. So the former is a state of low entropy and the latter is a state of higher entropy.
A macrostate is a probability distribution on the set of microstates. "Four aces" says that the microstate "ace, deuce, three, four, five, all spades" has zero probability. Most microstates have zero probability; they are excluded from this description. But "four aces and the king of spades" has probability 1/48, as does "four aces and the king of hearts", and so on down to "four aces and the deuce of clubs". The entropy is then $-k \log \frac{1}{48} = k \log 48$, where $k$ is not Boltzmann's constant. But the entropy of "one pair" is much higher: let $W$ be the number of different precisely specified hands which fall under the description "one pair"; then its entropy is $k \log W$.
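As a quick check of this counting (standard five-card poker combinatorics; I set $k=1$ since, as noted, it is not Boltzmann's constant here), a short sketch:

```python
# Count microstates (precisely specified hands) in each macrostate and compute
# P = W / total and S = k log W, with k = 1.
from math import comb, log

total_hands = comb(52, 5)                           # 2,598,960 equally likely hands

W_four_aces = 48                                    # four aces plus any fifth card
W_one_pair = 13 * comb(4, 2) * comb(12, 3) * 4**3   # 1,098,240
W_worthless = (comb(13, 5) - 10) * (4**5 - 4)       # 1,302,540 (no pair/straight/flush)

k = 1.0
for name, W in [("four aces", W_four_aces), ("one pair", W_one_pair),
                ("worthless", W_worthless)]:
    print(f"{name:10s} P = {W / total_hands:.2e}   S = k log W = {k * log(W):.2f}")
```

The "worthless" macrostate covers about half of all hands, which is why, as a description, it tells you so little.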
Jeans makes the analogy with putting a kettle of water on the fire. The fire is hotter than the water. Energy (heat) is transferred from the fire to the water, and also from the water to the fire, by, let us assume, molecular collisions only. When we say "cold kettle on a hot fire" we are describing a macrostate. When we say "water boils" that is another macrostate. When we say "fire gets hotter and water freezes" we are also describing a possible macrostate that might result. ¿What are the entropies? They are proportional to the logarithm of the number of microstates that fall under these three descriptions. Now, by the Maxwell distribution of energies of the molecules, there are very many high-energy molecules in the fire that come into contact with lower-energy molecules in the water and transfer energy to the water. There are very many precisely specified patterns of interaction at the individual molecular level of energy transfer from the fire to the water, so "boils" has a large entropy.
But there are some ways of freezing the water: the Maxwell distribution says that a few of the molecules in the fire are indeed less energetic than the average molecule in the water. It is possible that only (or mostly) these molecules are the ones that collide with the water and receive energy from the water. This is in strict analogy to the card game: there are very few aces, but it is possible you will get dealt all of them. But there are far fewer ways for this to happen, for the water to freeze, than for the previous process of boiling. Therefore this macrostate has less entropy than the "boils" macrostate.
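To put a rough number on that tail (treating both the fire and the water as ideal classical gases, which they certainly are not; the temperatures below are my own illustrative choices), the kinetic energy in a Maxwell distribution follows a Gamma(3/2) law in units of $kT$, so:

```python
# Fraction of molecules in the hot body whose kinetic energy is below the
# *average* kinetic energy (3/2 k T_water) of a molecule in the cold body.
from scipy.special import gammainc   # regularized lower incomplete gamma function

T_fire, T_water = 1200.0, 300.0      # illustrative temperatures, kelvin

fraction_slow = gammainc(1.5, 1.5 * T_water / T_fire)
print(f"{fraction_slow:.1%} of 'fire' molecules are slower than the average 'water' molecule")
# roughly 13% for these temperatures
```

So the microstates needed for "water freezes" do exist; there are just vastly fewer ways to arrange that only those molecules take part in the collisions.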
This example shows that to use the definition of entropy you have to have defined the complete set of microstates you will consider possible, and you have to study macrostates, which are sets of these microstates. These different macrostates can then be compared as to their entropies.
You cannot suddenly switch to a hockey game and compare the entropy of a full house to the entropy of "Leafs win". If you wish to make comparisons such as that, you would first have to define an overarching system which contained both sets of microstates and define its macrostates, and even then comparisons of entropy would be merely formalistic. The laws of thermodynamics only apply when there is an interaction between all parts of the system, such that any one component of the system has a chance of exchanging energy with any other part of the system within the time allotted. We also had to assume that each hand was equally likely, i.e., that Persi Diaconis was not dealing (some dealers know how to make some precisely specified hands less likely than others). Without these two assumptions there is no connection between thermodynamic entropy and informational entropy, and so the Second Law of Thermodynamics will not apply to informational entropy.
Better than "more ordered" would be to think "more specificity".
There are fewer ways to arrange that all the slow molecules are in one body and all the fast ones in the other than there are to arrange a seemingly random mix, just as there are fewer ways to arrange that I get all the aces, than there are to arrange that each player gets one ace.
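As a quick check of that counting claim, here is the card version, using a whist-style deal of thirteen cards to each of four players (my choice of setup, just for illustration):

```python
# Probability that one particular player is dealt all four aces, versus the
# probability that each of the four players gets exactly one ace.
from math import comb

total_deals = comb(52, 13) * comb(39, 13) * comb(26, 13)         # ordered players

one_player_all_aces = comb(48, 9) * comb(39, 13) * comb(26, 13)  # player 1 holds all 4 aces
one_ace_each = 24 * comb(48, 12) * comb(36, 12) * comb(24, 12)   # 4! ways to assign the aces

print(f"All four aces to one given player: {one_player_all_aces / total_deals:.4f}")  # ~0.0026
print(f"One ace to each player:            {one_ace_each / total_deals:.4f}")         # ~0.1055
```

Both outcomes are possible, but there are roughly forty times fewer deals of the first kind.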
See also Does entropy really always increase (or stay the same)? where I put Sir James's exact words at length.
Just to add another perspective to Paul's answer. As you know, each thermodynamic potential has its own natural variables, and as a function of those variables it reaches an extremum in equilibrium. Entropy is one such potential and can be used on an equal footing with the others; the only difference is that it reaches a maximum rather than a minimum. It is simply a historical accident that entropy was defined the way it was: put a minus sign in front of it and it behaves like all the others.
OK, its natural variables are the internal energy E, the volume V, and the number of particles N. The first of these, the internal energy, is very inconvenient to work with: one typically cannot measure it independently, and it is often not obvious how to express it in terms of measurable quantities for comparison with experiment. For this reason entropy itself is rarely used as the working potential.
In fact the philosophy behind this is that you first choose the set of variables adequate to the problem, and then work with the corresponding potential. The set (V, T, N) corresponds to the (Helmholtz) free energy. Use that!
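To spell out why the set (V, T, N) picks out the free energy (these are just the standard relations, written here for concreteness):
$$
dS = \frac{1}{T}\,dE + \frac{p}{T}\,dV - \frac{\mu}{T}\,dN,
\qquad
F \equiv E - TS,
\qquad
dF = -S\,dT - p\,dV + \mu\,dN .
$$
The Legendre transform trades the awkward variable E for the measurable T, and F(T, V, N) is minimised at fixed (T, V, N) just as S(E, V, N) is maximised at fixed (E, V, N).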
Second. You write:
You are definitely wrong. Below I give a counter-example that should clarify the situation:
a) The symmetry group of a crystalline solid is a discrete group consisting of translations over a discrete set of vectors, rotations over a discrete set of angles, and reflections in a discrete set of planes. The symmetry may in fact be even more complex, but this does not affect my example.
b) By introducing disorder one may transform this crystal into an amorphous solid. Its symmetry group contains continuous translations and rotations and an infinite set of mirror planes. This group is continuous: the so-called Euclidean motion group. The symmetry group of any crystal is a subgroup of it. Thus, by increasing the disorder we have increased the symmetry.
c) By increasing the disorder further (say, by raising the temperature) we may bring our solid into the liquid state. Here the symmetry is higher still, since it allows all motions that preserve the volume. The Euclidean motion group is a subgroup of this one.
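In the group-theoretic shorthand of the example above, the chain of inclusions is
$$
G_{\text{crystal}} \;\subset\; E(3) = G_{\text{amorphous}} \;\subset\; G_{\text{liquid}},
$$
with the disorder increasing from left to right.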
One may also give a number of such examples within the solid state.
Though it is typical that the high-temperature phase in a phase transition has the higher symmetry, there are also opposite examples; they are too specific to give here. There are also transitions between solid phases whose symmetry groups have no group-subgroup relation to one another, in which case one cannot say which symmetry is higher.
To summarize: there is no hard-and-fast rule relating disorder to symmetry, though an increase in disorder is often accompanied by an increase in symmetry.