Ultimate physical motivation
Strictly in the sense of physics, entropy is less free than it might seem. It always has to provide a measure of the energy released by a system that is not captured by the macroscopic parameters, i.e. it has to be subject to the relation
$${\rm d}U = {\rm d}E_\text{macro} + T {\rm d} S$$
It has to carry all the forms of energy that cannot be expressed macroscopically, which we summarize as "heat", even though the actual physics behind this "heat" may be quite different from the familiar notions in gases etc. If an entropy does not satisfy this relation, it is not a physical entropy. This is a full characterization of entropy for macrophysics. I am going to use only this definition, not the cases where entropy is a handle to talk about information.
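For a simple fluid, for instance, ${\rm d}E_\text{macro} = -p\,{\rm d}V + \mu\,{\rm d}N$, so the relation reads
$${\rm d}U = -p\,{\rm d}V + \mu\,{\rm d}N + T\,{\rm d}S,$$
and $T\,{\rm d}S$ is by definition whatever part of the energy balance the macroscopic work terms miss.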
Statistical formulation
This constraint does leave some freedom in the statistical definition of entropy, but the freedom has no effect in practice. It stems essentially from the fact that we take the $N\to \infty$ and $V \to \infty$ limits, in which a lot of the detail of the definition gets smeared out. For example, we can define the phase-space volume of the microcanonical ensemble in three distinct ways. The first one is
$$\Omega_\text{sharp} = \int_{\sum E = U} d \mu$$
where $\mu$ is some kind of measure over the space of states. Or we can put
$$\Omega_\text{non-sharp} = \int_{\sum E \in (U-\varepsilon,U)} d \mu$$
or even
$$\Omega_\text{nobody cares} = \int_{\sum E < U} d \mu$$
Any of these will work for $S = k_B \log \Omega$: in the mentioned limits, all three give the same $S$. But this is more a relic of the large limits - the physically plausible option is $\Omega_\text{sharp}$.
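To see why, take for example the monatomic ideal gas: $\Omega_\text{nobody cares}$ is proportional to the volume of a $3N$-dimensional ball of radius $\sqrt{2mU}$, and in high dimensions essentially all of that volume sits in a thin shell just under the surface. The logarithms of the three quantities therefore differ only by terms of order $\log N$,
$$\log \Omega_\text{nobody cares} = \log \Omega_\text{non-sharp} + \mathcal{O}(\log N), \qquad S = k_B \log \Omega = \mathcal{O}(N),$$
so the ambiguity vanishes from the entropy per particle as $N \to \infty$.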
The much more important issue is the counting of relevant states: the transition from discrete states to continuous ones, and why we should consider them "democratic". This would be a very long argument involving ergodicity and so on.
For ergodic Hamiltonian systems, the probability measure is certainly proportional to $d^n x\, d^n p$, where $n$ is the number of degrees of freedom. From quantum mechanics we know that the "democracy" factor relating discrete to continuous states makes this measure $d^n x\, d^n p/h^n$, with $h$ the Planck constant. (Only the relative weights matter, since we normalize anyway.)
The conclusion is that the procedures of statistical physics, for a given system, can give us entropy unambiguously (up to an additive constant representing the freedom of state normalization).
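For instance, carrying this measure (together with the customary $1/N!$ for indistinguishable particles) through the microcanonical procedure for a monatomic ideal gas yields the Sackur-Tetrode entropy,
$$S = N k_B \left[\log\left(\frac{V}{N}\left(\frac{4\pi m U}{3 N h^2}\right)^{3/2}\right) + \frac{5}{2}\right],$$
with nothing left undetermined - a concrete case of the claimed uniqueness.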
Hand-wavy conclusion
So there always is one entropy for every situation, and we know how to derive it. The trick is only to specify which degrees of freedom are "free", i.e. get randomized in complicated interactions, and then turn on the statistics.
But there are some loopholes. We see that the justification of the whole procedure (the "democratization" of states) relies on the Hamiltonian formulation and, basically, also on quantization. But we know quantization is more of an art than a science, and the statistical procedure can run into problems very similar to those of quantization. Are we always sure what the macroscopic parameters of a system are? How do we describe the situation when we observe the microstate directly? What would be the entropy of a relativistic space-time? Which would be the "activated" degrees of freedom? Etc. But this is a question for the "art of physics".
Additional note: "Art of physics" - modelling and confirming
A brief comment on "the art of physics". As with any physical model or approximation, there are three criteria:
- Foundation on (more) elementary physics
- Self-consistency of the result with the assumptions
- Empirical verification
Say we have an open system $\Xi$ with a channel of particle inflow. However, we only know how to compute the parameters relevant for the inflow at small number densities in $\Xi$, because then we can use a one-particle model of entering and leaving the system. The one-particle model corresponds to point 1 - a foundation on physics believed to be more elementary. We thus presume a low number density and compute the statistics of the system.
But this is where the theorist's work should not stop; the next step is to check whether the density stays sufficiently low under the relevant choices of parameters, and to identify these regions in parameter space - this is point 2. However, this is still a very primitive conception. For a serious model, the theorist should at least check that two- and higher-particle models of inflow cannot suddenly take over even at low densities, and investigate under what conditions they do not. This is point 1 mixing with point 2.
Nevertheless, there is also point 3 - empirical verification. It would be very naïve to pretend that the theorist is able to anticipate all possible effects. In fact, Einstein's papers are well known for simply putting forward a model without long mathematical discussions of neglected effects and giving experimental predictions right away. Sometimes intuition rules (and sometimes it does not).
In the case of entropy, this would be achieved by measuring the heat response of the system. This means not only heat capacities of the form
$$C_{\ldots} = T\left(\frac{\partial S}{\partial T}\right)_{\ldots\;=\,\text{const}}$$
but also a number of other response coefficients involving temperature, as specified e.g. by the Maxwell relations.
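For instance, one of the Maxwell relations,
$$\left(\frac{\partial S}{\partial V}\right)_T = \left(\frac{\partial p}{\partial T}\right)_V,$$
turns an entropy derivative into a purely mechanical response coefficient, so the predicted entropy can be probed even without direct calorimetry.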
So the answer would be: if a well-developed model predicting the entropy quantitatively exists and is confirmed by thorough testing, that entropy qualifies as the unique entropy of the system.
Additional note: Observed mathematical conditions
Let's say the physical motivation is paramount. Then the strongest we can say is the following:
- Entropy is a single-valued function of the full set of macroscopic parameters. (I.e. if it is not, it might also be because the list of parameters is not complete.)
- Entropy has a finite difference between any two points in the macro parameter space. I.e. $|\Delta S|<\infty$.
- Entropy is homogeneous of degree one in the parameters identified by physical criteria as "extensive". I.e. for a complete set of extensive parameters $A_i$ we have $S(\lambda A_1, \ldots, \lambda A_n, \ldots) = \lambda S(A_1, \ldots, A_n, \ldots)\ \forall \lambda > 0$.
In phase transitions as common as freezing/melting, the entropy is even discontinuous - hence the finite-difference criterion. (But this happens only in the $N \to \infty$ limit, as discussed e.g. by Kardar in his lecture notes.) Physically, we are able to measure only $\Delta S$, so a strict requirement of a well-defined ${\rm d}S$ is both redundant and impossible for some very common systems.
It is important that "extensivity" just says "take another copy of the system" - the parameters which double under this operation are extensive, but so is the heat stored in the new "doubled" system. Taking all the extensive parameters and multiplying them by $\lambda$ just means "taking $\lambda$ copies of the system". This all relies heavily on the fact that we are able to identify the physical operation of "taking another copy of the system" very clearly.
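For example, the Sackur-Tetrode entropy of the ideal gas passes this test: it depends on $(U, V, N)$ only as $N$ times a function of the copy-invariant ratios $U/N$ and $V/N$, so
$$S(\lambda U, \lambda V, \lambda N) = \lambda\, S(U, V, N) \qquad \forall \lambda > 0.$$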
There are cases, such as black hole thermodynamics, where this notion fails. In a way, the whole space-time is the thermodynamical system, so "take another copy of the system" is hard to specify. (More technically, the formulas are for isolated black holes, and there is no way to screen out gravity other than by infinite distance.) It might seem that the horizon surface $A$ would be an extensive parameter, but it actually grows as $\sim M^2$ - we cannot just say "double the mass", because that would not work.
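Concretely, for a Schwarzschild black hole the Bekenstein-Hawking entropy is
$$S_\text{BH} = \frac{k_B c^3 A}{4 G \hbar}, \qquad A = \frac{16 \pi G^2 M^2}{c^4},$$
so doubling the mass quadruples the entropy, $S_\text{BH}(2M) = 4\, S_\text{BH}(M)$, and no rescaling of the parameters acts like "taking $\lambda$ copies".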
Hint: You have an error in your computations. In particular, in the grand canonical ensemble,
\begin{align}
\langle E \rangle \neq -\frac{\partial \log Q}{\partial \beta}.
\end{align}
Moreover, I just did the whole computation having corrected this error in the appropriate way, and it worked out the way it should.
Addendum, 2019-02-02. Details Beyond the Hint
Step 1. Recall the following definitions of the grand canonical partition function $Q$, the ensemble average energy $\langle E\rangle$ in the grand canonical ensemble, and the ensemble average particle number $\langle N\rangle$ in the grand canonical ensemble. All sums are over states $i$ of the system:
\begin{align}
Q \equiv \sum_ie^{-\beta(E_i - \mu N_i)}, \qquad \langle E\rangle \equiv \sum_i \frac{e^{-\beta(E_i - \mu N_i)}}{Q}E_i, \qquad \langle N\rangle \equiv \sum_i \frac{e^{-\beta(E_i - \mu N_i)}}{Q}N_i
\end{align}
Step 2. Show that the following identity follows from the definitions in Step 1:
\begin{align}
\langle E\rangle = -\frac{\partial \ln Q}{\partial \beta} + \mu\langle N\rangle.
\end{align}
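(To verify: differentiating $\ln Q$ at fixed $\mu$,
\begin{align}
-\frac{\partial \ln Q}{\partial \beta} = \frac{1}{Q}\sum_i (E_i - \mu N_i)\, e^{-\beta(E_i - \mu N_i)} = \langle E\rangle - \mu\langle N\rangle,
\end{align}
and rearranging gives the identity above.)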
Step 3. Show that if we take
\begin{align}
\ln Q = V\frac{e^{\beta\mu}}{\lambda^3},
\end{align}
then
\begin{align}
-\frac{\partial \ln Q}{\partial \beta} = \frac{3}{2}\frac{\langle N\rangle}{\beta} - \mu\langle N\rangle.
\end{align}
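(Sketch, assuming $\lambda$ here is the thermal de Broglie wavelength, $\lambda = h\sqrt{\beta/2\pi m} \propto \beta^{1/2}$: differentiating at fixed $\mu$,
\begin{align}
-\frac{\partial \ln Q}{\partial \beta} = -\frac{\partial}{\partial \beta}\left(V\frac{e^{\beta\mu}}{\lambda^3}\right) = \left(\frac{3}{2\beta} - \mu\right) V\frac{e^{\beta\mu}}{\lambda^3} = \frac{3}{2}\frac{\langle N\rangle}{\beta} - \mu\langle N\rangle,
\end{align}
where the last step uses $\langle N\rangle = \beta^{-1}\,\partial \ln Q/\partial \mu = V e^{\beta\mu}/\lambda^3$.)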
Step 4. Combine steps 2 and 3 to obtain the desired result.
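Explicitly, combining them:
\begin{align}
\langle E\rangle = -\frac{\partial \ln Q}{\partial \beta} + \mu\langle N\rangle = \frac{3}{2}\frac{\langle N\rangle}{\beta} = \frac{3}{2}\langle N\rangle k_B T,
\end{align}
which is presumably the desired result - the equipartition energy of the monatomic ideal gas.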
Best Answer
The missing piece is the inertia of the chain links. In other words, the ideal chain model only examines the configurational coordinates of the chain's phase space and completely neglects the momentum coordinates. In this sense the ideal chain model is not a real statistical mechanical calculation, since it does not integrate over the full mechanical phase space.
We would face the same situation with the ideal gas if we calculated its entropy based only on where the atoms could be, and not on how fast they could be moving. This would give an incorrect expression for the entropy, like $k \ln V$ per molecule, missing several additive terms that also depend on mass and temperature. Via (1), it would still lead to the correct expression for the pressure ($kT/V$ per molecule), but we would fail to see in (2) where the force comes from microscopically.
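(To make this explicit, assuming (1) is the usual relation $p = T(\partial S/\partial V)_T$: the full per-molecule entropy has the form $S = k \ln V + f(T, m)$, so
$$p = T\left(\frac{\partial S}{\partial V}\right)_T = \frac{kT}{V};$$
the missing momentum terms are $V$-independent and drop out of the pressure, which is why the truncated entropy still gets (1) right while being wrong as an entropy.)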
In the same way, the reason the ideal chain gets away with neglecting the chain links' momenta is that it cheats: it produces a wrong entropy, missing many terms, which nevertheless has the correct value of $(\partial S/\partial x)_T$. Anyway, this answers where the force comes from in the microscopic picture: it is caused by the inertia of the chain links wiggling around randomly but, on average, pulling the chain ends together. When the chain ends are moved inwards, the chain will on average be moving in such a way that its motion is accelerated by the changing end positions.
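For reference, the standard Gaussian-chain result behind this: for $N$ freely jointed links of length $b$ and end-to-end extension $x \ll Nb$, the configurational entropy and the resulting tension are
$$S(x) \simeq \text{const} - \frac{3 k_B x^2}{2 N b^2}, \qquad f = -T\left(\frac{\partial S}{\partial x}\right)_T = \frac{3 k_B T}{N b^2}\, x,$$
a Hookean restoring force proportional to $T$ - the macroscopic face of the inertial kicking described above.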
(An aside: if this is what is going on, why is it not mentioned in the many articles I've seen that discuss the entropic spring? I think it's good to have both micro and macro pictures.)
I wonder, though, whether it's true that including inertia would produce exactly the same force. After all, the chain links are rigidly connected, and so all of their speeds are somehow correlated with each other.