You should use the signed rank test when the data are paired.
You'll find many definitions of pairing, but at heart the criterion is that something makes the two values in a pair at least somewhat positively dependent, while values in different pairs are not dependent. Often the dependence arises because the pair are observations on the same unit (repeated measures), but the observations needn't be on the same unit; they just need to measure the same kind of thing and tend, in some way, to be associated for the data to count as 'paired'.
You should use the rank-sum test when the data are not paired.
That's basically all there is to it.
Note that having the same $n$ doesn't mean the data are paired, and having different $n$ doesn't mean that there isn't pairing (it may be that a few pairs lost an observation for some reason). Pairing comes from consideration of what was sampled.
The effect of using a paired test when the data are paired is that it generally gives more power to detect the changes you're interested in. If the association leads to strong dependence*, then the gain in power may be substantial.
* specifically, but speaking somewhat loosely: if the effect size is large compared to the typical size of the pair-differences, but small compared to the typical size of the unpaired differences, you may pick up the difference with a paired test at quite a small sample size, but with an unpaired test only at a much larger one.
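To make the footnote concrete, here's a hedged simulation sketch (pure Python, made-up parameters: a strong shared unit effect and a modest shift; normal approximations used for both test statistics) comparing rejection rates of the signed-rank and rank-sum tests on the same paired data:

```python
# Illustrative simulation (assumed parameters, not from the original answer):
# paired vs unpaired Wilcoxon tests when within-pair dependence is strong.
import math
import random

def ranks(values):
    """Midranks of a list (average ranks assigned to ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def ranksum_z(x, y):
    """Normal-approximation z for the rank-sum statistic (sum of ranks in x)."""
    n1, n2 = len(x), len(y)
    w = sum(ranks(x + y)[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / math.sqrt(var)

def signedrank_z(x, y):
    """Normal-approximation z for the signed-rank statistic on pair differences."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    r = ranks([abs(v) for v in d])
    wplus = sum(ri for ri, di in zip(r, d) if di > 0)
    n = len(d)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return (wplus - mean) / math.sqrt(var)

random.seed(1)
n, reps, zcrit = 15, 500, 1.96  # two-sided 5% level, normal approximation
paired_hits = unpaired_hits = 0
for _ in range(reps):
    unit = [random.gauss(0, 5) for _ in range(n)]       # strong shared unit effect
    x = [u + random.gauss(0, 0.5) for u in unit]
    y = [u + 0.6 + random.gauss(0, 0.5) for u in unit]  # shift of 0.6
    paired_hits += abs(signedrank_z(x, y)) > zcrit
    unpaired_hits += abs(ranksum_z(x, y)) > zcrit

print(f"paired power ~ {paired_hits/reps:.2f}, unpaired power ~ {unpaired_hits/reps:.2f}")
```

With these (assumed) settings, the shift is large relative to the pair-differences but small relative to the unpaired differences, so the paired test rejects far more often.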
However, when the data are not paired, treating them as paired may be (at least slightly) counterproductive. That said, the cost in lost power may in many circumstances be quite small: a power study I did in response to this question suggests that in typical small-sample situations (say $n$ of the order of 10 to 30 in each sample, after adjusting for differences in significance level), the average power loss may be surprisingly small.
[If you're genuinely uncertain whether the data are paired or not, the loss from treating unpaired data as paired is usually relatively minor, while the gain may be substantial if they are paired. This suggests that if you really don't know, and you have a way of identifying what would be paired with what if they were paired (such as the values sitting in the same row of a table), it may in practice make sense to act as if the data were paired, to be safe -- though some people may get quite exercised over your doing that.]
The Wilcoxon rank-sum test doesn't "subtract a random score of one group to a random score of another group".
If it did what you describe in the question, it could work perfectly well even with very different sample sizes (since you could sample with replacement from either group, equal sample sizes would be unnecessary), but that's not how it works.
As the name suggests, the rank-sum test sums the ranks in one of the samples. It may then apply a shift (say by subtracting the minimum possible sum of ranks).
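As a quick hedged sketch (toy numbers I made up, ties not handled): sum the ranks of one sample within the combined sample, then optionally subtract the minimum possible rank sum $n_1(n_1+1)/2$, which gives the Mann-Whitney $U$ form of the statistic:

```python
# Illustrative only: the rank-sum statistic W and its shifted (Mann-Whitney U)
# form. Ties are not handled here, for brevity.
def rank_sum_statistic(x, y):
    combined = sorted(x + y)
    # ranks of x within the combined sample (values assumed distinct)
    w = sum(combined.index(v) + 1 for v in x)
    u = w - len(x) * (len(x) + 1) // 2  # shift: U ranges from 0 to n1*n2
    return w, u

print(rank_sum_statistic([1.2, 3.4, 5.6], [0.7, 2.1]))  # → (11, 5)
```

Note the statistic is perfectly well defined with unequal sample sizes, which is part of why the "subtract random scores" description can't be right.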
[Where did you get that idea? It sounds like someone tried to explain permutation tests to you, but it ended up as a muddle of paired-sample, independent-sample, and rank-versus-original-value notions all smooshed together.]
There's not one single alternative for the Wilcoxon rank-sum test; it depends on what additional assumptions you make and how you look at it. The most general alternative form is $P(X>Y)\neq \frac12$ (two-tailed; the one-tailed versions replace $\neq$ with either $<$ or $>$).
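A hedged numerical illustration (toy data I made up): the fraction of cross-pairs with $x > y$, which is $U/(n_1 n_2)$, estimates the $P(X>Y)$ that this alternative is framed in:

```python
# Illustrative sketch: U / (n1*n2) estimates P(X > Y), the quantity the
# rank-sum test's most general alternative hypothesis is about.
def prob_x_greater_y(x, y):
    u = sum(1 for xi in x for yi in y if xi > yi)  # ties ignored for brevity
    return u / (len(x) * len(y))

print(prob_x_greater_y([5, 6, 7], [1, 2, 8]))  # 6 of 9 cross-pairs have x > y
```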
Best Answer
If you consider all four groups in a Kruskal-Wallis (the rank based 'one-way anova'), you would be in the position of wanting to test a contrast there.
Now Kruskal-Wallis is basically a special case of the proportional odds ordinal logistic model.
You could get this contrast by testing a combination of coefficients in the proportional odds ordinal logistic model.
That is, the kinds of contrasts you'd tend to do in ANOVA pretty much can be done for a generalization of the Wilcoxon type of approach.
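For concreteness, here is a hedged pure-Python sketch (made-up toy data for three groups, ties not handled) of the Kruskal-Wallis statistic, the rank-based one-way ANOVA mentioned above:

```python
# Illustrative sketch of the Kruskal-Wallis statistic: rank the pooled data,
# then compare per-group rank sums. Ties are not handled, for brevity.
def kruskal_wallis(groups):
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    order = sorted(range(n), key=lambda i: pooled[i])
    rank = [0.0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r  # values assumed distinct
    h, start = 0.0, 0
    for g in groups:
        rsum = sum(rank[start:start + len(g)])
        h += rsum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * h - 3 * (n + 1)

print(round(kruskal_wallis([[2.9, 3.0, 2.5], [3.8, 2.7, 4.0], [2.8, 3.4]]), 3))  # → 1.361
```

This is only the omnibus statistic; testing a specific contrast, as the answer describes, would go through the proportional odds model rather than this formula.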
I think Frank Harrell's R package `rms` may be able to do this, for example.

That said, I agree with @NickCox's suggestion of considering modelling with GLMs more generally; there may be GLMs that describe the mean, the mean-variance relationship, and the general shape of your data fairly well, and in that case your contrasts become not only easy to test but perhaps also more directly interpretable in terms of relationships between means, especially if identity links are used.