Tukey's Median Polish algorithm is used in the RMA normalization of microarrays. As you may be aware, microarray data are quite noisy, so a robust way of estimating the probe intensities is needed, one that takes into account the observations for all the probes and arrays. A typical model for normalizing probe intensities across arrays is:
$$Y_{ij} = \mu_{i} + \alpha_{j} + \epsilon_{ij}$$
$$i=1,\ldots,I \qquad j=1,\ldots, J$$
Here $Y_{ij}$ is the $\log$-transformed PM intensity for the $i^{th}$ probe on the $j^{th}$ array. The $\epsilon_{ij}$ represent background noise and can be assumed to correspond to the noise term in normal linear regression. However, a distributional assumption on $\epsilon$ may be restrictive, so we use Tukey's Median Polish to obtain the estimates $\hat{\mu}_i$ and $\hat{\alpha}_j$. This is a robust way of normalizing across arrays: we want to separate the signal (the intensity due to the probe) from the array effect $\alpha_j$. We recover the signal by subtracting the estimated array effect $\hat{\alpha}_j$ from each array, leaving only the probe effects plus some random noise.
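The fit itself can be sketched in a few lines of NumPy. This is a minimal illustration of median polish on a probes-by-arrays matrix, not the actual RMA implementation found in Bioconductor packages:

```python
import numpy as np

def median_polish(y, n_iter=10, tol=1e-6):
    """Tukey's median polish: decompose y[i, j] into
    overall + row effect mu[i] + column effect alpha[j] + residual."""
    resid = np.asarray(y, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])   # probe effects mu_i
    col = np.zeros(resid.shape[1])   # array effects alpha_j
    for _ in range(n_iter):
        # sweep row medians out of the residuals into the row effects
        rmed = np.median(resid, axis=1)
        row += rmed
        resid -= rmed[:, None]
        # re-centre the column effects into the overall term
        cmed = np.median(col)
        overall += cmed
        col -= cmed
        # sweep column medians out of the residuals into the column effects
        cmed = np.median(resid, axis=0)
        col += cmed
        resid -= cmed[None, :]
        # re-centre the row effects into the overall term
        rmed = np.median(row)
        overall += rmed
        row -= rmed
        if np.abs(resid).sum() < tol:
            break
    return overall, row, col, resid
```

At every step the identity `y = overall + row[:, None] + col[None, :] + resid` is preserved; the per-array estimates `col` are what gets subtracted to normalize away the array effect.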
The link that I quoted earlier uses Tukey's median polish to find differentially expressed or "interesting" genes by ranking on the probe effect. However, the paper is quite old, and at that time people were probably still figuring out how to analyze microarray data. Efron's non-parametric empirical Bayesian methods paper came out in 2001, but may not have been widely adopted yet.
However, microarrays are now well understood statistically, and their analysis is fairly settled.
Microarray data are quite noisy, and RMA (which uses median polish) is one of the most popular normalization methods, perhaps because of its simplicity. Other popular, more sophisticated methods include GCRMA and VSN. Normalization is important because the interest lies in the probe effect, not the array effect.
As you might expect, the analysis could have benefited from methods that borrow information across genes, such as Bayesian or empirical Bayesian methods. Perhaps the paper you are reading predates these techniques.
Regarding your second point: yes, they are probably modifying the experimental data. But I think this modification is for a better cause, and hence justifiable, for two reasons:
a) Microarray data are quite noisy. When the interest is the probe effect, normalizing the data by RMA, GCRMA, VSN, etc. is necessary, and taking advantage of any special structure in the data may help. But I would avoid that second part: if we don't know the structure in advance, it is better not to impose a lot of assumptions.
b) Most microarray experiments are exploratory in nature; that is, the researchers are trying to narrow down to a small set of "interesting" genes for further analysis or experiments. If these genes have a strong signal, modifications like normalization should not (substantially) affect the final results.
Therefore, the modifications may be justified. But I must remark that overdoing the normalization may lead to wrong results.
If the uncontaminated data in your sample are drawn from an asymmetric distribution, and the measure of scale you use to determine the width of the rejection region assumes that the good part of your data is symmetric, then these rejection regions will be miscalibrated: too narrow on one side and too wide on the other. For illustration, suppose the distribution of the data is strongly right-skewed. This would lead you to:
- Reject genuine observations from the right tail as outliers.
- Fail to detect outliers from the left tail for what they are.
Overall, the combined effect would be that your (inappropriately) cleaned dataset will look more symmetric than it really is.
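A quick simulated illustration of this effect, applying a symmetric median ± 3·MAD rule to clean right-skewed (lognormal) data. The sample and the factor of 3 are invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
# clean, right-skewed data: no contamination at all
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

med = np.median(x)
mad = 1.4826 * np.median(np.abs(x - med))  # consistency factor for the normal
lo, hi = med - 3 * mad, med + 3 * mad

# A symmetric rule flags a sizeable share of genuine right-tail points ...
print("flagged high:", np.mean(x > hi))
# ... while the left fence falls below the data's support, so it can
# never flag anything there
print("left fence:", lo, "flagged low:", np.mean(x < lo))
```

About 10% of these perfectly genuine observations get rejected on the right, while the left fence is negative and hence useless for positive data.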
The alternative here is to use an outlier detection rule that treats the left and right tails of your sample separately. Of course, compared to the median and MAD, this also halves the breakdown point of your procedure (this is inevitable, because the contamination rate of a half sample can be up to twice the contamination rate of the full sample).
In my opinion, the best procedure for this problem is to use the rejection regions from adjusted boxplots. In my experience (drawn from numerical simulation), they can be expected to reliably detect asymmetric contamination even when the data contain as much as 10-15% outliers concentrated in one tail. Adjusted boxplots are widely implemented, and their connection with classical boxplots makes them easy to understand and use. This answer explains and illustrates the use of adjusted boxplots in a context quite like yours.
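For illustration, the adjusted-boxplot fences can be sketched from scratch with a naive $O(n^2)$ medcouple. Production implementations (e.g. `statsmodels.stats.stattools.medcouple`, or `adjboxStats` in R's robustbase) are faster and handle ties at the median more carefully:

```python
import numpy as np

def medcouple(x):
    """Naive O(n^2) medcouple: a robust measure of skewness in [-1, 1].
    Ties at the median are simply skipped here, a simplification."""
    x = np.sort(np.asarray(x, dtype=float))
    m = np.median(x)
    lower = x[x <= m]
    upper = x[x >= m]
    xi, xj = np.meshgrid(lower, upper)  # all pairs with xi <= m <= xj
    with np.errstate(divide="ignore", invalid="ignore"):
        h = ((xj - m) - (m - xi)) / (xj - xi)
    return np.median(h[xj > xi])

def adjusted_boxplot_fences(x):
    """Skewness-adjusted fences in the style of Hubert & Vandervieren (2008):
    the 1.5*IQR whiskers are stretched/shrunk exponentially in the medcouple."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mc = medcouple(x)
    if mc >= 0:
        return q1 - 1.5 * np.exp(-4 * mc) * iqr, q3 + 1.5 * np.exp(3 * mc) * iqr
    return q1 - 1.5 * np.exp(-3 * mc) * iqr, q3 + 1.5 * np.exp(4 * mc) * iqr
```

For symmetric data the medcouple is 0 and the fences reduce to the classical 1.5·IQR rule; for right-skewed data the upper fence is pushed out and the lower fence pulled in, which is exactly the per-tail asymmetry argued for above.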
Best Answer
What you are doing does not make sense if your goal is to estimate what proportion of the entire population (sample A + sample B + sample C) is in categories a, b, and c. Consider the following contingency table:
Then, for example, the median of the category a probabilities is 0.7 and the mean is 0.51, but only 16/50 = 0.32 of all the observations are in column a. Likewise, the median of the category c probabilities would be 0.1, but 0.36 of the observations are in column c. Does the "median summary" you propose tell you anything meaningful in a situation such as this one? Unless you have the marginal counts of either the samples or the categories, or you are willing to make some assumptions about them, I don't think there is a whole lot you can do in this case.
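As a concrete check, here is one hypothetical table that is consistent with the numbers quoted above. The counts are assumed for illustration and are not necessarily the original table:

```python
import numpy as np

# rows = samples A, B, C; columns = categories a, b, c (invented counts)
counts = np.array([
    [8,  1,  1],   # sample A: 10 observations, 80% in category a
    [7,  2,  1],   # sample B: 10 observations, 70% in category a
    [1, 13, 16],   # sample C: 30 observations, ~3% in category a
])

p_a = counts[:, 0] / counts.sum(axis=1)   # per-sample proportion in category a
print(np.median(p_a))                     # 0.7
print(round(p_a.mean(), 2))               # 0.51
print(counts[:, 0].sum() / counts.sum())  # 16/50 = 0.32

p_c = counts[:, 2] / counts.sum(axis=1)   # per-sample proportion in category c
print(np.median(p_c))                     # 0.1
print(counts[:, 2].sum() / counts.sum())  # 18/50 = 0.36
```

Because the samples have very different sizes, neither the median (0.7) nor the mean (0.51) of the per-sample proportions comes anywhere near the pooled proportion (0.32).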
Do you have any specific goals in mind? Also, how many categories and samples do you have?
Edit: Your sample/population phrasing is slightly confusing. It's better to say you "have 3 samples, each of which can be sub-divided into 3 categories a, b, and c." The phrase "sample population" is troublesome, as is your reference to two different "populations."