Resampling – Differentiating Pseudoreplication from Resampling in Data Analysis

I wonder how pseudoreplication (https://en.wikipedia.org/wiki/Pseudoreplication) differs from resampling (i.e., drawing with replacement from a given sample).

On the one hand, resampling methods offer a convenient way to assess the sampling distribution of a statistic (which is good).
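A minimal sketch of the basic nonparametric bootstrap, using only Python's standard library: each replicate draws n values with replacement from the observed sample and recomputes the statistic (here the mean). The data values and the helper names `bootstrap_means` and `bootstrap_se` are hypothetical, chosen only for illustration.

```python
import random
import statistics

def bootstrap_means(sample, n_boot=2000, seed=0):
    """Return n_boot bootstrap replicates of the sample mean.

    Each replicate draws len(sample) values with replacement from the
    observed sample, so a value can appear several times or not at all.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)]

def bootstrap_se(sample, n_boot=2000, seed=0):
    """Bootstrap estimate of the standard error of the mean."""
    return statistics.stdev(bootstrap_means(sample, n_boot, seed))

# Hypothetical measurements, one per independent experimental unit.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]
print(f"bootstrap SE of the mean: {bootstrap_se(data):.3f}")
print(f"classical SE s/sqrt(n):   {statistics.stdev(data) / len(data) ** 0.5:.3f}")
```

The two standard errors roughly agree here because each value in `data` is assumed to come from a separate, independent unit, which is exactly the assumption the plain bootstrap relies on.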

On the other hand, pseudoreplication seems to have a really bad reputation and is described as a serious problem. As I understand it, pseudoreplication was pointed out as a problem because repeated measurements on the same individual are clearly correlated, yet they are analyzed with statistical tests (etc.) that assume uncorrelated data. ("Pseudoreplication occurs when observations are not statistically independent, but treated as if they are" https://bmcneurosci.biomedcentral.com/articles/10.1186/1471-2202-11-5#:~:text=Pseudoreplication%20occurs%20when%20observations%20are,correlated%20in%20time%20or%20space.)
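To make the independence issue concrete, here is a small simulation sketch in standard-library Python (all parameter values and function names are arbitrary assumptions): five subjects are each measured 20 times. Pooling all 100 measurements as if they were independent gives a much smaller standard error than treating the subject, the actual experimental unit, as the unit of replication.

```python
import math
import random
import statistics

def simulate_repeated_measures(n_subjects=5, n_repeats=20,
                               between_sd=1.0, within_sd=0.2, seed=1):
    """Each subject has its own true level (between-subject variation);
    repeated measurements add only small noise around that level."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        level = rng.gauss(0.0, between_sd)
        data.append([level + rng.gauss(0.0, within_sd) for _ in range(n_repeats)])
    return data

def pseudoreplicated_se(data):
    """Treats every measurement as an independent replicate (n = 100 here)."""
    pooled = [x for subject in data for x in subject]
    return statistics.stdev(pooled) / math.sqrt(len(pooled))

def subject_level_se(data):
    """Uses one mean per subject, so n = number of subjects (5 here)."""
    means = [statistics.fmean(subject) for subject in data]
    return statistics.stdev(means) / math.sqrt(len(means))

data = simulate_repeated_measures()
print(f"pseudoreplicated SE: {pseudoreplicated_se(data):.3f}")
print(f"subject-level SE:    {subject_level_se(data):.3f}")
```

The pseudoreplicated standard error comes out far too small, which is what inflates the false-positive rate of tests that assume independent observations.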

However, I have found wordings such as

"pseudoreplication techniques, especially bootstrap …" https://www.google.de/books/edition/Practical_Methods_for_Design_and_Analysi/reYkGe9PLMQC?hl=de&gbpv=1&dq=%22pseudoreplication%22%20%22bootstrap%22&pg=PA149&printsec=frontcover&bsq=%22pseudoreplication%22%20%22bootstrap%22
"The next step in the logic is that of pseudoreplication. This involves resampling the data after they are collected but in a way that reflects the sampling procedures" https://support.sas.com/resources/papers/proceedings/proceedings/sugi22/STATS/PAPER279.PDF

which suggest that resampling techniques produce pseudoreplications.

Additionally, I found this book, which seems to discuss pseudoreplication arising from bootstrapping on page 272 (but I do not follow the argument):

https://books.google.de/books?id=B9PDDQAAQBAJ&lpg=PA272&ots=Doj3ob14Jn&dq=%22pseudoreplication%22%20%22resampling%22&hl=de&pg=PA272#v=onepage&q&f=false

If pseudoreplication is such a big problem, why doesn't it follow that resampling always suffers from it? And if pseudoreplication is a problem in the bootstrap, is there a good illustration of it, and how do I know when I am running into pseudoreplication when applying the bootstrap?
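One way to see where a bootstrap itself can be pseudoreplicated, sketched in standard-library Python under an assumed repeated-measures setup (the simulation settings and function names are hypothetical): resampling individual measurements treats them as independent and yields too small a standard error, while a cluster (block) bootstrap that resamples whole subjects, i.e. resampling in a way that reflects how the data were sampled, keeps each subject's correlated measurements together.

```python
import random
import statistics

def make_clustered_data(n_subjects=6, n_repeats=15, seed=2):
    """Hypothetical data: 6 subjects, 15 correlated measurements each."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        level = rng.gauss(0.0, 1.0)  # subject-specific true level
        data.append([level + rng.gauss(0.0, 0.2) for _ in range(n_repeats)])
    return data

def naive_bootstrap_se(data, n_boot=2000, seed=0):
    """Resamples individual measurements, ignoring subject membership."""
    rng = random.Random(seed)
    pooled = [x for subject in data for x in subject]
    n = len(pooled)
    reps = [statistics.fmean(rng.choices(pooled, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps)

def cluster_bootstrap_se(data, n_boot=2000, seed=0):
    """Resamples whole subjects with replacement, keeping each subject's
    measurements together, then recomputes the overall mean."""
    rng = random.Random(seed)
    k = len(data)
    reps = []
    for _ in range(n_boot):
        chosen = rng.choices(data, k=k)
        reps.append(statistics.fmean([x for subject in chosen for x in subject]))
    return statistics.stdev(reps)

data = make_clustered_data()
print(f"naive bootstrap SE:   {naive_bootstrap_se(data):.3f}")
print(f"cluster bootstrap SE: {cluster_bootstrap_se(data):.3f}")
```

In this sketch the naive version inherits the pseudoreplication error; what differs between the two functions is only the choice of resampling unit.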

I am struggling with "pseudoreplication" being branded as an incredibly big problem whenever you search for it, while resampling (which is really cool) is also called a pseudoreplication method. I want to get the implications right here.

PS: Given a set of paired feature vectors and labels, i.e. ((feature_1, feature_2, …), label), it is clear to me that it only makes sense to do a pair bootstrap and resample the whole tuple (feature vector, label), because each feature_i might be correlated with another feature. Applying the bootstrap here to resample single feature values independently is an obvious application error.
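A minimal standard-library sketch of the pair bootstrap described above, contrasted with the column-wise error: resampling whole rows ((feature_1, feature_2), label) preserves the relationship between a feature and the label, while resampling each column independently destroys it. The synthetic data, the `pearson` helper, and the chosen statistic (correlation of feature_1 with the label) are assumptions for illustration.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def feature1_label_corr(rows):
    """Statistic of interest: correlation between feature_1 and the label."""
    return pearson([feats[0] for feats, _ in rows], [label for _, label in rows])

def pair_bootstrap(rows, stat, n_boot=1000, seed=0):
    """Correct: resample whole (feature vector, label) rows with replacement."""
    rng = random.Random(seed)
    return [stat(rng.choices(rows, k=len(rows))) for _ in range(n_boot)]

def columnwise_bootstrap(rows, stat, n_boot=1000, seed=0):
    """The error: resample every column independently, breaking the pairing."""
    rng = random.Random(seed)
    n, n_feat = len(rows), len(rows[0][0])
    reps = []
    for _ in range(n_boot):
        cols = [rng.choices([feats[j] for feats, _ in rows], k=n) for j in range(n_feat)]
        labels = rng.choices([label for _, label in rows], k=n)
        new_rows = [(tuple(col[i] for col in cols), labels[i]) for i in range(n)]
        reps.append(stat(new_rows))
    return reps

# Synthetic rows: the label depends on feature_1; feature_2 is correlated with feature_1.
gen = random.Random(3)
rows = []
for _ in range(50):
    f1 = gen.gauss(0.0, 1.0)
    f2 = 0.5 * f1 + gen.gauss(0.0, 1.0)
    rows.append(((f1, f2), f1 + gen.gauss(0.0, 0.3)))

pair = pair_bootstrap(rows, feature1_label_corr)
col = columnwise_bootstrap(rows, feature1_label_corr)
print(f"pair bootstrap mean r:        {statistics.fmean(pair):.2f}")
print(f"column-wise bootstrap mean r: {statistics.fmean(col):.2f}")
```

The column-wise replicates center near zero even though the observed correlation is strong, so any interval built from them describes data that were never sampled.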

PPS: This post has more literature on pseudoreplication: Is Hurlbert 1984 the best introductory overview to pseudoreplication?

Best Answer

I coined the term pseudoreplication for a class of statistical errors in the late 1970s, not realizing it had occasionally been used in the past (C.J. McCarthy 1969) for completely different concepts or practices. The Wikipedia article (which I had no role in) focuses on my first paper on the topic but is unclear, out of date, and reflects unawareness of a large body of post-1984 papers on the topic. Below is a list of my own papers on the topic. I can send PDFs; just contact me at [email protected]. I suggest the 2009 paper as an introduction to this literature.

Hurlbert, S.H. 1990. Pastor binocularis: Now we have no excuse [review of Design of Experiments by R. Mead]. Ecology 71:1222-1228.

Hurlbert, S.H. and M.D. White. 1993. Experiments with freshwater invertebrate zooplanktivores: Quality of statistical analyses. Bulletin of Marine Science 53:128-153.

Lombardi, C.M. and S.H. Hurlbert. 1996. Sunfish cognition and pseudoreplication. Animal Behaviour 52:419-422.

Hurlbert, S.H. 1997. Experiments in ecology [review of book by same title by A.J. Underwood]. Endeavour 21:172-173.

Hurlbert, S.H. and C.M. Lombardi. 2003. Design and analysis: uncertain intent, uncertain result [review of Experimental design and data analysis for biologists by G. Quinn and M. Keough]. Ecology 83:810-812.

Hurlbert, S.H. and W.G. Meikle. 2003. Pseudoreplication, fungi, and locusts. Journal of Economic Entomology 96:533-535.

Hurlbert, S.H. 2004. On misinterpretations of pseudoreplication and related matters: a reply to Oksanen. Oikos 104:591-597.

Kozlov, M. and S.H. Hurlbert. 2006. Pseudoreplication, chatter, and the international nature of science. Journal of Fundamental Biology 67(22):128-135. [In Russian; English translation available as PDF.]

Hurlbert, S.H. 2009. The ancient black art and transdisciplinary extent of pseudoreplication. Journal of Comparative Psychology 123:434-443.

Hurlbert, S.H. 2010. Pseudoreplication capstone: Correction of 12 errors in Koehnle & Schank (2009). Department of Biology, San Diego State University, San Diego, California. 5 pp.

Hurlbert, S.H. 2012. Pseudofactorialism, response structures, and collective responsibility. Austral Ecology 38:646-663 + suppl. inform.

Hurlbert, S.H. 2013. Affirmation of the classical terminology for experimental design via a critique of Casella's Statistical Design. Agronomy Journal 105:412-418 + suppl. inform.

Hurlbert, S.H. 2021. Pseudoreplication revisited: Some little-known history of an ancient big problem. Manuscript in revision.
