Mathematical Statistics – Analyzing Importance of Model Assumption and Evaluation in Laymen-Conducted Analyses

mathematical-statistics, modeling, multiple regression

Bottom line, the more I learn about statistics, the less I trust published papers in my field; I simply believe that researchers are not doing their statistics well enough.


I'm a layman, so to speak. I'm trained in biology but I have no formal education in statistics or mathematics. I enjoy R and often make an effort to read (and understand…) some of the theoretical foundations of the methods that I apply when doing research. It wouldn't surprise me if the majority of people doing analyses today are actually not formally trained. I've published around 20 original papers, some of which have been accepted by recognized journals, and statisticians have frequently been involved in the review process. My analyses commonly include survival analysis, linear regression, logistic regression and mixed models. Never ever has a reviewer asked about model assumptions, fit or evaluation.

Thus, I never really bothered too much about model assumptions, fit and evaluation. I start with a hypothesis, execute the regression and then present the results. In some instances I made an effort to evaluate these things, but I always ended up with "well, it didn't fulfill all assumptions, but I trust the results ('subject matter knowledge') and they are plausible, so it's fine". When I consulted a statistician, they always seemed to agree.

Now, I've spoken to other statisticians and non-statisticians (chemists, physicians and biologists) who perform analyses themselves; it seems that people don't really bother too much about all these assumptions and formal evaluations. But here on CV, there is an abundance of people asking about residuals, model fit, ways to evaluate it, eigenvalues, vectors and the list goes on. Let me put it this way, when lme4 warns about large eigenvalues, I really doubt that many of its users care to address that…

Is it worth the extra effort? Is it not likely that the majority of all published results do not respect these assumptions, and perhaps have not even assessed them? This is probably a growing issue, since databases grow larger every day and there is a notion that the bigger the data, the less important the assumptions and evaluations are.

I could be absolutely wrong, but this is how I have perceived this.

Update:
Citation borrowed from StasK (below): http://www.nature.com/news/science-joins-push-to-screen-statistics-in-papers-1.15509

Best Answer

I am trained as a statistician, not as a biologist or medical doctor. But I do quite a bit of medical research (working with biologists and medical doctors), and as part of that research I have learned quite a bit about the treatment of several different diseases. Does this mean that if a friend asks me about a disease that I have researched, I can just write them a prescription for a medication that I know is commonly used for that particular disease? If I were to do this (I don't), then in many cases it would probably work out OK (since a medical doctor would just have prescribed the same medication). But there is always a possibility that they have an allergy, a drug interaction or something else that a doctor would know to ask about and I would not, and I would end up causing much more harm than good.

If you are doing statistics without understanding what you are assuming and what could go wrong (or without consulting a statistician along the way who will look for these things), then you are practicing statistical malpractice. Most of the time it will probably be OK, but what about the occasion where an important assumption does not hold, but you just ignore it?
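To make that "occasion" concrete, here is a minimal simulation sketch (an illustration added for this write-up, not an analysis from the discussion). It assumes a hypothetical two-group study in which observations come in clusters (say, repeated measurements per patient) and there is no true difference between the groups. The names `simulate_group` and `false_positive_rates`, the cluster counts, the variances and the hardcoded critical values are all assumptions chosen for illustration. It compares a t-test that wrongly treats every observation as independent with the same test run on cluster means.

```python
import random

# Two-sided 5% critical values of the t distribution, hardcoded for the
# assumed default design below (10 clusters of 20 observations per group).
CRIT_NAIVE = 1.966    # df = 2 * 200 - 2 = 398
CRIT_CLUSTER = 2.101  # df = 2 * 10 - 2 = 18


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def t_statistic(a, b):
    """Two-sample t statistic (pooled and Welch forms coincide for equal n)."""
    se = (sample_variance(a) / len(a) + sample_variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def simulate_group(rng, n_clusters=10, cluster_size=20):
    """Return one list of observations per cluster.

    Each cluster shares a random effect (sd 1) on top of independent noise
    (sd 1). There is no group effect, so the null hypothesis is true.
    """
    clusters = []
    for _ in range(n_clusters):
        cluster_effect = rng.gauss(0, 1)
        clusters.append(
            [cluster_effect + rng.gauss(0, 1) for _ in range(cluster_size)]
        )
    return clusters


def false_positive_rates(n_sims=1000, seed=1):
    """Fraction of simulated null studies rejected by each analysis."""
    rng = random.Random(seed)
    naive_hits = 0
    cluster_hits = 0
    for _ in range(n_sims):
        group1 = simulate_group(rng)
        group2 = simulate_group(rng)

        # Naive analysis: pretend all observations are independent.
        flat1 = [x for cluster in group1 for x in cluster]
        flat2 = [x for cluster in group2 for x in cluster]
        if abs(t_statistic(flat1, flat2)) > CRIT_NAIVE:
            naive_hits += 1

        # Analysis respecting the clustering: one mean per cluster.
        means1 = [mean(cluster) for cluster in group1]
        means2 = [mean(cluster) for cluster in group2]
        if abs(t_statistic(means1, means2)) > CRIT_CLUSTER:
            cluster_hits += 1
    return naive_hits / n_sims, cluster_hits / n_sims


if __name__ == "__main__":
    naive, clustered = false_positive_rates()
    print(f"naive false-positive rate:        {naive:.3f}")
    print(f"cluster-mean false-positive rate: {clustered:.3f}")
```

Under these assumed settings the naive test should reject the true null hypothesis far more often than the nominal 5%, while the cluster-mean test should stay close to it. Collecting more observations per cluster does not repair the naive analysis, because the dependence between observations is still ignored. In practice a mixed model would be the more usual remedy; cluster means are used here only to keep the sketch free of dependencies.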

I work with some doctors who are reasonably statistically competent and can do much of their own analysis, but they will still run it past me. Often I confirm that they did the correct thing and that they can do the analysis themselves (and they are generally grateful for the confirmation). Occasionally, though, they will be doing something more complex, and when I mention a better approach they will usually turn the analysis over to me or my team, or at least bring me in for a more active role.

So my answer to your title question is "No", we are not exaggerating. Rather, we should be stressing some things more, so that laymen will be more likely to at least double-check their procedures and results with a statistician.

Edit

This is an addition based on Adam's comment below (will be a bit long for another comment).

Adam, thanks for your comment. The short answer is "I don't know". I think that progress is being made in improving the statistical quality of articles, but things have moved so quickly in so many different ways that it will take a while to catch up and guarantee the quality. Part of the solution is focusing on the assumptions, and the consequences of violating them, in intro stats courses. This is more likely to happen when the classes are taught by statisticians, but it needs to happen in all classes.

Some journals are doing better, but I would like to see a dedicated statistician reviewer become the standard. There was an article a few years back (sorry, I don't have the reference handy, but it was in either JAMA or the New England Journal of Medicine) that showed a higher probability of being published in JAMA or NEJM (though not as big a difference as it should be) if a biostatistician or epidemiologist was one of the co-authors.

An interesting article that came out recently is: http://www.nature.com/news/statistics-p-values-are-just-the-tip-of-the-iceberg-1.17412 which discusses some of the same issues.