Solved – Rules of thumb for “modern” statistics

exploratory-data-analysis, modeling, rule-of-thumb

I like G. van Belle's book Statistical Rules of Thumb and, to a lesser extent, Common Errors in Statistics (and How to Avoid Them) by Phillip I. Good and James W. Hardin. They address common pitfalls when interpreting results from experimental and observational studies and provide practical recommendations for statistical inference or exploratory data analysis. But I feel that "modern" guidelines are somewhat lacking, especially with the ever-growing use of computational and robust statistics in various fields, or the introduction of techniques from the machine learning community in, e.g., clinical biostatistics or genetic epidemiology.

Apart from computational tricks or common pitfalls in data visualization, which could be addressed elsewhere, I would like to ask: What are the top rules of thumb you would recommend for efficient data analysis? (One rule per answer, please.)

I am thinking of guidelines that you might provide to a colleague, a researcher without a strong background in statistical modeling, or a student in an intermediate to advanced course. These might pertain to various stages of data analysis, e.g. sampling strategies, feature selection or model building, model comparison, post-estimation, etc.

Best Answer

Don't forget to do some basic data checking before you start the analysis. In particular, look at a scatter plot of every variable you intend to analyse against ID number, date / time of data collection or similar. The eye can often pick up patterns that reveal problems when summary statistics don't show anything unusual. And if you're going to use a log or other transformation for analysis, also use it for the plot.
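
As a rough illustration of this check, here is a minimal Python sketch. The helper names (`order_series`, `plot_against_order`), the column names `id` and `weight`, and the data are all made up for the example. It assumes the records are plain dictionaries and that matplotlib is installed for the plotting step.

```python
import math


def order_series(rows, variable, order_key="id", transform=None):
    """Return x and y lists for `variable` against `order_key`, sorted by order.

    `transform` (e.g. math.log) is applied to the values so that the plot
    uses the same scale as the planned analysis.
    """
    pairs = sorted((r[order_key], r[variable]) for r in rows)
    xs = [x for x, _ in pairs]
    ys = [transform(y) if transform else y for _, y in pairs]
    return xs, ys


def plot_against_order(rows, variables, order_key="id", transform=None):
    """Draw one scatter panel per variable against the collection order."""
    import matplotlib.pyplot as plt  # third-party; imported only when plotting

    fig, axes = plt.subplots(len(variables), 1, sharex=True, squeeze=False)
    for ax, var in zip(axes[:, 0], variables):
        xs, ys = order_series(rows, var, order_key, transform)
        ax.scatter(xs, ys, s=10)
        ax.set_ylabel(var)
    axes[-1, 0].set_xlabel(order_key)
    return fig


# Made-up data: the measurement jumps by 5 units after the tenth record,
# the kind of shift that a mean or standard deviation alone would not flag.
rows = [
    {"id": i, "weight": 50.0 + (5.0 if i > 10 else 0.0) + (i % 3)}
    for i in range(1, 21)
]
xs, ys = order_series(rows, "weight", transform=math.log)
```

Calling `plot_against_order(rows, ["weight"], transform=math.log)` would then draw the panel. With real data, a date or time of collection can be used as `order_key` instead of an ID.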