Solved – The reference book for statistics with R – does it exist and what should it contain

r, references

Background

There is a lot of discussion around this, so I thought I could find my answer in earlier threads on StackExchange and by googling furiously. After spending half a day trying to find a single reference book for (bio)statistics with R, I got utterly confused and had to give up. Maybe the free material combined is actually better than any of the books you can buy at the moment. Let’s find out.

The internet is full of good free literature on the R language, so there is really no point in paying for a mediocre book that ends up being used as office decoration most of the time. The R home site lists books related to R, and there are a lot of them: 115, to be exact. Only one of them is advertised with the words “standalone statistics reference book”. It is 8 years old now and may be outdated. The fourth edition of Modern Applied Statistics with S is even older. The R Book is often chewed out as too basic and not recommended because of its lack of references, poorly formatted code and sloppy finish.

However, I am looking for one book that I could use as a standalone reference to practical statistics (first and foremost) with R (secondary). The book should live on my office desk collecting annotations, coffee stains and greasy fingerprints instead of dust on the bookshelf. It should replace the collection of free PDFs I have been using so far, not forgetting that R comes with an excellent reference library. “What is the right approach?”, “why?” and “technically, how does it work?” are often more burning questions than “how to do it with R?”

Since I am an ecologist, I am mostly interested in applications to biostatistics. However, since these things are often connected, an interdisciplinary general reference would be the most valuable for me.

The task

If such a book exists (I doubt it), please provide the name of the book (only one per answer) and a short review explaining why it should be named the reference book for the topic. Since this question is not very different from the existing ones, please use this thread for your answer. You can also list flaws of the book so that we can turn those into features of the ideal reference book.

My question is: what should the reference book for statistics (of the most commonly used kinds) with R contain?

Some initial thoughts on general features (please update):

  • Thick as a brick
  • Concise, but understandable
  • Filled with figures (with the R code provided)
  • Easy to understand tables and diagrams describing the most important details from the text
  • Easy to understand, descriptive text about the statistics / methods containing the most important equations.
  • Good examples for each approach (with R code)
  • Broad and up-to-date list of references
  • Minimal number of typos

Table of contents

Since I am not a statistician and would need this (nonexistent?) book to answer the question, it's hard for me to write about the contents. Because The R Book clearly intends to be the reference book for statistics with R, but is often criticized, I copied its table of contents as a starting point for the table of contents of the standalone R statistics reference book. Additional task: please provide additions, suggestions, deletions, etc. for the table of contents.

  1. Getting Started
  2. Essentials of the R Language
  3. Data Input
  4. Dataframes
  5. Graphics
  6. Tables
  7. Mathematics
  8. Classical Tests
  9. Statistical Modelling
  10. Regression
  11. Analysis of Variance
  12. Analysis of Covariance
  13. Generalized Linear Models
  14. Count Data
  15. Count Data in Tables
  16. Proportion Data
  17. Binary Response Variables
  18. Generalized Additive Models
  19. Mixed-Effects Models
  20. Non-linear Regression
  21. Tree Models
  22. Time Series Analysis
  23. Multivariate Statistics
  24. Spatial Statistics
  25. Survival Analysis
  26. Simulation Models
  27. Changing the Look of Graphics
  28. References and Further Reading
  29. Index

What has been said earlier?

StackExchange contains several threads asking for statistics and R book suggestions. “Books for learning the R language” asks about a reference book for learning the R language, without the statistics aspect. The Art of R Programming is ranked as the best single suggestion. “Book to Learn Statistics using R” asks for an ideal introductory book to statistics, which is really not the same thing as a reference book. “Open Source statistical textbooks” ranks Multivariate statistics with R as the best alternative. “What book would you recommend for non-statistician scientists?” asks about the best statistics reference book without specifying the program of choice. “Reference or book on simulation of experimental design data in R” perhaps comes closest to my question. Introduction to Scientific Programming and Simulation Using R is the most recommended book there and might be close to what I am looking for. However, this book won't suffice as a single reference book for statistics with R either.

Some suggestions for the reference book and their flaws

R in Action has received better reviews than The R Book, yet it is apparently rather introductory.

Biostatistical design and analysis using R: a practical guide is perhaps close to what I am looking for. It has received a good review, but apparently this one also contains many typos. In addition, this book does not concentrate on explaining statistics, but rather gives statistical analyses as ready-made recipes for researchers to use.

Ecological Models and Data in R skips the introductory level. This is a very useful feature, seeing that the word "introduction" scores 43 occurrences in the R book list, but perhaps not entirely satisfying if we’re after the reference book for statistics…?

Introduction to Scientific Programming and Simulation Using R received a very positive review, but is limited to data simulation.

Richiemorrisroe suggests that Modern Applied Statistics with S is sufficient for a standalone statistics reference book with R. This book has received excellent reviews (1,2) and is probably the best candidate for the title at the moment? The most recent version came out 10 years ago, which is quite a long time considering program development.

Dimitriy V. Masterov suggests Data Analysis Using Regression and Multilevel/Hierarchical Models. Haven't checked this book out yet.


After reading lots of book reviews, it seems apparent that the perfect book asked for here does not exist yet. However, it is perhaps possible to choose one that comes pretty close. This thread is intended as a community wiki for statistics users to find the best existing reference book and as motivation for new and old book writers to improve their work.

Best Answer

I personally thought that Modern Applied Statistics with S-Plus ticks all of the boxes you have outlined. Every example has R code, they give good references to other sources, and Venables and Ripley have a wonderfully terse and explanatory writing style which I really appreciated. I tend to re-read the book every so often, and each time I get more from it. Of course, your mileage may vary.