Solved – Which book is recommended for learning statistics and R at the same time?


Books to Learn Statistics using R

What exactly is the book I'm looking for?

What I am looking for is a book that teaches statistics while using R for hands-on practice, so that you end up learning R along the way. I've seen many books on Amazon that attempt this, but not with R; they use Minitab or SAS instead.

Are the R Book and Statistical Computing an option? – Still not answered.

The R Book and Statistical Computing: An Introduction to Data Analysis using S-Plus seem viable, but a reader's opinion here would be helpful and welcome.

How does the book relate to statistics courses?

To be even more precise about what I was looking for, consider the learning outcomes of these two statistics courses from the math department at the university where I'm currently a student:

Intermediate Statistics and Probability & Statistics. That is, I'm looking for a book that covers a normal statistics course up to intermediate level, but has you learning and using R rather than just board and paper. That also means I am looking for a book that assumes I want to learn statistics from the beginning.

This book is for researchers too.

I am also a software engineering researcher, but I guess the situation of finding yourself with mountains of data and wanting to learn statistics so that you can write code to automate the analysis applies to many other fields.

That means I am not interested in learning every single detail of every single property of every single curve; I am more concerned with making sense of data for my research domain, although I would not mind if the book went deep on that.

As a final motivation, I find myself reading scientific papers from different communities that claim results based on statistical inference without any readable evidence of whether the statistical assumptions/constraints are violated.

An R book that is not much about statistics won't keep me from following that same practice, which is also why I decided to look for a book that is akin to a statistics course using R rather than playing around with an overview book.

Related questions in Cross Validated.

Answers and feedback for this question.

@Julie

The suggested books are a few I had already come across, but unfortunately they don't suit me:

Introductory Statistics with R, Using R for Introductory Statistics, and Statistics: An Introduction using R are a few of the books I had already looked at on Amazon, but they either give an overview of statistics or assume previous statistics knowledge. The problem with overview books is mostly that they do not call attention to the assumptions and constraints, or provide enough explanation to make sense of the information.

If you believe no book fits this need, or think The R Book or Statistical Computing: An Introduction to Data Analysis using S-Plus would fit it, I would appreciate that type of answer as well.

@Christopher Aden

Introduction to Probability and Statistics Using R seems to be the closest one, but it is still too broad and general for what I was looking for.

What I was hoping for is a book such as David S. Moore's The Basics of Statistics, because:

  • It covers all statistics subjects.
  • It uses two tools, Minitab and one other, to give hands-on practice with the method just explained.
  • It strongly highlights assumptions and constraints. This is very important for a researcher who has not taken an in-depth statistics course and wants to use statistics. Overview books hardly ever cover them, which is dangerous for researchers.
    • You can see the book's table of contents here. Notice how the focus is statistics, and the tool is used to improve understanding and to show the student how to do the statistics more easily with tools after learning the method. It's not about the tool, it's about statistics!

I want exactly the same thing, but using R.

@Gregory Demin

It uses R for its pedagogical examples, assumes you want to learn statistics and, best of all, it is open source. Unfortunately, it does not cover ANOVA, ANCOVA, or more advanced subjects.

@Peter Ellis

Good suggestion for a textbook that covers what is wanted in this question.

Books that, in the asker's opinion, answer the question.

@Peter Ellis and @Gregory Demin.

Collection of R Books on Amazon

An Amazon discussion about R books for students with different backgrounds may be found here.

Video Lectures teaching Statistics using R

Google Tech Talks from 2007 that also motivated this question, covering data mining more than statistics but using R throughout, can be found here.

Best Answer

I think one reason it is so hard to answer this is that R is so powerful and flexible that a real introduction to R programming goes well beyond what is normally needed in an introduction to statistics. The books that teach statistics using MiniTab, JMP or SPSS are doing relatively straightforward things with the software that barely scratch the surface of what R is capable of when it comes to data manipulation, simulations, custom-built functions, etc.

Having said that, I think that Wilcox's Modern Statistics for the Social and Behavioral Sciences: A Practical Introduction (2012) is a brilliant new book. It assumes no statistical knowledge and takes you from scratch right through to a big range of modern robust techniques; and assumes not much more R knowledge than the ability to open it up and load a dataset. It covers many of the classical techniques too including ANOVA (mentioned in the OP).

I would see this book as the equivalent of the books that introduce stats and a stats package like SPSS at the same time. However, it won't teach you to program in R - only how to do modern statistical analysis with it, with an emphasis on robust techniques that address the known problems with classical analysis that are sidelined by most other approaches to teaching statistics.

The three problems with classical methods that this book particularly addresses right from the beginning are sampling from heavy-tailed distributions; skewness; and heteroscedasticity.
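To make the first two of these problems concrete, here is a minimal sketch, not taken from Wilcox's book, of how a single extreme observation drags the ordinary mean while a 20% trimmed mean (one common robust estimator) and the median barely move. The sample values and the `trimmed_mean` helper are hypothetical, and the sketch is written in Python only to keep it self-contained; in R the same trimmed mean is available as `mean(x, trim = 0.2)`.

```python
import statistics


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations cut from each tail
    return statistics.fmean(ordered[k:len(ordered) - k])


# Hypothetical sample: nine ordinary values plus one extreme observation.
sample = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.0, 4.9, 50.0]

plain_mean = statistics.fmean(sample)    # pulled far above the bulk of the data
robust_mean = trimmed_mean(sample, 0.2)  # ignores the two smallest and two largest values
median = statistics.median(sample)

print(f"mean = {plain_mean:.2f}, 20% trimmed mean = {robust_mean:.2f}, median = {median:.2f}")
```

With this sample the ordinary mean lands near 9.5 although nine of the ten values lie between 4.7 and 5.3, whereas the trimmed mean and the median stay close to 5.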

Wilcox uses R because "In terms of taking advantage of modern statistical techniques, R clearly dominates. When analyzing data, it is undoubtedly the most important software development during the last quarter of a century. And it is free. Although classic methods have fundamental flaws, it is not suggested that they be completely abandoned... Consequently, illustrations are provided on how to apply standard methods with R. Of particular importance here is that, in addition, illustrations are provided regarding how to apply modern methods using over 900 R functions written for this book."

This book is so excellent that after we bought a copy for work I purchased my own copy for home.

The chapter headings are:

  1. numerical and graphical summaries of data;
  2. probability and related concepts;
  3. sampling distributions and confidence intervals;
  4. hypothesis testing;
  5. regression and correlation;
  6. bootstrap methods;
  7. comparing two independent groups;
  8. comparing two dependent groups;
  9. one-way ANOVA;
  10. two-way and three-way designs;
  11. comparing more than two dependent groups;
  12. multiple comparisons;
  13. some multivariate methods;
  14. robust regression and measures of association;
  15. basic methods for analyzing categorical data.

Further edit - having checked out the David Moore example of what you are looking for, I really think Wilcox's book meets the need.