Solved – Machine Learning and R textbook reference

Tags: machine learning, r, references

I am asking for a book reference to further my studies in machine learning with the R programming language. Feel free to reference multiple books that are just machine learning or just R programming. I understand some basic concepts, but I think I am having trouble applying my knowledge to get real results.

For example, I would like to test my knowledge with exercises on real datasets. Questions I would like to answer are: How much more accuracy can I gain by fine-tuning a regularization parameter? What counts as good accuracy for a particular problem (in practice, what is the intuitive feel)? What are some common ways to improve accuracy by working on the features? How do I find the best value for a regularization parameter? Which machine learning algorithm will work well for a given problem?

My programming knowledge of R covers the basic control structures and some of the functional features, such as using the apply function to iterate over a data frame. I also know the basics of the data frame and list data structures.

I can apply a bunch of machine learning algorithms with R, but I do not know enough to tackle a wide variety of problems. I am hoping to gain some practical experience from book exercises as well as a good understanding of the main algorithms. I understand math (calculus, graph theory, and probability) but I probably need a refresher in certain areas.

  • In summary, what book would you recommend for someone interested in gaining practical knowledge in Machine Learning and using the R programming language?

Best Answer

Applied Predictive Modeling, by Max Kuhn (who wrote the caret package) and Kjell Johnson, is very practical, with tons of code and real-world datasets.

Practical Data Science with R by Nina Zumel and John Mount is an excellent new book that broadly covers data science, including ML, at an introductory-to-intermediate level.

An Introduction to Statistical Learning with Applications in R (ISL) by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani is a lucid introductory book; Hastie and Tibshirani are also co-authors of the more advanced Elements of Statistical Learning (ESL). Their names should be familiar, as they have made major contributions to machine learning and to R (packages including glmnet, gam, hierNet, etc.). Both ISL and ESL are available as PDFs online, but I personally think they're also worth getting in hardcopy.

For improving understanding of R as a programming language, I like The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff. I also strongly recommend Hadley Wickham's in-progress book Advanced R.

For people who are beginning to learn statistics, I often recommend Introductory Statistics with R by Peter Dalgaard. For more advanced statistics, I have benefited from Cosma Shalizi's online draft textbook Advanced Data Analysis from an Elementary Point of View.

(I have yet to read Data Mining with R by Luis Torgo or Machine Learning with R by Brett Lantz, so I can't comment on them one way or the other.)
