Mathematician wants the knowledge equivalent to a quality stats degree

careers, references

I know people love to close duplicates, so I am not asking for a reference to start learning statistics (as here).

I have a doctorate in mathematics but never learned statistics. What is the shortest route to knowledge equivalent to a top-notch B.S. statistics degree, and how do I measure when I have achieved that?

If a list of books would suffice (assuming I do the exercises, let's say), that's terrific. Yes, I expect working out problems to be an implicit part of learning it, but I want to fast-track as much as realistically possible. I am not looking for an insanely rigorous treatment unless that is part of what statistics majors generally learn.

Best Answer

(Very) short story

Long story short, in some sense, statistics is like any other technical field: There is no fast track.

Long story

Bachelor's degree programs in statistics are relatively rare in the U.S. One reason I believe this is true is that it is quite hard to pack all that is necessary to learn statistics well into an undergraduate curriculum. This holds particularly true at universities that have significant general-education requirements.

Developing the necessary skills (mathematical, computational, and intuitive) takes a lot of effort and time. Statistics can begin to be understood at a fairly decent "operational" level once the student has mastered calculus and a decent amount of linear and matrix algebra. However, any applied statistician knows that it is quite easy to find oneself in territory that doesn't conform to a cookie-cutter or recipe-based approach to statistics. To really understand what is going on beneath the surface requires, as a prerequisite, a mathematical and, in today's world, computational maturity that is only really attainable in the later years of undergraduate training. This is one reason that true statistical training mostly starts at the M.S. level in the U.S. (India, with its dedicated ISI, is a little different story. A similar argument might be made for some Canadian-based education. I'm not familiar enough with European-based or Russian-based undergraduate statistics education to have an informed opinion.)

Nearly any (interesting) job would require an M.S.-level education, and the really interesting (in my opinion) jobs essentially require a doctorate-level education.

Seeing as you have a doctorate in mathematics, though we don't know in what area, here are my suggestions for something closer to an M.S.-level education. I include some parenthetical remarks to explain the choices.

  1. D. Huff, How to Lie with Statistics. (Very quick, easy read. Shows many of the conceptual ideas and pitfalls, in particular, in presenting statistics to the layman.)
  2. Mood, Graybill, and Boes, Introduction to the Theory of Statistics, 3rd ed., 1974. (M.S.-level intro to theoretical statistics. You'll learn about sampling distributions, point estimation and hypothesis testing in a classical, frequentist framework. My opinion is that this is generally better, and a bit more advanced, than modern counterparts such as Casella & Berger or Rice.)
  3. Seber & Lee, Linear Regression Analysis, 2nd ed. (Lays out the theory behind point estimation and hypothesis testing for linear models, which is probably the most important topic to understand in applied statistics. Since you probably have a good linear algebra background, you should immediately be able to understand what is going on geometrically (see the short sketch after this list), which provides a lot of intuition. Also has good information related to assessment issues in model selection, departures from assumptions, prediction, and robust versions of linear models.)
  4. Hastie, Tibshirani, and Friedman, Elements of Statistical Learning, 2nd ed., 2009. (This book has a much more applied feeling than the last and broadly covers lots of modern machine-learning topics. The major contribution here is in providing statistical interpretations of many machine-learning ideas, which pays off particularly in quantifying uncertainty in such models. This is something that tends to go un(der)addressed in typical machine-learning books. Legally available for free here.)
  5. A. Agresti, Categorical Data Analysis, 2nd ed. (Good presentation of how to deal with discrete data in a statistical framework. Good theory and good practical examples. Perhaps on the traditional side in some respects.)
  6. Boyd & Vandenberghe, Convex Optimization. (Many of the most popular modern statistical estimation and hypothesis-testing problems can be formulated as convex optimization problems. This also goes for numerous machine-learning techniques, e.g., SVMs. Having a broader understanding and the ability to recognize such problems as convex programs is quite valuable, I think. Legally available for free here.)
  7. Efron & Tibshirani, An Introduction to the Bootstrap. (You ought to at least be familiar with the bootstrap and related techniques; a minimal code sketch also follows this list. For a textbook, it's a quick and easy read.)
  8. J. Liu, Monte Carlo Strategies in Scientific Computing or P. Glasserman, Monte Carlo Methods in Financial Engineering. (The latter sounds very directed to a particular application area, but I think it'll give a good overview and practical examples of all the most important techniques. Financial engineering applications have driven a fair amount of Monte Carlo research over the last decade or so.)
  9. E. Tufte, The Visual Display of Quantitative Information. (Good visualization and presentation of data is [highly] underrated, even by statisticians.)
  10. J. Tukey, Exploratory Data Analysis. (Standard. Oldie, but goodie. Some might say outdated, but still worth having a look at.)
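To give a flavor of the geometric view mentioned in item 3 (this sketch is mine, not taken from Seber & Lee): for the linear model $y = X\beta + \varepsilon$ with $X$ of full column rank, the least-squares estimate and fitted values are

$$\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad \hat{y} = X\hat{\beta} = Py, \quad P = X(X^\top X)^{-1}X^\top,$$

where $P$ is the orthogonal projection onto the column space of $X$ (so $P^2 = P$, $P^\top = P$). The residual $(I - P)y$ is orthogonal to the fitted values, which gives the Pythagorean identity $\|y\|^2 = \|Py\|^2 + \|(I-P)y\|^2$ underlying ANOVA decompositions and F-tests.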
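And here is a minimal sketch of the percentile bootstrap from item 7, in Python/NumPy purely for illustration; the choice of the median as the statistic and the toy exponential data are my own assumptions, not from the book.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: suppose we want a confidence interval for the median.
    data = rng.exponential(scale=2.0, size=50)

    n_boot = 10_000
    boot_medians = np.empty(n_boot)
    for b in range(n_boot):
        # Resample the observed data with replacement and recompute the statistic.
        resample = rng.choice(data, size=data.size, replace=True)
        boot_medians[b] = np.median(resample)

    # Percentile bootstrap 95% interval: quantiles of the bootstrap distribution.
    lo, hi = np.quantile(boot_medians, [0.025, 0.975])
    print(f"sample median = {np.median(data):.3f}, 95% CI approx ({lo:.3f}, {hi:.3f})")

The same recipe works for essentially any statistic you can recompute on a resample, which is what makes the technique worth knowing before reading the theory.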

Complements

Here are some other helpful books, mostly of a somewhat more advanced, theoretical, and/or auxiliary nature.

  1. F. A. Graybill, Theory and Application of the Linear Model. (Old-fashioned, terrible typesetting, but covers all the same ground as Seber & Lee, and more. I say old-fashioned because more modern treatments would probably tend to use the SVD to unify and simplify a lot of the techniques and proofs.)
  2. F. A. Graybill, Matrices with Applications in Statistics. (Companion text to the above. A wealth of good matrix algebra results useful to statistics here. Great desk reference.)
  3. Devroye, Gyorfi, and Lugosi, A Probabilistic Theory of Pattern Recognition. (Rigorous and theoretical text on quantifying performance in classification problems.)
  4. Brockwell & Davis, Time Series: Theory and Methods. (Classical time-series analysis. Theoretical treatment. For more applied ones, Box, Jenkins & Reinsel or Ruey Tsay's texts are decent.)
  5. Motwani and Raghavan, Randomized Algorithms. (Probabilistic methods and analysis for computational algorithms.)
  6. D. Williams, Probability with Martingales and/or R. Durrett, Probability: Theory and Examples. (In case you've seen measure theory, say, at the level of D. L. Cohn, but maybe not probability theory. Both are good for getting quickly up to speed if you already know measure theory.)
  7. F. Harrell, Regression Modeling Strategies. (Not as good as Elements of Statistical Learning [ESL], but has a different, and interesting, take on things. Covers more "traditional" applied statistics topics than does ESL and so worth knowing about, for sure.)

More Advanced (Doctorate-Level) Texts

  1. Lehmann and Casella, Theory of Point Estimation. (PhD-level treatment of point estimation. Part of the challenge of this book is reading it and figuring out what is a typo and what is not. When you see yourself recognizing them quickly, you'll know you understand. There's plenty of practice of this type in there, especially if you dive into the problems.)

  2. Lehmann and Romano, Testing Statistical Hypotheses. (PhD-level treatment of hypothesis testing. Not as many typos as TPE above.)

  3. A. van der Vaart, Asymptotic Statistics. (A beautiful book on the asymptotic theory of statistics with good hints on application areas. Not an applied book though. My only quibble is that some rather bizarre notation is used and details are at times brushed under the rug.)
