What to learn after Casella & Berger

Tags: distributions, exploratory-data-analysis, references

I am a pure math grad student with little background in applied mathematics. Since last fall I have been taking classes on Casella & Berger's book, and I have finished hundreds (230+) of pages of exercise problems in the book. Right now I am at Chapter 10.

However, since I am not majoring in statistics and do not plan to be a statistician, I do not think I will be able to invest time regularly to continue learning data analysis. My experience so far is telling me that, to be a statistician, one needs to bear with a lot of tedious computation involving various distributions (Weibull, Cauchy, $t$, $F$…). I have found that, while the fundamental ideas are simple, the implementation (for example, the LRT in hypothesis testing) can still be difficult due to technicalities.

Is my understanding correct? Is there a way I can learn probability & statistics that not only covers more advanced material, but can also help in case I need data analysis in real life? Will I need to spend $\ge$20 hrs per week on it like I used to?

While I believe there is no royal road in learning mathematics, I often cannot help wondering – most of the time we do not know what the distribution is for real life data, so what is the purpose for us to focus exclusively on various families of distributions? If the sample size is small and the central limit theorem does not apply, how can we properly analyze the data besides the sample average and variance if the distribution is unknown?

My semester will end in a month, and I do not want my knowledge to evaporate after I start to focus on my PhD research. So I decided to ask. I am learning R, and I have some programming background, but my level is about the same as a code monkey.

Best Answer

> I do not think I will be able to invest time regularly to continue learning data analysis

I don't think Casella & Berger is a place to learn much in the way of data analysis. It's a place to learn some of the tools of statistical theory.

> My experience so far is telling me that, to be a statistician, one needs to bear with a lot of tedious computation involving various distributions (Weibull, Cauchy, $t$, $F$…).

I've spent a lot of time as a statistician doing data analysis. It rarely (almost never) involves me doing tedious calculation. It sometimes involves a little simple algebra, but the common problems are usually solved and I don't need to expend any effort on replicating that each time.

The computer does all the tedious calculation.

If I am in a situation where I'm not prepared to assume a reasonably standard case (e.g. not prepared to use a GLM), I generally don't have enough information to assume any other distribution either, so the question of the calculations in LRT is usually moot (I can do them when I need to, they just either tend to be already solved or come up so rarely that it's an interesting diversion).

I tend to do a lot of simulation; I also frequently try to use resampling in some form either alongside or in place of parametric assumptions.
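As a rough illustration of what resampling in place of a parametric assumption can look like, here is a minimal sketch of a percentile bootstrap interval for the median of a small sample. It is not code from this answer: the function name `bootstrap_ci`, its parameters, and the data values are all made up for illustration, and the percentile method is only one of several bootstrap intervals.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=5000,
                 level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration; no distribution is assumed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - level
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up, right-skewed sample of 15 observations (illustrative only).
sample = [0.3, 0.5, 0.8, 1.1, 1.2, 1.4, 1.9, 2.3, 2.7, 3.1,
          3.8, 4.6, 6.2, 9.5, 14.0]

point = statistics.median(sample)
lo, hi = bootstrap_ci(sample)
print(f"median = {point}, 95% percentile bootstrap CI = ({lo}, {hi})")
```

The computer does the repetitive part; the judgment lies in choosing the statistic and deciding whether the resampling scheme suits the data. With samples this small the percentile interval can be noticeably off, so treat it as a starting point. In R, the `boot` package covers the same ground.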

> Will I need to spend $\ge$20 hrs per week on it like I used to?

It depends on what you want to be able to do and how soon you want to get good at it.

Data analysis is a skill, and it takes practice and a large base of knowledge. You'll have some of the knowledge you need already.

If you want to be a good practitioner at a wide variety of things, it will take a lot of time - but to my mind it's a lot more fun than the algebra and such of doing Casella and Berger exercises.

Some of the skills I built up on say regression problems are helpful with time series, say -- but a lot of new skills are needed. So learning to interpret residual plots and QQ plots is handy, but they don't tell me how much I need to worry about a little bump in a PACF plot and don't give me tools like the use of one-step-ahead prediction errors.

So, for example, I don't need to expend effort figuring out how to do maximum likelihood (ML) estimation for typical gamma or Weibull models, because they're standard enough to be solved problems that have already been largely put into a convenient form.

If you come to do research, you'll need a lot more of the skills you pick up in places like Casella & Berger (but even with those kinds of skills, you should also read more than one book).


Some suggested things:

You should definitely build up some regression skills, even if you do nothing else.

There are a number of quite good books, but perhaps try Draper & Smith's Applied Regression Analysis plus Fox and Weisberg's An R Companion to Applied Regression; I'd also suggest you consider following up with Harrell's Regression Modeling Strategies.

(You could substitute any number of good books for Draper and Smith - find one or two that suit you.)

The second book (Fox and Weisberg) has a number of additional online chapters that are very much worth reading (and its own R package).

--

A good second serving would be Venables & Ripley's Modern Applied Statistics with S.

That's some grounding in a fairly broad swathe of ideas.

It may turn out that you need some more basic material in some topics (I don't know your background).

Then you'd need to start thinking about which areas of statistics you want or need: Bayesian statistics, time series, multivariate analysis, and so on.