[Math] Statistics for mathematicians

books, reference-request, st.statistics

I'm looking for an overview of statistics suitable for the mathematically mature reader: someone familiar with measure-theoretic probability at, say, the level of Billingsley, but almost completely ignorant of statistics.

Most texts I've come across are either too basic, or are monographs focused on a specific area or technique.

Any suggestions?

Best Answer

This won't directly answer the question, but here are some things a mathematician who wants to learn about statistics should learn:

  • When is a random variable a statistic and when is it not? (A statistic is an observable random variable. For example, $X - E(X)$ is not a statistic if the "population average" $E(X)$ is not observable.)
  • Fisher's concept of sufficiency. Examples, characterizations, theorems. In particular, the Rao--Blackwell theorem and examples of its use. That's way cool.
  • So is the concept of completeness and the Lehmann--Scheffé theorem.
  • If you think that linear regression is called linear because you're fitting a line, then you are naive. If you're fitting, e.g., a parabola by finding least-squares estimators of three parameters, then you're doing linear regression. There is also such a thing as non-linear regression.
  • Learn the Gauss--Markov theorem on Best Linear Unbiased Estimators (BLUEs).
  • Look at my recent answer to a question on prediction intervals. Why do you need (the finite-dimensional version of) the spectral theorem to understand linear regression? (Look at the aforementioned answer and consider this question an exercise.)
  • As long as we're on linear regression (the topic of the three bullets immediately above this one), look at the Wikipedia article titled "errors and residuals in statistics" (written mostly by me). Learn the difference between an error and a residual. Maybe look at "Studentized residual" as an afterthought.
  • ...and then at "lack-of-fit sum of squares".
  • If you think linear regression is child's play rather than something to which the most brilliant person could devote a long career in research, grow up.
  • Learn the difference between frequentism and Bayesianism. In fact, look at the rant I posted on nLab about this. (The essence of Bayesianism is that probabilities are taken to be epistemic. Bayesianism is not more subjective than frequentism; rather Bayesians and frequentists put their subjectivity in different places. (A really glaring example is the 5% critical value legendarily used in medical journals. Why 5%? Because that's a subjective economic choice.))
  • Learn design of experiments. Learn why Latin squares and a myriad of other combinatorial designs are used.
  • OK, maybe a small and incomplete but nonetheless direct answer to the original question: perhaps Hocking's book on linear models.
  • Learn to use the word "sample" correctly. If you ask the next 100 people you meet whether they intend to vote "yes" or "no", that's not 100 samples; that's one sample.
  • Another thing that will give you some idea of the distinct flavor of the subject, and how it differs from probability theory and some other fields, is books on sampling.
  • Learn about the Wishart distribution.
  • And the multivariate normal distribution.
  • Exercise: How do you prove that every non-negative-definite matrix is the variance of some random vector?
  • Learn why the Behrens--Fisher problem cannot be regarded as a math problem. It belongs up there with Hilbert's problems as one of the great challenges, but it's not mathematics for this reason: One can model it as a math problem in any of a variety of different non-equivalent ways. One can solve those math problems. But which one is the "right" model? That's essentially a philosophical question. And that question, not the math problems, is the Behrens--Fisher problem. (The Behrens--Fisher problem is this: how do you draw inferences about the difference between the means of two normally distributed populations which may have different variances? "Inferences" can mean point-estimates or interval estimates or perhaps other things.)
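
To make two of the bullets above concrete (fitting a parabola is still linear regression, and errors are not residuals), here is a minimal sketch in plain Python. Everything specific in it is an assumption made for illustration: the "true" coefficients, the noise level, the grid of $x$ values, and the helper names `solve` and `fit_parabola`. In a simulation the errors can be inspected; with real data only the residuals are observable.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix from copies so the inputs are left untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_parabola(xs, ys):
    """Least-squares estimates of (b0, b1, b2) in y = b0 + b1*x + b2*x^2 + error."""
    # The design matrix is nonlinear in x but the model is linear in (b0, b1, b2),
    # which is what makes this linear regression.
    X = [[1.0, x, x * x] for x in xs]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(3)]
    return solve(XtX, Xty)  # normal equations: (X'X) b = X'y


random.seed(0)
true_beta = (1.0, -2.0, 0.5)  # hypothetical "population" coefficients
xs = [i / 10 for i in range(-30, 31)]
errors = [random.gauss(0.0, 1.0) for _ in xs]  # unobservable outside a simulation
ys = [true_beta[0] + true_beta[1] * x + true_beta[2] * x * x + e
      for x, e in zip(xs, errors)]

beta_hat = fit_parabola(xs, ys)
residuals = [y - (beta_hat[0] + beta_hat[1] * x + beta_hat[2] * x * x)
             for x, y in zip(xs, ys)]

print("estimates:       ", [round(b, 3) for b in beta_hat])
print("sum of errors:   ", sum(errors))
print("sum of residuals:", sum(residuals))
```

Because the fitted model includes an intercept, the residuals sum to zero up to rounding and are orthogonal to each column of the design matrix, so they are constrained and not independent. The errors satisfy no such constraint. Solving the normal equations directly is fine at this size; it is not how one would do it numerically for a large or ill-conditioned problem.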
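
For the exercise on non-negative-definite matrices, here is one standard route, offered as a sketch for checking your own answer and not necessarily the argument intended above. It uses the finite-dimensional spectral theorem, and the symbols $\Sigma$, $Q$, $\Lambda$, $A$ and $Z$ are notation introduced here. Let $\Sigma$ be a real symmetric non-negative-definite $n \times n$ matrix, and let $Z$ be any random vector in $\mathbb{R}^n$ with $\operatorname{Var}(Z) = I_n$, for instance one with independent standard normal components.

$$
\begin{aligned}
\Sigma &= Q \Lambda Q^{T}, \qquad Q^{T} Q = I_n, \quad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \quad \lambda_i \ge 0, \\
A &= Q \Lambda^{1/2}, \qquad \Lambda^{1/2} = \operatorname{diag}\bigl(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n}\bigr), \\
X &= A Z, \\
\operatorname{Var}(X) &= A \operatorname{Var}(Z) A^{T} = A A^{T} = Q \Lambda^{1/2} \Lambda^{1/2} Q^{T} = Q \Lambda Q^{T} = \Sigma .
\end{aligned}
$$

Non-negative-definiteness is used exactly once, to take real square roots of the eigenvalues.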

This is just a sampling of the first things that come immediately to mind. It leans toward showing you what the subject tastes like rather than what it's important to know to do theoretical or applied research.

Statistics is an immensely broader field than mathematical probability theory.