Solved – Is graduate-level probability theory (Durrett) used often in ML/DL research?

deep-learning, machine-learning, mathematical-statistics, probability

I am interested in machine learning, with a particular liking for RNNs. I have coursework in several areas of computer science (e.g., data mining, optimization for ML algorithms, deep learning) and an undergraduate degree in mathematics, and I have been applying ML algorithms for a while. I have some coursework in basic probability theory, enough to understand the probability sections of Goodfellow's Deep Learning book, and I understand how most cost functions are related to probability density functions.
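To make the cost-function connection concrete, here is a minimal numerical sketch (all numbers hypothetical) of one standard instance of it: mean squared error is the negative Gaussian log-likelihood up to an affine transformation, so minimizing MSE is maximizing a Gaussian likelihood.

```python
# Minimal sketch (hypothetical numbers): MSE vs. Gaussian log-likelihood.
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
sigma = 1.0  # assumed fixed noise scale

mse = np.mean((y_true - y_pred) ** 2)

# Per-point Gaussian log density of y_true under mean y_pred:
log_lik = np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (y_true - y_pred) ** 2 / (2 * sigma**2))

# -log_lik = 0.5 * mse + constant, so minimizing MSE maximizes likelihood.
recovered_mse = 2 * (-log_lik - 0.5 * np.log(2 * np.pi))
print(np.isclose(recovered_mse, mse))  # → True
```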

I am interested in deep learning with medical applications. Should I take a graduate-level statistics course in probability theory that follows Durrett's textbook, or should I instead dive into deep learning papers and textbooks?

I was wondering if anyone could tell me what to do, or, since that might get this closed as opinion-based, provide examples of how probability theory was (a) absolutely necessary in their own study of machine or deep learning (either developing new algorithms or reading how others did), or (b) seemed to be mostly extra. For example, I feel the latter way to some extent about real analysis now: I took it a long time ago, and although it increased my mathematical maturity, it is calculus I use constantly to backpropagate errors, not analysis. It is possible to conceptualize and even develop learning algorithms with only intuitions about calculus, without rigorous proofs such as those found in analysis. Don't get me wrong, I enjoyed analysis and would love to take all the courses I can, but my funding will eventually run out and I am expected to go into depth in one subject.

For example, I recently attended a conference with a very interesting presentation on RNNs whose inputs were samples drawn from Gaussian processes used to model missing data. It was designed by a statistician who presumably had taken probability theory, and I would like to design such pipelines too. By the way, I am certainly taking graduate-level Statistical Inference (this has been enthusiastically recommended twice by friends in statistics, while graduate-level probability theory has either been dis-recommended or recommended without enthusiasm).
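For what that imputation pipeline might look like, here is a minimal sketch (all names, kernel choices, and values are hypothetical, not the presenters' actual method): fit a Gaussian process to the observed time points of a series, then draw posterior samples at the missing time points to feed an RNN.

```python
# Minimal sketch (hypothetical setup): GP posterior sampling at a missing
# time point, the kind of imputation whose draws could feed an RNN.
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between two vectors of time stamps."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

rng = np.random.default_rng(0)
t_obs = np.array([0.0, 1.0, 2.0, 4.0, 5.0])        # observed time stamps
y_obs = np.sin(t_obs) + 0.05 * rng.normal(size=5)  # noisy observations
t_miss = np.array([3.0])                           # missing time stamp

K = rbf(t_obs, t_obs) + 1e-4 * np.eye(5)  # jitter for numerical stability
K_s = rbf(t_miss, t_obs)
K_ss = rbf(t_miss, t_miss)

# GP posterior mean and covariance at the missing point.
alpha = np.linalg.solve(K, y_obs)
mean = K_s @ alpha
cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)

# One posterior draw; repeated draws would give the RNN multiple imputations
# that carry the GP's uncertainty about the missing value.
sample = rng.multivariate_normal(mean, cov)
print(mean, sample)
```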

I hope this is specific enough; otherwise I am happy to migrate the question to Academia Stack Exchange.

Best Answer

As a statistics PhD student studying Bayesian deep learning and Gaussian processes, I have found it useful to be familiar with probability theory. I do not directly use these results for now, because I am working on applied problems, but much of the theoretical work I read builds on nonparametric techniques such as Gaussian or Dirichlet processes, and those techniques are shown to have their properties using probability and functional analysis. Look at the textbooks by Aad van der Vaart if you'd like an (extreme) example.

In other words, knowing probability theory opens up a lot of statistical literature for you to peruse.

If you care more about classification rates and less about uncertainty quantification, it might not be worth your time to get into the statistics. But then again, if you are fitting a logistic regression and the cancer prediction for subject X is "1" (vs. "0"), you might want to know how much confidence to place in that classification.
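To illustrate the point: a hard "1"/"0" label hides the model's actual confidence. A minimal sketch, using scikit-learn on hypothetical toy data (in practice these would be clinical features and outcomes):

```python
# Minimal sketch: hard label vs. estimated probability for one subject.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Labels generated from a known logistic model so probabilities are meaningful.
p = 1 / (1 + np.exp(-(1.5 * X[:, 0] - 0.5 * X[:, 1])))
y = (rng.uniform(size=200) < p).astype(int)

clf = LogisticRegression().fit(X, y)

x_new = np.array([[0.2, -0.1]])        # hypothetical "subject X"
label = clf.predict(x_new)[0]          # hard classification: 0 or 1
prob = clf.predict_proba(x_new)[0, 1]  # estimated P(y = 1 | x)

print(label, round(prob, 3))
```

A predicted probability of 0.51 and one of 0.99 both yield the label "1", but they warrant very different levels of trust in a medical setting.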