Solved – Theory on discriminant analysis in small sample size conditions

Tags: discriminant-analysis, mathematical-statistics, small-sample

I see a similarity between a problem I'm working on and Linear (or Quadratic) Discriminant Analysis when the sample size $n$ is smaller than $p+1$, where $p$ is the dimension of the data.

I'm interested in theory bounding the generalization error of either classifier when the data can be assumed to be dense (i.e., all $p$ features are potentially informative) and the estimated class covariance(s) are singular, so one has to regularize to get a working classifier.
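To fix ideas, here is a toy illustration of the numerical obstruction (my own sketch; the dimensions and the shrinkage constant $0.1$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 30, 10                        # sample size below p + 1
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)          # p x p, but rank at most n - 1

print(np.linalg.matrix_rank(S))      # 9 < 30: rank-deficient, a plain inverse is unusable
S_reg = S + 0.1 * np.eye(p)          # simplest repair: add a ridge/shrinkage term
print(np.linalg.matrix_rank(S_reg))  # 30: invertible, so LDA/QDA can proceed
```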

For example, it seems to me that, in this setting, pseudo-inversion should be the worst possible thing to do (in the generalization-error sense), as it discards potentially useful discriminating information in the null space of the estimated covariance matrix. Making this intuition concrete doesn't seem straightforward, though.
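The closest I've got is a toy simulation (entirely my own construction: the identity true covariance, the dense mean shift of $0.35$ per coordinate, and the ridge constant $\lambda = 0.1$ are all arbitrary choices) that pits a pseudo-inverse LDA rule against a ridge-regularized one on the same $n < p + 1$ training set:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_train, n_test = 50, 20, 2000      # n_train < p + 1, so the pooled covariance is singular

def make_data(n):
    """Two balanced Gaussian classes, identity covariance, dense mean shift."""
    mu = np.full(p, 0.35)              # every coordinate carries some signal
    y = np.repeat([0, 1], n // 2)
    X = rng.standard_normal((n, p)) + np.outer(y, mu)
    return X, y

def lda_rule(X, y, inverse):
    """Fisher-type rule w = Sigma^{-}(mu1 - mu0) with a supplied inverse map."""
    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Xc = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    S = Xc.T @ Xc / len(X)             # pooled covariance, rank < p here
    w = inverse(S) @ (mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)
    return lambda Z: (Z @ w + b > 0).astype(int)

X, y = make_data(n_train)
Xt, yt = make_data(n_test)

rules = {
    "pseudo-inverse ": lambda S: np.linalg.pinv(S),
    "ridge (lam=0.1)": lambda S: np.linalg.inv(S + 0.1 * np.eye(p)),
}
for name, inv in rules.items():
    err = np.mean(lda_rule(X, y, inv)(Xt) != yt)
    print(f"{name} test error: {err:.3f}")
```

Toy runs like this can suggest the direction of the effect, but they say nothing about how the gap scales with $n$, $p$, or the choice of regularizer, which is exactly what I'd want theory for.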

Searches on Google Scholar using likely-looking strings (e.g. "discriminant analysis" AND "small sample size") return thousands of papers, largely from the face-recognition literature; as far as I can see, these mostly propose different regularization schemes or LDA/QDA variants.

My interest is in any guarantees on the performance of such classifiers in the small-sample-size case, given that there are plenty of alternative ways to resolve the numerical issues. Can anyone please point me to some relevant theory?
