Deriving the discriminant function for LDA

It can be shown that maximizing the posterior probability is equivalent to assigning the observation to the class for which the discriminant function derived below is largest.
For LDA we assume that the random variable $X$ is a vector $\mathbf{X} = (X_1, X_2, \ldots, X_p)$ drawn from a multivariate Gaussian with a class-specific mean vector and a common covariance matrix $\Sigma$. In other words, the covariance matrix is common to all $K$ classes: $\mathrm{Cov}(X) = \Sigma$, a $p \times p$ matrix.
Since $X$ follows a multivariate Gaussian distribution, the class-conditional density $f_k(x) = p(X = x | Y = k)$ is given by:
$$ f_k(x) = \frac{1}{(2 \pi)^{p/2} |\Sigma|^{1/2}} \exp \left( - \frac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k) \right)$$
where $\mu_k$ is the mean vector of the inputs for class $k$.
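As a quick numeric sketch, $f_k(x)$ can be evaluated directly with scipy; the means, shared covariance, and query point below are made up for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical two-class example in p = 2 dimensions.
mu = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]  # class means mu_k (made up)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])                     # shared covariance matrix

x = np.array([1.0, 0.5])

# f_k(x) for each class, all using the common covariance Sigma.
f = [multivariate_normal(mean=m, cov=Sigma).pdf(x) for m in mu]
print(f)
```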
Assume that we know the prior distribution exactly: $P(Y = k) = \pi_k$. Then, by Bayes' theorem, the posterior distribution is

$$ p_k(x) = p(Y = k | X = x) = \frac{\pi_k \, f_k(x)}{P(X = x)} = \frac{\pi_k \, f_k(x)}{\sum_{l=1}^{K} \pi_l \, f_l(x)} = C \times \pi_k \, f_k(x) $$
The summation term in the denominator remains; how do we go about eliminating it? Note that this sum is exactly $P(X = x)$. Since $P(X = x)$ does not depend on $k$, and we are only interested in the terms that are functions of $k$ (see below), we can absorb it into a constant $C$.
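To check numerically that dropping the normalizing constant cannot change which class attains the maximum, here is a small sketch (all values made up):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]  # class means (made up)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])         # shared covariance
pi = np.array([0.6, 0.4])                          # priors pi_k

x = np.array([1.0, 0.5])
fx = np.array([multivariate_normal(mean=m, cov=Sigma).pdf(x) for m in mu])

unnormalized = pi * fx                         # pi_k * f_k(x)
posterior = unnormalized / unnormalized.sum()  # divide by P(X = x)

# The normalizing constant does not change which class wins.
assert np.argmax(unnormalized) == np.argmax(posterior)
print(posterior)
```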
We will now expand and simplify the algebra, absorbing all constant terms into $C, C', C''$, etc.:
$$
\begin{aligned}
p_k(x) &= p(Y = k | X = x) = \frac{\pi_k \, f_k(x)}{P(X = x)} = C \times \pi_k \, f_k(x)
\\
&= C \, \pi_k \, \frac{1}{(2 \pi)^{p/2} |\Sigma|^{1/2}} \exp \left( - \frac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k) \right)
\\
&= C' \, \pi_k \, \exp \left( - \frac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k) \right)
\end{aligned}
$$
Take the log of both sides:
$$
\begin{aligned}
\log p_k(x) &= \log \left( C' \, \pi_k \, \exp \left( - \frac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k) \right) \right)
\\
&= \log C' + \log \pi_k - \frac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k)
\end{aligned}
$$
Since the term $\log C'$ does not depend on $k$ and we aim to maximize the posterior probability over $k$, we can ignore it. Expanding the quadratic form, the two cross terms are equal because $\Sigma^{-1}$ is symmetric ($x^T \Sigma^{-1} \mu_k = \mu^T_k \Sigma^{-1} x$), so they combine:

$$
\begin{aligned}
& \log \pi_k - \frac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k)
\\
&= \log \pi_k - \frac{1}{2} \left[ x^T \Sigma^{-1} x + \mu^T_k \Sigma^{-1} \mu_k \right] + x^T \Sigma^{-1} \mu_k
\\
&= C'' + \log \pi_k - \frac{1}{2} \mu^T_k \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k
\end{aligned}
$$

where $C''$ absorbs the term $-\frac{1}{2} x^T \Sigma^{-1} x$, which depends on $x$ but not on $k$.
And so the objective function, sometimes called the linear discriminant function, is:
$$ \delta_k(x) = \log \pi_k - \frac{1}{2} \mu^T_k \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k $$
This means that, given an input $x$, we predict the class with the highest value of $\delta_k(x)$.
See here for an implementation in Python.
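A minimal sketch of such an implementation, assuming we estimate $\pi_k$, $\mu_k$, and the pooled $\Sigma$ from labeled training data (the helper names and data below are made up for illustration):

```python
import numpy as np

def lda_fit(X, y):
    """Estimate priors pi_k, class means mu_k, and the pooled covariance Sigma."""
    classes = np.unique(y)
    pi = np.array([np.mean(y == k) for k in classes])
    mu = np.array([X[y == k].mean(axis=0) for k in classes])
    n, p = X.shape
    Sigma = np.zeros((p, p))
    for k, m in zip(classes, mu):
        d = X[y == k] - m
        Sigma += d.T @ d
    Sigma /= n - len(classes)   # pooled (common) covariance estimate
    return classes, pi, mu, Sigma

def lda_predict(x, classes, pi, mu, Sigma):
    """Assign x to the class maximizing delta_k(x)."""
    Sinv = np.linalg.inv(Sigma)
    delta = [np.log(pk) - 0.5 * mk @ Sinv @ mk + x @ Sinv @ mk
             for pk, mk in zip(pi, mu)]
    return classes[int(np.argmax(delta))]

# Made-up training data: two Gaussian blobs with a shared covariance.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),
               rng.normal([2.0, 1.0], 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

params = lda_fit(X, y)
print(lda_predict(np.array([1.5, 0.8]), *params))
```

On data like this the result should agree with a library implementation such as scikit-learn's LinearDiscriminantAnalysis, up to estimation details.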
Best Answer
I found the following: Raudys and Duin, "Expected classification error of the Fisher linear classifier with pseudo-inverse covariance matrix".
A wider search, looking at covariance estimation more generally, has also been fruitful. Under assumptions on the data distribution, the following papers, found via the references in Vershynin's paper, look interesting:
Adamczak et al., "Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles"
Adamczak et al., "Sharp bounds on the rate of convergence of the empirical covariance matrix"
Rudelson, "Random vectors in the isotropic position"
Vershynin, "How close is the sample covariance matrix to the actual covariance matrix?"
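As a rough numerical illustration of the question these papers address (all values below are made up), one can track the operator-norm error $\| \hat{\Sigma} - \Sigma \|$ of the empirical covariance matrix as the sample size $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 20
# Made-up ground-truth covariance: a random symmetric positive-definite matrix.
A = rng.normal(size=(p, p))
Sigma = A @ A.T / p + np.eye(p)

for n in [50, 200, 1000, 5000]:
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    Sigma_hat = X.T @ X / n                          # empirical covariance (mean known to be 0)
    err = np.linalg.norm(Sigma_hat - Sigma, ord=2)   # operator (spectral) norm
    print(f"n = {n:5d}   error = {err:.3f}")
```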