Solved – What learning occurs in linear discriminant analysis

discriminant-analysis · machine-learning · optimization

From what I understand, linear discriminant analysis (LDA) has an objective function under which you try to find a matrix that maps data from a $p$-dimensional feature space to an $r$-dimensional feature space with $r<p$.

To my understanding, having an objective function implies that learning is being done. I think of learning as a process that finds an optimized solution over several iterations, as in neural networks trained with backpropagation.

I'm having a hard time understanding if/how LDA "learns". You calculate the between-class, within-class, and total scatter matrices – fine. But what do you do with them? How do you find the matrix $G$ that maps your data to the lower-dimensional feature space?
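To show concretely where the "optimization" happens, here is a minimal NumPy sketch of Fisher's approach (variable names are my own, not from the paper): the trace-ratio objective has a closed-form solution, so $G$ comes from a single eigendecomposition of $S_w^{-1} S_b$ rather than from an iterative training loop.

```python
import numpy as np

# Toy data: two classes in p = 3 dimensions.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, size=(50, 3))
X1 = rng.normal(loc=2.0, size=(50, 3))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

mean_all = X.mean(axis=0)
p = X.shape[1]
Sw = np.zeros((p, p))  # within-class scatter
Sb = np.zeros((p, p))  # between-class scatter
for c in (0, 1):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    d = (mc - mean_all).reshape(-1, 1)
    Sb += len(Xc) * (d @ d.T)

# The columns of G are the top-r eigenvectors of Sw^{-1} Sb;
# there are at most (number of classes - 1) useful directions.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(eigvals.real)[::-1]
r = 1
G = eigvecs.real[:, order[:r]]

Z = X @ G  # data projected to r dimensions
```

So there is learning in the sense of fitting parameters to data, but no iterative search: the scatter matrices are computed once and the optimal projection pops out in closed form.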

I've been looking at an implementation of LDA here, and I do not understand how "pooled covariance" and "W" relate to any of the definitions in the mathematics shown in the image below (from a paper by Wang, Ding & Huang, 2010). Can anyone help me? How do I find $G$ in formula (5)? Where is the optimization occurring in the attached code implementation?

[image: equations from Wang, Ding & Huang (2010), including formula (5) defining the projection matrix $G$]

Update: This was very helpful to me.

Best Answer

The paper you're reading describes Fisher's linear discriminant, while the MATLAB code is actually implementing Gaussian LDA, which assumes each class follows a multivariate normal distribution with a shared covariance matrix.

Take a look at this link for a more thorough description, but the part that is confusing you ($G$) is computed here:

Temp = GroupMean(i,:) / PooledCov;   % mu_i' * Sigma^{-1}, the linear coefficients for class i

% Constant (intercept) of the discriminant for class i
W(i,1) = -0.5 * Temp * GroupMean(i,:)' + log(PriorProb(i));

and corresponds to the fairly standard maximum likelihood estimation of the multivariate normal (page 7 in the slides).
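To make the correspondence explicit, here is a rough Python translation of what that MATLAB snippet is doing (the function and variable names are mine, not from the original code; comments note the mapping to `Temp`, `PooledCov`, and `W`):

```python
import numpy as np

def lda_discriminants(X, y, priors=None):
    """Gaussian LDA with a pooled (shared) covariance estimate.

    For each class k, returns the intercept b_k and linear weights w_k
    of the discriminant delta_k(x) = x @ w_k + b_k; classify by argmax.
    """
    classes = np.unique(y)
    n, p = X.shape
    if priors is None:
        priors = [np.mean(y == c) for c in classes]

    # Pooled within-class covariance (the MATLAB "PooledCov").
    pooled = np.zeros((p, p))
    means = []
    for c in classes:
        Xc = X[y == c]
        means.append(Xc.mean(axis=0))
        pooled += (Xc - means[-1]).T @ (Xc - means[-1])
    pooled /= n - len(classes)

    W = []
    for mu, prior in zip(means, priors):
        w = np.linalg.solve(pooled, mu)    # Sigma^{-1} mu  (the MATLAB "Temp")
        b = -0.5 * mu @ w + np.log(prior)  # the MATLAB "Constant" in W(i,1)
        W.append((b, w))
    return W
```

Usage: compute `scores = np.stack([X @ w + b for b, w in W], axis=1)` and predict with `scores.argmax(axis=1)`. Note that no iteration occurs here either; the class means, pooled covariance, and priors are plugged straight into the discriminant.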

Just to be clear: Fisher's linear discriminant and Gaussian LDA are equivalent (when LDA's normality and equal-covariance assumptions hold), in that both yield the same projection direction.
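As a quick sanity check of that equivalence, here is a small sketch on synthetic two-class data (my own construction, not from the paper or the MATLAB code) showing that the Fisher direction and the Gaussian-LDA boundary normal are the same direction up to scale:

```python
import numpy as np

rng = np.random.default_rng(2)
X0 = rng.normal(0.0, 1.0, (200, 3))
X1 = rng.normal(1.5, 1.0, (200, 3))

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
d = m1 - m0

# Within-class scatter; for two classes Sb is rank one, so the
# single Fisher direction is proportional to Sw^{-1} (m1 - m0).
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
fisher = np.linalg.solve(Sw, d)

# Gaussian-LDA boundary normal: pooled-covariance^{-1} (m1 - m0).
pooled = Sw / (len(X0) + len(X1) - 2)
gauss = np.linalg.solve(pooled, d)

# Cosine similarity of the two directions; should be 1 (same direction).
cos = fisher @ gauss / (np.linalg.norm(fisher) * np.linalg.norm(gauss))
```

The pooled covariance is just the within-class scatter divided by a constant, so the two vectors differ only by that scale factor.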

UPDATE: Actually, Wikipedia offers an overview of both approaches.