Let's say I have an $m \times n$ matrix where $m$ is the number of points and $n$ is the number of dimensions. I would like to supply a target-dimension parameter, say $d$, which could take values such as $\{2,4,6,\ldots,n\}$. I would then use Fisher's linear discriminant analysis to produce an output matrix of size $m \times d$. Is this possible to do? I still don't understand how LDA reduces dimensions from, say, 10,000 to 1,000.
Solved – How do you make linear discriminant analysis reduce dimensions to the number of dimensions you are looking for
Tags: dimensionality reduction, discriminant analysis
Related Solutions
Discriminants are the axes and the latent variables which differentiate the classes most strongly. The number of possible discriminants is $\min(k-1,p)$. For example, with $k=3$ classes in $p=2$-dimensional space there can exist at most 2 discriminants, such as on the graph below. (Note that discriminants are not necessarily orthogonal as axes drawn in the original space, although they, as variables, are uncorrelated.) The centroids of the classes are located within the discriminant subspace according to their perpendicular projections onto the discriminants.
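The $\min(k-1,p)$ cap comes from the rank of the between-class scatter matrix $S_B$: it is a sum of $k$ rank-1 terms whose weighted deviations sum to zero, so its rank is at most $k-1$. A minimal NumPy sketch on made-up data (the class sizes and dimensions are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 3, 5                                 # 3 classes in 5-dimensional space
X = rng.normal(size=(90, p))
y = np.repeat(np.arange(k), 30)

# Between-class scatter: weighted outer products of class-mean deviations
grand_mean = X.mean(axis=0)
S_B = np.zeros((p, p))
for c in range(k):
    Xc = X[y == c]
    d = Xc.mean(axis=0) - grand_mean
    S_B += len(Xc) * np.outer(d, d)

# Rank is at most k-1, hence at most min(k-1, p) discriminants
print(np.linalg.matrix_rank(S_B))           # 2 == k - 1
```

Because the deviations satisfy $\sum_c n_c (\bar{x}_c - \bar{x}) = 0$, one of the $k$ rank-1 terms is linearly dependent on the others, which is exactly where the $k-1$ comes from.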
The algebra of LDA at the extraction phase is described here.
"Fisher's Discriminant Analysis" is simply LDA in a situation with 2 classes. When there are only 2 classes, computations by hand are feasible and the analysis is directly related to multiple regression. LDA is the direct extension of Fisher's idea to the situation with any number of classes and uses matrix-algebra devices (such as eigendecomposition) to compute it. So the term "Fisher's Discriminant Analysis" can be seen as obsolete today; "Linear Discriminant Analysis" should be used instead. See also. Discriminant analysis with 2+ classes (multi-class) is canonical by its algorithm (it extracts discriminants as canonical variates); the rarer term "Canonical Discriminant Analysis" therefore usually stands simply for (multi-class) LDA (or, as an umbrella term, for LDA + QDA).
Fisher used what were then called "Fisher classification functions" to classify objects after the discriminant function had been computed. Nowadays, a more general Bayes approach is used within the LDA procedure to classify objects.
To your request for explanations of LDA I may refer you to these answers of mine: extraction in LDA, classification in LDA, LDA among related procedures. See also this, this, and this question and answer.
Just as ANOVA requires an assumption of equal variances, LDA requires an assumption of equal variance-covariance matrices (between the input variables) across the classes. This assumption is important for the classification stage of the analysis. If the matrices differ substantially, observations will tend to be assigned to the class where the variability is greater. To overcome the problem, QDA was invented: QDA is a modification of LDA that allows for such heterogeneity of the classes' covariance matrices.
If you have heterogeneity (as detected, for example, by Box's M test) and you don't have QDA at hand, you may still use LDA in the regime of using the individual covariance matrices (rather than the pooled matrix) of the discriminants at classification. This partly solves the problem, though less effectively than QDA, because - as just pointed out - these are the matrices between the discriminants and not between the original variables (which are the matrices that differed).
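To make the pooled-vs-individual distinction concrete, here is a small sketch of Gaussian classification with a pooled covariance (LDA regime) versus per-class covariances (QDA regime). The data, equal priors, and the helper names are my own assumptions; the two classes deliberately share a mean but differ strongly in spread:

```python
import numpy as np

rng = np.random.default_rng(1)
# Same mean, very different covariance scale -- the heterogeneous case
X0 = rng.multivariate_normal([0, 0], np.eye(2), 200)
X1 = rng.multivariate_normal([0, 0], 9 * np.eye(2), 200)
X = np.vstack([X0, X1])
y = np.r_[np.zeros(200, int), np.ones(200, int)]

means = [X[y == c].mean(axis=0) for c in (0, 1)]
covs = [np.cov(X[y == c].T) for c in (0, 1)]
pooled = (covs[0] + covs[1]) / 2            # equal class sizes

def log_density(x, mean, cov):
    # Gaussian log-density up to a constant (equal priors assumed)
    d = x - mean
    return -0.5 * (np.log(np.linalg.det(cov)) + d @ np.linalg.solve(cov, d))

def accuracy(cov_of):
    pred = np.array([np.argmax([log_density(x, means[c], cov_of(c))
                                for c in (0, 1)]) for x in X])
    return (pred == y).mean()

lda_acc = accuracy(lambda c: pooled)        # one pooled matrix (LDA regime)
qda_acc = accuracy(lambda c: covs[c])       # individual matrices (QDA regime)
print(lda_acc, qda_acc)
```

With equal means the pooled-covariance rule has essentially nothing to separate on, while the per-class-covariance rule exploits the difference in spread; running this, `qda_acc` comes out clearly higher than `lda_acc`.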
Let me leave analyzing your example data for yourself.
Reply to @zyxue's answer and comments
LDA is what you defined FDA to be in your answer. LDA first extracts linear constructs (called discriminants) that maximize the between-to-within separation, and then uses those to perform (Gaussian) classification. If (as you say) LDA were not tied to the task of extracting the discriminants, it would appear to be just a Gaussian classifier, and no name "LDA" would be needed at all.
It is at that classification stage that LDA assumes both normality and variance-covariance homogeneity of the classes. The extraction or "dimensionality reduction" stage of LDA assumes linearity and variance-covariance homogeneity; the two assumptions together make "linear separability" feasible. (We use a single pooled $S_w$ matrix to produce discriminants, which therefore have an identity pooled within-class covariance matrix; that gives us the right to apply the same set of discriminants to classify into all the classes. If all the $S_w$s are the same, the said within-class covariances are all the same, identity, and that right to use them becomes absolute.)
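The parenthetical claim can be checked numerically: scaling the discriminants against the pooled $S_w$ makes the pooled within-class covariance of the projected data the identity. A sketch on assumed data, using symmetric whitening of $S_w$ (one of several equivalent ways to solve the generalized eigenproblem):

```python
import numpy as np

rng = np.random.default_rng(2)
k, p = 3, 4
X = rng.normal(size=(150, p)) + np.repeat(rng.normal(size=(k, p), scale=3),
                                          50, axis=0)
y = np.repeat(np.arange(k), 50)

# Pooled within-class covariance S_w and between-class scatter S_b
S_w = sum(np.cov(X[y == c].T) * (50 - 1) for c in range(k)) / (len(y) - k)
gm = X.mean(axis=0)
S_b = sum(50 * np.outer(X[y == c].mean(axis=0) - gm,
                        X[y == c].mean(axis=0) - gm) for c in range(k))

# Whiten S_w, then diagonalize the whitened S_b
evals, E = np.linalg.eigh(S_w)
W_half = E @ np.diag(evals ** -0.5) @ E.T        # S_w^{-1/2}
_, U = np.linalg.eigh(W_half @ S_b @ W_half)
V = W_half @ U[:, ::-1][:, :k - 1]               # top k-1 discriminants

# Pooled within-class covariance of the projected data is the identity
Z = (X - gm) @ V
S_w_Z = sum(np.cov(Z[y == c].T) * (50 - 1) for c in range(k)) / (len(y) - k)
print(np.allclose(S_w_Z, np.eye(k - 1)))         # True
```

Algebraically this is just $V^T S_w V = U^T S_w^{-1/2} S_w S_w^{-1/2} U = U^T U = I$, which is the identity pooled within-class covariance referred to above.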
The Gaussian classifier (the second stage of LDA) uses Bayes' rule to assign observations to classes by the discriminants. The same result can be accomplished via the so-called Fisher linear classification functions, which utilize the original features directly. However, the Bayes approach based on discriminants is a little more general in that it also allows the use of separate per-class discriminant covariance matrices, in addition to the default way of using one, the pooled one. It also allows basing classification on a subset of the discriminants.
When there are only two classes, both stages of LDA can be described together in a single pass, because "latent extraction" and "observation classification" then reduce to the same task.
Best Answer
LDA does dimension reduction with the same methodology as PCA. In LDA you maximize the Fisher criterion

$$J(W) = \frac{|W^T S_B W|}{|W^T S_W W|},$$

where $S_W$ is the within-class scatter matrix and $S_B$ is the between-class scatter matrix; the optimal $W$ is the projection that maximizes between-class scatter relative to within-class scatter.

Once you have set up this criterion, the same trick as in PCA plays its role: the columns of the optimal $W$ are the eigenvectors of $S_W^{-1} S_B$, and you discard the eigenvectors with the smallest corresponding eigenvalues. If you then multiply your $m \times n$ data by the retained $n \times d$ eigenvector matrix $V$, you get the lower-dimensional representation $Y = XV$ of size $m \times d$. (Note that for classic LDA, $d$ cannot exceed $\min(k-1, n)$, where $k$ is the number of classes, because $S_B$ has rank at most $k-1$.)
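The whole recipe fits in a short NumPy sketch. The function name `fisher_lda` and the tiny ridge added to $S_W$ (to guard against singularity when $n$ is large relative to $m$) are my own assumptions, not part of the classic formulation:

```python
import numpy as np

def fisher_lda(X, y, d):
    """Project m x n data X onto the top-d discriminants (d <= min(k-1, n))."""
    classes = np.unique(y)
    n = X.shape[1]
    gm = X.mean(axis=0)
    S_w = np.zeros((n, n))
    S_b = np.zeros((n, n))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)               # within-class scatter
        S_b += len(Xc) * np.outer(mc - gm, mc - gm)  # between-class scatter
    # Eigenvectors of S_w^{-1} S_b; small ridge in case S_w is singular
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w + 1e-8 * np.eye(n), S_b))
    order = np.argsort(evals.real)[::-1][:d]         # keep d largest eigenvalues
    V = evecs[:, order].real                         # n x d projection matrix
    return X @ V                                     # m x d reduced data

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10)) + np.repeat(rng.normal(size=(3, 10), scale=4),
                                           40, axis=0)
y = np.repeat(np.arange(3), 40)
print(fisher_lda(X, y, 2).shape)                     # (120, 2)
```

This directly answers the question's setup: an $m \times n$ input with a target dimension $d$ yields an $m \times d$ output, with the caveat that $d$ is capped at $\min(k-1, n)$, so going from 10,000 features to 1,000 this way requires at least 1,001 classes.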