1. LDA or logistic regression?
LDA and logistic regression can both be used to 'predict' the class of a subject, and both can handle the case of more than two classes.
They differ, however, in the way they solve the classification problem and therefore in the assumptions they make: logistic regression assumes the well-known S-shaped (logistic) relationship between the features and the class probability, while LDA assumes that, within each class, your data are (1) multivariate normal and (2) have the same var-covar matrix in every class.
If the assumptions of multivariate normality and equal var-covar matrices are fulfilled then, in general, LDA will perform better.
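To make the comparison concrete, here is a minimal MATLAB sketch (with hypothetical simulated data that satisfies the LDA assumptions; `fitcdiscr` and `fitglm` are the Statistics and Machine Learning Toolbox routines for the two models):

```matlab
% Hypothetical simulated two-class data satisfying the LDA assumptions:
% multivariate normal in each class, same var-covar matrix.
rng(1);
X = [mvnrnd([0 0], eye(2), 200); mvnrnd([2 1], eye(2), 200)];
y = [zeros(200, 1); ones(200, 1)];

mdlLDA = fitcdiscr(X, y);                              % linear discriminant
mdlLR  = fitglm(X, y, 'Distribution', 'binomial');     % logistic regression

% Training error of each model (for a real comparison use held-out data).
errLDA = mean(predict(mdlLDA, X) ~= y);
errLR  = mean((predict(mdlLR, X) > 0.5) ~= y);
```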
2. Intuition behind LDA
You have 'subjects' that are characterized by features $x_1, x_2, \dots x_n$. The goal is to decide on the class of the subject, knowing the value of its features.
As said, LDA assumes that, within each class $c$, your features follow a multivariate normal distribution with a mean that depends on the class, $\mu_c$ (note that this is a vector), and a var-covar matrix $\Sigma$ that is the same for all classes. So for each class we know the multivariate normal density $\Phi_c(\mu_c,\Sigma)$, which allows us to calculate the probabilities.
Now, given the observed features $x_i$, we can compute $\Phi_c$ for each class and put the subject in the class $c$ where this yields the highest value (i.e. where the 'probability' is highest).
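Here is a minimal MATLAB sketch of that decision rule (not production code; it assumes a feature matrix `X` with one row per subject, a label vector `y`, a new observation `xnew` as a row vector, and equal class priors):

```matlab
classes = unique(y);
C = numel(classes);
n = size(X, 2);

% Class means and pooled ('same for all classes') var-covar matrix.
mu = zeros(C, n);
Sigma = zeros(n, n);
for k = 1:C
    Xk = X(y == classes(k), :);
    mu(k, :) = mean(Xk, 1);
    Sigma = Sigma + (size(Xk, 1) - 1) * cov(Xk);
end
Sigma = Sigma / (size(X, 1) - C);     % pooled var-covar estimate

% Evaluate each class density Phi_c at xnew and pick the highest.
dens = zeros(C, 1);
for k = 1:C
    dens(k) = mvnpdf(xnew, mu(k, :), Sigma);
end
[~, best] = max(dens);
predictedClass = classes(best);
```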
3. Why is LDA a dimension reduction technique?
LDA assumes multivariate normality in each class with the same var-covar. Therefore the classes are all the 'same' except for their mean. So you can 'feel' that the number of means will be important.
If you have $n$ features then, in the end, the solution will depend on the $C$ class means, $C$ being the number of classes. In fact, it can be shown mathematically that LDA 'solves' the classification problem in a subspace of the $n$-dimensional feature space, and that this subspace has dimension at most $C-1$.
To make it more concrete, assume that you have subjects with $25$ features and you want to classify them into two classes; then you can 'solve' the problem in a one-dimensional space (thus on a line). This is why LDA is said to be a dimension reduction technique: in this case it reduces the dimension from $25$ to $1$.
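As an illustration of this reduction (a sketch with hypothetical simulated data): for two classes the discriminant direction is $\Sigma^{-1}(\mu_1-\mu_2)$, and projecting onto it turns the $25$-dimensional problem into a one-dimensional one.

```matlab
% Sketch: reduce a 25-feature, two-class problem to one dimension.
% Hypothetical simulated data, 100 subjects per class.
rng(1);
n = 25;
mu1 = zeros(1, n);  mu2 = [2, zeros(1, n - 1)];   % means differ in x_1
X = [mvnrnd(mu1, eye(n), 100); mvnrnd(mu2, eye(n), 100)];
y = [ones(100, 1); 2 * ones(100, 1)];

% Pooled var-covar and discriminant direction w = Sigma^{-1}(m1 - m2).
m1 = mean(X(y == 1, :));  m2 = mean(X(y == 2, :));
S = (99 * cov(X(y == 1, :)) + 99 * cov(X(y == 2, :))) / 198;
w = S \ (m1 - m2)';

% One number per subject: the whole problem now lives on this line.
scores = X * w;
```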
If you look at the calculations, you will see that there are a few bugs in them.
The correct value of $w$ comes out to be:
$$
w = C - a(x-\mu_{10})^2 + p(x-\mu_{00})^2 + b\mu_{11}x + c\mu_{11}x - q\mu_{01}x - r\mu_{01}x - d\mu_{11}^2 + s\mu_{01}^2 - b\mu_{10}\mu_{11} - c\mu_{10}\mu_{11} + q\mu_{01}\mu_{00} + r\mu_{01}\mu_{00}
$$
The value of $y$ then comes out to be:
$$
y = \frac{-v\pm\sqrt{v^2+4uw}}{2u}
$$
After making these two changes, you will get the correct quadratic boundary.
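As a quick sanity check of that root formula (a sketch only; $u$, $v$, $w$ are the scalar coefficients from the derivation, which is not reproduced here): the corresponding quadratic is $uy^2 + vy - w = 0$, and both roots can be computed directly.

```matlab
% Both roots of u*y^2 + v*y - w = 0, the quadratic whose root formula is
% displayed above; u, v, w come from the derivation (not reproduced here).
yBoundary = (-v + [1, -1] * sqrt(v^2 + 4*u*w)) / (2*u);
```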
Best Answer
The discriminant axis (the one onto which the points are projected in your Figure 1) is given by the first eigenvector of $\mathbf{W}^{-1}\mathbf{B}$. In the case of only two classes this eigenvector is proportional to $\mathbf{W}^{-1}(\mathbf{m}_1-\mathbf{m}_2)$, where the $\mathbf{m}_i$ are the class centroids. Normalize this vector (or the obtained eigenvector) to get the unit axis vector $\mathbf{v}$. This is enough to draw the axis.
To project the (centred) points onto this axis, you simply compute $\mathbf{X}\mathbf{v}\mathbf{v}^\top$. Here $\mathbf{v}\mathbf{v}^\top$ is a linear projector onto $\mathbf{v}$.
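For concreteness, here is a minimal MATLAB sketch of exactly these two steps (my own sketch, not the listing referenced below; it assumes a two-class data matrix `X` with rows as points and a label vector `y` with values 1 and 2):

```matlab
% Class centroids.
m1 = mean(X(y == 1, :));  m2 = mean(X(y == 2, :));

% Within-class scatter matrix W.
W = (X(y == 1, :) - m1)' * (X(y == 1, :) - m1) + ...
    (X(y == 2, :) - m2)' * (X(y == 2, :) - m2);

% Discriminant axis, proportional to W^{-1}(m1 - m2); normalize it.
v = W \ (m1 - m2)';
v = v / norm(v);

% Project the centred points onto the axis via the projector v*v'.
Xc = X - mean(X);
P  = Xc * (v * v');
```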
Here is the data sample from your dropbox and the LDA projection:
Here is MATLAB code to produce this figure (as requested):