Solved – Fisher Distance for feature selection

discriminant analysis, feature selection

I'm currently working on EEG signal classification from 3 electrodes. I want a simple feature selection algorithm that is independent of the classification process. After the feature extraction step, let's say I have the following matrices (not the actual numbers/data):

A CLASS :
$$
Ch_1 = \begin{bmatrix}
1 & 2 & 3 \\
0.5 & 0.2 & 0 \\
1 & 0.1 & 0.8 \\
1.2 & 0.8 & 1
\end{bmatrix}
\quad
Ch_2 = \begin{bmatrix}
1 & 1.5 & 1 \\
0.3 & 0.1 & 2 \\
1.3 & 0.1 & 3 \\
1.5 & 1.8 & 2
\end{bmatrix}
\quad
Ch_3 = \begin{bmatrix}
2 & 2 & 3 \\
1.2 & 2 & 0.8 \\
1.3 & 1.2 & 1.5 \\
1.8 & 3 & 2
\end{bmatrix}
$$
B CLASS :
$$
Ch_1 = \begin{bmatrix}
1 & 2 & 3 \\
0.5 & 0.2 & 0 \\
0.1 & 2 & 0 \\
1.2 & 0.8 & 1
\end{bmatrix}
\quad
Ch_2 = \begin{bmatrix}
1.2 & 1.5 & 1 \\
0.3 & 0.1 & 2 \\
0.8 & 1.1 & 0 \\
1.5 & 1.8 & 2
\end{bmatrix}
\quad
Ch_3 = \begin{bmatrix}
2 & 2 & 3 \\
1.2 & 2 & 0.8 \\
0.2 & 1 & 0.3 \\
1.8 & 3 & 1
\end{bmatrix}
$$

In the example above, the rows of each channel matrix are the trials/observations (4 trials per class) and the columns are the features extracted from each sub-band (3 features).
What I want to do is select the features that give the best separation between classes, while keeping the observations within each class close together.
I am trying the Fisher distance approach:
$$
\text{FisherDis} = \frac{S_B}{S_w}
$$
where $S_B$ is the between-class scatter matrix and $S_w$ is the within-class scatter matrix. From what I have read, I have to score each feature and then select the features with the highest scores.

Now to my questions:
1. What is "the number of samples" when I calculate $S_w$ and $S_B$: is it four (as in four trials) or three (as in three features)?
2. Should I group the channels into one matrix, or is it better to work on each channel separately?
3. Am I on the right track? I have my doubts…

Thank you very much in advance. I'd appreciate every answer from everyone because I'm fairly new to statistics (I have so much to learn..) 🙂

Best Answer

  1. 'N', the number of samples, is usually the number of cases. This can be the number of subjects (assuming you have one measurement of each feature per subject) or the number of measurements per feature. If you have multiple measurements per feature per subject, you will need to account for this.
  2. Generally, EEG channels should be considered separately, because different activity patterns can be observed at different locations; e.g. occipital alpha activity may have different characteristics from frontal alpha for a given subject. For this reason I would recommend calculating each feature separately for each channel. If it is justifiable for the application, you can average each feature across all channels for each subject.
  3. One solution could be to take the mean value of each feature across all four trials, calculated for each of the 3 channels (electrodes), giving 9 features for each subject.
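
To make the sample count and the per-channel layout concrete, here is a minimal sketch using the example matrices from the question. It rests on assumptions that are not in the question or this answer: each trial is treated as one sample (so N = 4 per class), the three channels are kept separate by placing their features side by side (3 channels × 3 features = 9 columns), and the score is one common univariate form of the two-class Fisher criterion, $(\mu_A - \mu_B)^2 / (\sigma_A^2 + \sigma_B^2)$. The helper name `fisher_score` is hypothetical.

```python
import numpy as np

# Example data from the question: rows = trials, columns = features.
class_a = {
    "Ch1": np.array([[1, 2, 3], [0.5, 0.2, 0], [1, 0.1, 0.8], [1.2, 0.8, 1]]),
    "Ch2": np.array([[1, 1.5, 1], [0.3, 0.1, 2], [1.3, 0.1, 3], [1.5, 1.8, 2]]),
    "Ch3": np.array([[2, 2, 3], [1.2, 2, 0.8], [1.3, 1.2, 1.5], [1.8, 3, 2]]),
}
class_b = {
    "Ch1": np.array([[1, 2, 3], [0.5, 0.2, 0], [0.1, 2, 0], [1.2, 0.8, 1]]),
    "Ch2": np.array([[1.2, 1.5, 1], [0.3, 0.1, 2], [0.8, 1.1, 0], [1.5, 1.8, 2]]),
    "Ch3": np.array([[2, 2, 3], [1.2, 2, 0.8], [0.2, 1, 0.3], [1.8, 3, 1]]),
}
channels = ["Ch1", "Ch2", "Ch3"]


def fisher_score(xa, xb):
    """Per-feature two-class Fisher score (assumed univariate form).

    xa, xb: arrays of shape (n_trials, n_features), one per class.
    Returns one score per feature (column).
    """
    between = (xa.mean(axis=0) - xb.mean(axis=0)) ** 2
    within = xa.var(axis=0, ddof=1) + xb.var(axis=0, ddof=1)
    return between / within


# Keep channels separate: each (channel, feature) pair becomes its own column.
xa = np.hstack([class_a[ch] for ch in channels])  # shape (4 trials, 9 features)
xb = np.hstack([class_b[ch] for ch in channels])

scores = fisher_score(xa, xb)
names = [f"{ch}_f{j + 1}" for ch in channels for j in range(3)]

# Rank features from highest to lowest score.
for idx in np.argsort(scores)[::-1]:
    print(f"{names[idx]}: {scores[idx]:.3f}")
```

With only four trials per class the variance estimates are very noisy, so a ranking from this toy data should not be read as meaningful; the point is only the shape of the computation.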

You can use a hypothesis test such as a t-test or rank-sum test to examine how well each feature distinguishes between the two classes. However, if you are performing classification, feature selection should be performed within each fold of cross-validation, so that the held-out data never influences which features are chosen.
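
One way to keep feature selection inside each cross-validation fold is to put the selector and the classifier in a single scikit-learn pipeline, so the selector is refit on the training part of every fold. The sketch below is illustrative only and makes several assumptions: the data are synthetic random numbers (not EEG), `k=3` and the 5-fold split are arbitrary, the classifier (LDA) is a placeholder, and `f_classif` (the ANOVA F-test, which for two classes corresponds to the squared equal-variance t statistic) stands in for the t-test mentioned above.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data: 20 trials per class, 9 features (3 channels x 3).
rng = np.random.default_rng(0)
n_trials_per_class, n_features = 20, 9
X = rng.normal(size=(2 * n_trials_per_class, n_features))
y = np.repeat([0, 1], n_trials_per_class)
X[y == 1, 0] += 1.5  # make feature 0 informative on purpose

# Selection is a pipeline step, so it is refit on each training fold
# and never sees the corresponding test fold.
pipeline = make_pipeline(
    SelectKBest(score_func=f_classif, k=3),
    LinearDiscriminantAnalysis(),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_accuracy = cross_val_score(pipeline, X, y, cv=cv)
print("mean accuracy across folds:", fold_accuracy.mean())
```

Selecting features once on the full data set and then cross-validating only the classifier would typically give an optimistic accuracy estimate, which is what the pipeline arrangement avoids.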