I am working on the Forest type mapping dataset, which is available in the UCI Machine Learning Repository.
I have 27 features to predict 4 forest types. I am performing a Linear Discriminant Analysis (LDA) to reduce the number of features, using the lda() function from the MASS package.
The results of my LDA are as follows:

```r
forest.lda <- lda(class ~ ., data = forest)
forest.lda

Call:
lda(class ~ ., data = forest)

Prior probabilities of groups:
        d         h         o         s
0.2727273 0.2424242 0.1868687 0.2979798

Group means:
        b1       b2       b3        b4       b5        b6       b7       b8       b9
d 53.31481 48.31481 68.40741  97.66667 63.18519 103.31481 98.74074 26.07407 56.46296
h 78.27083 29.39583 55.14583 113.66667 50.22917  95.25000 98.00000 25.10417 60.06250
o 65.08108 66.54054 87.97297 103.62162 77.70270 118.10811 91.56757 44.18919 77.94595
s 57.96610 27.79661 51.05085  93.47458 49.67797  91.66102 76.52542 24.28814 55.67797
  pred_minus_obs_H_b1 pred_minus_obs_H_b2 pred_minus_obs_H_b3 pred_minus_obs_H_b4
d            60.76204            2.723704            28.25963           0.7018519
h            35.17708           21.101250            40.25167         -15.9608333
o            48.90054          -15.450270             8.99027          -7.0445946
s            55.64695           22.945254            44.95712           3.6745763
  pred_minus_obs_H_b5 pred_minus_obs_H_b6 pred_minus_obs_H_b7 pred_minus_obs_H_b8
d           -37.75296           -42.72981        -21.66481481            3.772407
h           -25.02354           -35.37042        -21.25583333            4.638958
o           -52.34162           -57.74297        -15.35891892          -14.684324
s           -24.41831           -31.63898          0.06610169            5.355085
  pred_minus_obs_H_b9 pred_minus_obs_S_b1 pred_minus_obs_S_b2 pred_minus_obs_S_b3
d          -0.6970370           -20.12778          -1.1394444           -4.508519
h          -4.5602083           -19.90458          -0.8512500           -4.302500
o         -22.5089189           -19.82000          -0.9532432           -4.145946
s          -0.3098305           -20.19966          -1.0466102           -4.390508
  pred_minus_obs_S_b4 pred_minus_obs_S_b5 pred_minus_obs_S_b6 pred_minus_obs_S_b7
d           -21.34370          -0.9635185           -4.722407           -19.31796
h           -21.09187          -1.0172917           -4.687083           -18.68271
o           -20.55324          -0.8581081           -4.273514           -18.23081
s           -20.88051          -1.0201695           -4.613898           -18.91254
  pred_minus_obs_S_b8 pred_minus_obs_S_b9
d           -1.860926           -4.510000
h           -1.345000           -3.968542
o           -1.442432           -4.048108
s           -1.569492           -4.051695

Coefficients of linear discriminants:
                            LD1          LD2         LD3
b1                   0.15410941  0.083191904 -0.05574989
b2                  -0.09033693 -0.098788690 -0.05294103
b3                   0.01730795  0.034581992  0.17366457
b4                   0.03142488  0.086065613  0.15504752
b5                   1.01223928  0.520003333  0.41695043
b6                  -0.40178858 -0.526621626 -0.48149395
b7                  -0.23331487 -0.185079579 -0.05010620
b8                  -0.32959040 -0.218615144 -0.10788013
b9                   0.13938043  0.179366235  0.15483732
pred_minus_obs_H_b1 -0.02732135  0.013629388 -0.02773200
pred_minus_obs_H_b2  0.16743148 -0.086326071 -0.27561332
pred_minus_obs_H_b3 -0.11530638  0.044265889  0.35503689
pred_minus_obs_H_b4  0.05370740  0.068117979  0.13440809
pred_minus_obs_H_b5  1.03718236  0.462674531  0.38562055
pred_minus_obs_H_b6 -0.40022794 -0.541066832 -0.32075713
pred_minus_obs_H_b7 -0.19480130 -0.138378132  0.06374322
pred_minus_obs_H_b8 -0.13609236 -0.003440928  0.09291261
pred_minus_obs_H_b9  0.01171503 -0.160364856 -0.11694535
pred_minus_obs_S_b1 -0.05050788  0.019637786 -0.03832580
pred_minus_obs_S_b2  0.05946038  0.023019484 -0.06508984
pred_minus_obs_S_b3 -0.05777119 -0.138126136  0.07659433
pred_minus_obs_S_b4  0.03461031  0.008094415  0.06487418
pred_minus_obs_S_b5  0.63346845  0.105556436 -0.01382360
pred_minus_obs_S_b6 -0.30309468 -0.109369091  0.07915327
pred_minus_obs_S_b7 -0.07614580 -0.053089078  0.01138394
pred_minus_obs_S_b8 -0.13416272  0.328494630 -0.44248108
pred_minus_obs_S_b9  0.06414547 -0.347941228  0.43430498

Proportion of trace:
   LD1    LD2    LD3
0.7365 0.1971 0.0664
```
From Wikipedia and other sources, I understand that LD1, LD2 and LD3 are discriminant functions that I can use to classify new data (LD1 accounts for 73.7% of the between-group separation and LD2 for 19.7%). What I cannot work out is how to use this result to reduce the number of features, or to select only the relevant ones, since each LD function has a coefficient for every feature.
I am looking for help on interpreting the results to reduce the number of features from $27$ to some $x<27$.
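For context, here is a minimal sketch of what the output above already provides. Vanilla LDA reduces dimension by replacing the original features with at most K - 1 discriminant scores (3 for the 4 forest classes); it does not drop any features. The forest data is not reproduced here, so the sketch uses R's built-in iris data as a stand-in, and the 90% threshold is an arbitrary choice for illustration.

```r
library(MASS)  # provides lda()

# Stand-in data (assumption): iris has 4 features and 3 classes,
# whereas the forest data has 27 features and 4 classes.
fit <- lda(Species ~ ., data = iris)

# For K classes there are at most K - 1 discriminants: 2 here, 3 for the forest data.
n_ld <- ncol(fit$scaling)

# Every feature has a coefficient on every discriminant, so nothing is dropped.
all_nonzero <- all(fit$scaling != 0)

# Discriminant scores: one row per observation, one column per LD.
scores <- predict(fit)$x

# "Proportion of trace" as printed by lda(): normalised squared singular values.
prop_trace <- fit$svd^2 / sum(fit$svd^2)

# Keep only as many discriminants as needed to reach 90% of the trace
# (the threshold is an illustrative assumption, not a rule).
n_keep  <- which(cumsum(prop_trace) >= 0.90)[1]
reduced <- scores[, seq_len(n_keep), drop = FALSE]
```

With the forest output above, the same rule would keep LD1 and LD2 (0.7365 + 0.1971 = 0.9336), giving a 2-column representation. Each of those columns is still a combination of all 27 original features.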
Best Answer
If it doesn't have to be vanilla LDA (which is not designed to select among the input features), there is, for example, Sparse Discriminant Analysis, which is a LASSO-penalized LDA: Line Clemmensen, Trevor Hastie, Daniela Witten, Bjarne Ersbøll: Sparse Discriminant Analysis (2011)
The LASSO penalty shrinks many coefficients to exactly zero, so the resulting discriminant functions use only a subset of the input features.
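A rough sketch of how this might look in R, assuming the CRAN package sparseLDA is installed and that its sda() function takes the arguments shown (this is how I recall the package documentation; check ?sda before relying on it). The iris data again stands in for the forest data, and stop = -2 is an arbitrary illustrative setting that asks for two non-zero loadings per discriminant direction.

```r
# Stand-in data (assumption): iris instead of the forest data.
x <- scale(as.matrix(iris[, 1:4]))             # centre and scale the features
y <- model.matrix(~ Species - 1, data = iris)  # class indicator (dummy) matrix
colnames(y) <- levels(iris$Species)

selected <- NULL
# Assumption: 'sparseLDA' is installed; the block is skipped otherwise.
if (requireNamespace("sparseLDA", quietly = TRUE)) {
  # A negative 'stop' gives the desired number of non-zero loadings
  # per discriminant direction (here 2).
  fit_sda <- sparseLDA::sda(x, y, lambda = 1e-6, stop = -2)

  # Names of the features that received a non-zero loading.
  selected <- fit_sda$varNames
  print(selected)
}
```

For the forest data, one would pass the 27 feature columns as x and tune stop (for example by cross-validation) rather than fixing it by hand.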