PCA finds the directions (eigenvectors of the covariance matrix) whose eigenvalues explain most of the variation across the data; in this case it would operate on each feature vector and does not take account of class labels.
LDA maximizes Fisher's discriminant ratio (related to the Mahalanobis distance between class means), i.e. it maximizes the separation between classes relative to the spread within them.
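Concretely (a standard formulation, not from the original post), for a projection direction $w$ LDA maximizes $$J(w) = \frac{w^\top S_B\, w}{w^\top S_W\, w},$$ where $S_B$ and $S_W$ are the between-class and within-class scatter matrices.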
If you define the feature vector for each observation (case) as the data at an instantaneous time point, then the temporal components of the data are not relevant. In this case you can apply PCA as a pre-processing stage to each feature vector to reduce dimensionality prior to classification.
If, however, you define each trial as a 10s epoch or segment around the point of interest, you could then calculate a summary statistic for each sensor across all time samples in the epoch. Each feature in your feature vector would then be a summary of the behaviour of each sensor over the 10s (e.g. mean amplitude across each 10s epoch). You could then apply PCA as a pre-processing step to reduce the dimensionality of the feature vector from 306 to a more manageable number.
This second approach assumes that the summary statistics calculated over each 10s epoch contain more information relevant to your problem than the instantaneous features described above.
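A minimal sketch of this second approach, assuming the epochs are available as a NumPy array of shape (trials, sensors, samples); the array, the trial count and the number of components here are made up purely for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical MEG epochs: 120 trials x 306 sensors x 1000 samples per 10s epoch.
rng = np.random.default_rng(0)
epochs = rng.standard_normal((120, 306, 1000))

# One summary statistic per sensor (e.g. mean amplitude over the epoch),
# giving a 306-dimensional feature vector per trial.
features = epochs.mean(axis=2)                    # shape (120, 306)

# PCA as a pre-processing step: reduce 306 features to a more manageable
# number of components before handing them to a classifier.
pca = PCA(n_components=20)
features_reduced = pca.fit_transform(features)    # shape (120, 20)
print(features_reduced.shape, pca.explained_variance_ratio_.sum())
```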
"PCA chooses the directions in which the variables have the most spread, not the dimensions that have the most relative distances between clustered subclasses."
LDA projects the data so that the ratio of between-class variance to within-class variance is maximized. This is accomplished by first projecting the data in a way that makes the within-class covariance matrix spherical. Because this step involves inverting the covariance matrix, it is numerically unstable if too few observations are available.
So basically the projection you are looking for is the one produced by LDA. However, PCA can help reduce the number of input variates for the LDA, so that the matrix inversion is stabilized.
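A short sketch of that combination, with made-up data just to show the mechanics (306 variates, far fewer observations, so plain LDA would have to invert a badly conditioned covariance matrix):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data: 80 observations, 306 variates, 2 classes.
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 306))
y = rng.integers(0, 2, size=80)

# Reducing to a handful of PCA scores first stabilizes the covariance
# inversion inside LDA.
pca_lda = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
pca_lda.fit(X, y)
print(pca_lda.score(X, y))
```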
There is an alternative to using PCA for this first projection: PLS. PLS can be viewed as a regression analogue of PCA that, unlike PCA, takes the class labels into account when constructing its latent variables. Barker, M. and Rayens, W.: Partial least squares for discrimination, Journal of Chemometrics, 2003, 17, 166-173, therefore suggest performing the LDA in PLS-score space.
In practice, you'll find that PLS-LDA needs fewer latent variables than PCA-LDA.
However, both methods need the number of latent variables to be specified (otherwise they do not reduce the number of input variates for the LDA). If you can determine this from your knowledge about the problem, go ahead. However, if you determine the number of latent variables from the data (e.g. % variance explained, quality of the PLS-LDA model, ...), do not forget to reserve additional data for testing this model (e.g. an outer cross-validation).
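A rough sketch of PLS-LDA under the same made-up data as above; the indicator coding of the labels and the choice of 5 latent variables are illustrative assumptions (in practice the number of latent variables should come from the cross-validation just described):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
X = rng.standard_normal((80, 306))
y = rng.integers(0, 2, size=80)

# PLS uses the class labels (coded as indicator columns) to build its latent
# variables, so it typically needs fewer of them than PCA for discrimination.
Y = np.eye(2)[y]                              # one-hot class indicators
pls = PLSRegression(n_components=5).fit(X, Y)
scores = pls.transform(X)                     # PLS scores, shape (80, 5)

# LDA in PLS-score space (the Barker & Rayens suggestion).
lda = LinearDiscriminantAnalysis().fit(scores, y)
print(lda.score(scores, y))
```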
Best Answer
LDA on its own can be used to classify; you do not need to use kNN. In LDA you model the data as a set of multivariate normal distributions with a common covariance matrix $\Sigma$ but different mean vectors $\mu_k$ for the $k$ classes. You simply use the estimates of $\Sigma$ and $\mu_k$ to compute the log ratio of the posterior probability for one class versus another, $$d(c_1) = \log\frac{P(\mathrm{Class} = c_1 \mid X)}{P(\mathrm{Class} = c_2 \mid X)},$$ which results in linear discriminant functions thanks to taking the log and the fact that the same covariance matrix is used for all classes. You then classify an observation to whichever class's discriminant function is highest. You also need estimates of the prior (marginal) probabilities $P(\mathrm{Class} = c_k)$, which can simply be $\frac{N_k}{N}$, or you can experiment with your own values so long as they sum to 1.
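A minimal sketch of these discriminant functions computed by hand, on toy two-class data (the data and dimensions are invented for illustration); it estimates the class means, the pooled covariance and the priors, and classifies each observation to the class with the largest discriminant:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy two-class data in 4 dimensions, classes shifted apart.
X = np.vstack([rng.standard_normal((50, 4)) + 1.0,
               rng.standard_normal((50, 4)) - 1.0])
y = np.repeat([0, 1], 50)

classes = np.unique(y)
priors = np.array([np.mean(y == k) for k in classes])        # N_k / N
means = np.array([X[y == k].mean(axis=0) for k in classes])  # mu_k

# Pooled (common) covariance estimate Sigma.
centered = np.vstack([X[y == k] - means[k] for k in classes])
Sigma = centered.T @ centered / (len(X) - len(classes))
Sigma_inv = np.linalg.inv(Sigma)

# Linear discriminant functions:
# delta_k(x) = x' Sigma^{-1} mu_k - 0.5 mu_k' Sigma^{-1} mu_k + log(pi_k)
def discriminants(x):
    return np.array([x @ Sigma_inv @ means[k]
                     - 0.5 * means[k] @ Sigma_inv @ means[k]
                     + np.log(priors[k]) for k in classes])

# Classify each observation to the class with the highest discriminant.
predictions = np.array([classes[np.argmax(discriminants(x))] for x in X])
print("training accuracy:", (predictions == y).mean())
```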