The question is quite old, and I'm surprised there has been no attempt to answer it, so I will try.
> I wonder why I have not seen an example yet where they actually calculate the variance-covariance matrices in each class and compare them with each other?
Well, theoretically you are completely right. But in practice, there are a few reasons why you see it so rarely:
- The LDA method is quite robust to mild violations of the normality and covariance-equality assumptions.
- In contrast, a more complicated model is prone to overfitting: using QDA where it's not necessary and LDA already works well may lead to an overfit.
- You may check the assumptions with a hypothesis test, but the test has its own assumptions, so it becomes a vicious circle. For example, take a look here:
> Box’s M Test is extremely sensitive to departures from normality; the fundamental test assumption is that your data is multivariate normally distributed. Therefore, if your samples don’t meet the assumption of normality, you shouldn’t use this test.
So how do you decide? My personal recommendation: think about the physical meaning of your data. What groups are you investigating? Is there a good reason to expect a significant difference between the groups' covariances?
Example 1
Let's look at the `iris` dataset.
Check Box's M test:
> biotools::boxM(iris[,-5], iris[,5])
Box's M-test for Homogeneity of Covariance Matrices
data: iris[, -5]
Chi-Sq (approx.) = 140.94, df = 20, p-value < 2.2e-16
The null hypothesis of covariance equality is rejected, i.e. the assumption is violated.
But let's take a look at the data:
library(ggplot2)
x <- iris[,-5]
y <- iris[,5]
pca <- prcomp(x, center = T, scale. = F)
qplot(x=pca$x[,1], y=pca$x[,2], color=y, shape=y, xlab = 'PC 1', ylab='PC 2', size=4)
We can see visually that the variations are not the same in the three groups. But do they differ significantly? It doesn't seem so. Moreover, think about what data we are analyzing: petals of closely related flower species. So it's quite natural to assume that the within-group covariances are somewhat the same.
Compare LDA and QDA
library(klaR)
partimat(Species ~ ., data = iris, method = "lda", plot.matrix = TRUE, imageplot = T, col.correct='green', col.wrong='red', cex=1)
partimat(Species ~ ., data = iris, method = "qda", plot.matrix = TRUE, imageplot = T, col.correct='green', col.wrong='red', cex=1)
As one can see, the separation quality is almost the same for LDA and QDA, but the QDA model looks overfitted because of its unnecessarily complex boundaries.
Example 2
Let's simulate a more significant violation of the covariance assumption: multiplying the values of one group by 2 increases its variance by a factor of 4.
x2 <- iris[51:150,-5]
y2 <- factor(iris[51:150,5])
x2[y2 == 'versicolor',] <- 2*x2[y2 == 'versicolor',]
pca <- prcomp(x2, center = T, scale. = F)
qplot(x=pca$x[,1], y=pca$x[,2], color=y2, shape=y2, xlab = 'PC 1', ylab='PC 2', size=4)
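As a quick sanity check of the fourfold-variance claim, in base R:

```r
# Cov(2X) = 4 Cov(X): doubling every value quadruples the (co)variance
v <- iris[iris$Species == 'versicolor', -5]
all.equal(cov(2 * v), 4 * cov(v))   # TRUE
```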
Theoretically, we would reject LDA and use QDA. But the PCA projection shows that the two classes can still be separated linearly.
iris2 <- iris[51:150,]; iris2$Species <- factor(iris2$Species); iris2[iris2$Species == 'versicolor', -5] <- 2*iris2[iris2$Species == 'versicolor', -5]
partimat(Species ~ ., data = iris2, method = "lda", plot.matrix = TRUE, imageplot = T, col.correct='green', col.wrong='red', cex=1)
partimat(Species ~ ., data = iris2, method = "qda", plot.matrix = TRUE, imageplot = T, col.correct='green', col.wrong='red', cex=1)
We see that both models work well despite the assumption violation, but we would choose the linear model since it is more interpretable and less likely to overfit.
Example 3
Now suppose we want to discriminate only `versicolor`, i.e. our target now consists of two classes: `versicolor` and `not versicolor`. Now it's reasonable to assume that the covariances are different, because the covariance of the `not versicolor` group contains between-group variance as well. The normality assumption is significantly violated too.
y3 <- as.character(y); y3[y3 != 'versicolor'] <- 'not versicolor'; y3 <- factor(y3)
pca <- prcomp(x, center = T, scale. = F)  # recompute PCA on the full data (pca was overwritten in Example 2)
qplot(x=pca$x[,1], y=pca$x[,2], color=y3, shape=y, xlab = 'PC 1', ylab='PC 2', size=4)
Compare the variance of the red points and the blue points: the difference is quite significant now, right? Theoretically, we would reject both QDA and LDA, since neither normality nor covariance equality holds. But let's see how the methods actually perform:
iris3 <- iris; iris3$Species <- y3
partimat(Species ~ ., data = iris3, method = "lda", plot.matrix = TRUE, imageplot = T, col.correct='green', col.wrong='red', cex=1)
partimat(Species ~ ., data = iris3, method = "qda", plot.matrix = TRUE, imageplot = T, col.correct='green', col.wrong='red', cex=1)
> l <- lda(Species ~ ., data = iris3)
> table(y3, predict(l)$class)
y3               not versicolor versicolor
  not versicolor             86         14
  versicolor                 26         24
> q <- qda(Species ~ ., data = iris3)
> table(y3, predict(q)$class)
y3               not versicolor versicolor
  not versicolor             99          1
  versicolor                  4         46
Well, LDA is indeed not suitable here. The QDA model, however, is not that bad.
So in general, I would recommend:
- Think about your data and task. Is there a reason for a significant violation of the assumptions?
- Take a look at PCA projections. If the first 2-3 components explain a substantial part of the variance (say, more than 90%), then from the score plots you can assess normality of the distribution and equality of the within-group covariances, and estimate how well linear discrimination would work.
- Try both LDA and QDA. If the difference is not essential, stick with LDA as QDA most probably is overfitted.
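The PCA check in the second point above can be sketched like this (using `iris` for concreteness):

```r
pca <- prcomp(iris[, -5], center = TRUE, scale. = FALSE)

# Cumulative proportion of variance explained by the components
prop <- pca$sdev^2 / sum(pca$sdev^2)
cumsum(prop)   # for iris, the first two components cover ~98%
```

If the cumulative proportion after two or three components is high, the score plots are a faithful low-dimensional picture of the group shapes.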
P.S.
> Or maybe you can use the pairs plot for this (ref below)? Then how do I read it?
I hope it's clear from my explanation that `pairs` is the wrong tool. `pairs` shows only feature-pairwise variance, but the assumption is about covariance within groups, i.e. for `iris` we assume that `cov(iris[iris$Species == 'setosa', -5])`, `cov(iris[iris$Species == 'versicolor', -5])`, and `cov(iris[iris$Species == 'virginica', -5])` are equal.
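If you do want the direct check the question asks about, it is a one-liner per group; here is a minimal base-R sketch:

```r
# Within-group covariance matrices, one per species
covs <- lapply(split(iris[, -5], iris$Species), cov)

# A crude comparison: largest element-wise difference between two groups
max(abs(covs$setosa - covs$virginica))
```

Inspecting the matrices themselves (or their element-wise differences) makes the violation visible without relying on a formal test.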
Best Answer
In a scenario with $N$ samples and $K$ classes or labels, the first formula should be
$$\frac{1}{N-K} \sum_{c=1}^K \sum_{y_i = c} (x_i - \hat \mu_c) (x_i - \hat \mu_c)^\intercal$$
and is for calculating the pooled variance, to be used if you're tying the covariance matrix across classes (as in LDA). The $N-K$ term arises from Bessel's correction.
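A sketch of this pooled estimate in base R, using `iris` for concreteness (note that `cov()` divides by $N_c - 1$, so here the scatter matrices are summed directly before dividing by $N - K$):

```r
X <- iris[, -5]; g <- iris$Species
N <- nrow(X); K <- nlevels(g)

# Sum the within-class scatter matrices, then divide by N - K
scatter <- Reduce(`+`, lapply(levels(g), function(cl) {
  Xc <- scale(X[g == cl, ], center = TRUE, scale = FALSE)  # x_i - mu_c
  t(Xc) %*% Xc
}))
pooled <- scatter / (N - K)
```

This is exactly the covariance matrix that LDA shares across all classes.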
If you're not tying the covariance matrices (as in QDA), then the covariance matrix for a class $c$ with $N_c$ samples is
$$\frac{1}{N_c - 1} \sum_{y_i = c} (x_i - \hat \mu_c) (x_i - \hat \mu_c)^\intercal$$
if you want an unbiased estimate of the variance, or
$$\frac{1}{N_c} \sum_{y_i = c} (x_i - \hat \mu_c) (x_i - \hat \mu_c)^\intercal$$
if you want a maximum-likelihood estimate of the variance.
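In R, `cov()` uses the $N_c - 1$ denominator; the $1/N_c$ version is a simple rescaling:

```r
# Per-class covariance for one class (setosa shown as an example)
Xc <- iris[iris$Species == 'setosa', -5]
Nc <- nrow(Xc)

S_unbiased <- cov(Xc)                       # divides by Nc - 1
S_mle      <- S_unbiased * (Nc - 1) / Nc    # divides by Nc
```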
Either way, usually you don't calculate the equation of the decision boundary in QDA. Given a test point you just evaluate the posterior probability of each class, and pick the highest.
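That evaluation can be sketched in a few lines of base R, assuming Gaussian class densities with per-class covariances, equal priors, and constant terms dropped:

```r
# Log of the (unnormalized) Gaussian discriminant for one class
log_disc <- function(x0, mu, S) {
  d <- x0 - mu
  -0.5 * log(det(S)) - 0.5 * sum(d * solve(S, d))  # -0.5 log|S| - 0.5 d' S^-1 d
}

# Per-class means and covariance matrices
mus <- lapply(split(iris[, -5], iris$Species), colMeans)
Ss  <- lapply(split(iris[, -5], iris$Species), cov)

x0 <- unlist(iris[1, -5])                 # classify the first observation
scores <- mapply(function(mu, S) log_disc(x0, mu, S), mus, Ss)
names(which.max(scores))                  # class with the highest score
```

With unequal priors you would add `log(prior_c)` to each score; the argmax rule stays the same.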