Solved – Linearly dependent features

linear algebra, machine learning, MATLAB, matrix

I have a matrix A of 1000 observations (rows) and 100 features (cols). I would like to find:

  1. Linearly dependent features so that I can remove them and simplify the problem. rank(A) gives me 88, which I assume means that 12 of the features are linearly dependent. Am I right?
  2. After the above step, how do I determine which 12 out of the 100 columns are linearly dependent? I know there is no unique answer. But does that mean I can choose any 12 columns?
  3. Let's say I choose to remove the last 12 columns. But before removing them, I want to find the 12 linear combinations of the remaining columns that reproduce the last 12 columns. How do I get these?

So far I have tried using MATLAB's PCA, QR and SVD, but each of them gives different matrices and I don't know how to use these matrices to get what I want.

Best Answer

A little late but...

There's a measure called the Pearson correlation coefficient that can be used to quantify the linear correlation (dependence) between two variables X and Y. In short, it is the covariance of the two variables divided by the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

The result is a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.
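As a minimal sketch of that definition, the coefficient can be computed directly from two columns. The example is in plain Python purely for illustration, and the function name `pearson` is hypothetical (it does not come from the original post). In MATLAB, the built-in `corrcoef` should give the same quantity.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers.

    Hypothetical helper for illustration: covariance divided by the
    product of the standard deviations (the 1/n factors cancel).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Deviations from the mean for each variable.
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]

    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)

    if var_x == 0 or var_y == 0:
        # A constant variable has zero standard deviation.
        raise ValueError("correlation is undefined for a constant variable")

    return cov / math.sqrt(var_x * var_y)


# y = 2x is a perfect positive linear relationship.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

A value of exactly ±1 means one variable is an affine function of the other (y = a·x + b), which is why constant columns have to be treated separately.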

Using it, you can find which columns are strongly correlated with each other and drop some of them.
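One way this screening could look is sketched below, in plain Python for illustration. The names `correlated_pairs` and `threshold`, the cutoff value 0.999, and the toy data are all assumptions, not part of the original answer. Note that the check is pairwise: it flags a column that is a scaled and shifted copy of one other column, but it will not flag a column that is a combination of several others.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for constant columns
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(rows, threshold=0.999):
    """Return (i, j, r) for every column pair i < j with |r| >= threshold.

    `rows` is a list of observations, each a list of feature values.
    Hypothetical helper; the threshold is an arbitrary illustrative cutoff.
    """
    cols = list(zip(*rows))  # transpose: one tuple per feature column
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = pearson(cols[i], cols[j])
            # NaN compares False, so constant columns are never flagged.
            if abs(r) >= threshold:
                pairs.append((i, j, r))
    return pairs


# Toy data: column 2 = 2 * column 0 + 1, column 3 = column 0 + column 1.
rows = [
    [1, 2, 3, 3],
    [2, 1, 5, 3],
    [3, 4, 7, 7],
    [4, 3, 9, 7],
    [5, 6, 11, 11],
]

for i, j, r in correlated_pairs(rows):
    print(f"columns {i} and {j}: r = {r:.3f}")
```

On this toy data only columns 0 and 2 are flagged. Column 3 is an exact sum of columns 0 and 1, yet none of its pairwise correlations reaches the cutoff, so a multi-column dependency like the one in the question would need a rank-based method instead.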
