Solved – How to determine which columns best correlate with target column in a SQL Server Table

correlationexcelsql

I have a table with a couple hundred columns and a few hundred thousand rows and want to determine which combination of 5-10 columns best correlate with a target column.

I have SQL Server and Excel but no budget beyond that.

My current plan is:

  1. Pull the table into Excel
  2. Use the correl() function on each column
  3. Combine subsets of the best correlated columns to see if their combined correlation coefficient improves. (i.e. new ab merge column =A1*1000+B1)

Is there a better way to do this?

Best Answer

You probably want to use R and perform linear regression -- it will give you the linear combination of columns that correlates best with the target column (dependent variable).

Related Question