Can we use cluster analysis in multiple regression?

clustering, regression

I am quite new to Data Analytics. I was just wondering whether we can use cluster analysis in Multiple Regression. Let me give you a scenario so that it becomes easier to visualize.

I have a dataset of property transactions from 2013. It contains the property price, region, property area in sq.m, and locational attributes such as whether the property is close to a bus stop, a supermarket, and so on.

If I fit a multiple regression model here, I can use price as the dependent variable and the other variables as independent variables, and figure out which independent variables have the largest influence on price.
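As a rough illustration of this baseline, here is a minimal sketch of such a regression in Python with NumPy. The data are synthetic and the column names (`area`, `near_bus`, `near_market`) and the coefficients used to generate the prices are assumptions made up for the example, not values from the actual dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for the dataset's columns (not real transaction data).
area = rng.uniform(40, 200, n)        # property area in sq.m
near_bus = rng.integers(0, 2, n)      # 1 if close to a bus stop, else 0
near_market = rng.integers(0, 2, n)   # 1 if close to a supermarket, else 0

# Assumed "true" relationship, used only to generate the toy prices.
true_coefs = np.array([50_000.0, 2_000.0, 15_000.0, 10_000.0])
X = np.column_stack([np.ones(n), area, near_bus, near_market])
price = X @ true_coefs + rng.normal(0, 5_000, n)

# Ordinary least squares: price ~ intercept + area + near_bus + near_market
coefs, *_ = np.linalg.lstsq(X, price, rcond=None)

for name, c in zip(["intercept", "area", "near_bus", "near_market"], coefs):
    print(f"{name:12s} {c:12.1f}")
```

Note that the raw coefficients are in the units of each variable (price per sq.m for `area`, a price difference for the 0/1 flags), so comparing their influence usually means standardizing the variables first or looking at the uncertainty of each estimate as well.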

Suppose that instead I use cluster analysis to figure out which regions have the highest prices and which locational attributes are driving them, split the dataset according to these clusters, and then run a multiple regression on each subset to see what results I get. Would that make sense?

Best Answer

Since you have labeled data, a supervised approach will usually outdo any unsupervised approach.

I agree with Peter Flom, who in the comments noted that you "misunderstand the purpose of cluster analysis". Clusters are not meant to find regions with maximum prices.

Chances are that by partitioning your data into clusters without paying attention to price, your multiple regression approach will be worse, because each model only sees part of the data, and the predictions may have discontinuities at the cluster borders. But in other cases, exactly this partitioning can help.

Why not just give it a try and see for yourself? But beware of overfitting: don't increase the number of variables too much.
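One way to try it is to hold out part of the data and compare a single pooled regression against one regression per cluster on the held-out rows. The sketch below does this on synthetic data with a plain NumPy implementation of Lloyd's k-means, which is just one possible clustering method. The features (`area`, `dist`), the way the toy prices are generated, and all function names are hypothetical choices for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coefs

def predict_ols(coefs, X):
    return np.column_stack([np.ones(len(X)), X]) @ coefs

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def assign(X, centres):
    """Index of the nearest centre for each row of X."""
    d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the k cluster centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centres)
        for j in range(k):
            if np.any(labels == j):   # an empty cluster keeps its old centre
                centres[j] = X[labels == j].mean(axis=0)
    return centres

def clusterwise_rmse(X_tr, y_tr, X_te, y_te, k):
    """Cluster on features only (price is not used), fit one OLS per cluster,
    and score held-out rows routed to their nearest cluster centre."""
    centres = kmeans(X_tr, k)
    lab_tr, lab_te = assign(X_tr, centres), assign(X_te, centres)
    # Pooled model is the fallback for clusters with too few training rows.
    y_hat = predict_ols(fit_ols(X_tr, y_tr), X_te)
    for j in range(k):
        tr, te = lab_tr == j, lab_te == j
        if tr.sum() > X_tr.shape[1] + 1 and te.any():
            y_hat[te] = predict_ols(fit_ols(X_tr[tr], y_tr[tr]), X_te[te])
    return rmse(y_te, y_hat)

# Synthetic data: assumed for the example, not real transactions.
rng = np.random.default_rng(1)
n = 600
area = rng.uniform(40, 200, n)   # sq.m
dist = rng.uniform(0, 20, n)     # hypothetical distance to the city centre, km
slope = np.where(dist < 10, 3_000.0, 1_200.0)   # assumed: area is worth more near the centre
price = 40_000 + slope * area - 2_000 * dist + rng.normal(0, 10_000, n)

X = np.column_stack([area, dist])
idx = rng.permutation(n)
train, test = idx[:400], idx[400:]

# Standardize with training statistics so k-means is not dominated by one scale.
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
Z = (X - mu) / sd

pooled = rmse(price[test], predict_ols(fit_ols(Z[train], price[train]), Z[test]))
print("pooled RMSE:", round(pooled))
for k in (2, 4, 8):
    score = clusterwise_rmse(Z[train], price[train], Z[test], price[test], k)
    print(f"k={k} cluster-wise RMSE:", round(score))
```

Whether the cluster-wise models win depends entirely on the data, which is the point of checking on held-out rows. Each extra cluster multiplies the number of fitted coefficients, so a large `k` with few rows per cluster is exactly the overfitting risk mentioned above.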
