Regression vs Time Series – Is House Price Prediction a Regression or Time Series Problem?

python, regression, time series

I have a dataset of house prices from 2000 to 2016 for several U.S. cities. From this I want to predict the price of similar houses at the current date. Is this better addressed as a regression problem or as a time series problem? An important point is that the data I have so far is rather sparse: just a few thousand records.

Best Answer

While the other answer is correct that the response variable can be modelled with a linear regression, you are dealing with house prices. As such, your dataset will likely suffer from what is called time series induced heteroscedasticity.

What this basically means is that since your houses will vary in age (some could be one year old, others over thirty years old), the variance of your residuals will not be constant.

If you look at the abstract titled "Heteroscedasticity in hedonic house price models", you will note that Generalised Least Squares was used to remove the heteroscedasticity, giving forecast errors with a lower standard deviation than would be obtained through standard Ordinary Least Squares.
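To make the OLS versus GLS point concrete, here is a minimal sketch of feasible GLS in its simplest form (weighted least squares with a diagonal error covariance). Everything in it is an assumption for illustration: the data are synthetic, the column names `area` and `age` are hypothetical, and the variance model (log squared residuals regressed on log age) is just one common choice, not necessarily the one used in the cited abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only, not the asker's dataset):
# price depends on floor area and age, and the noise spread grows with age.
n = 2000
area = rng.uniform(50, 250, n)            # hypothetical floor area
age = rng.uniform(1, 40, n)               # hypothetical house age in years
noise_sd = 500.0 * age                    # heteroscedastic by construction
price = 50_000 + 1_200 * area - 800 * age + rng.normal(0, noise_sd)

X = np.column_stack([np.ones(n), area, age])


def ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def wls(X, y, weights):
    """Weighted least squares, i.e. GLS with a diagonal error covariance."""
    w = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta


# Step 1: plain OLS, which ignores the changing variance.
beta_ols = ols(X, price)

# Step 2: model the variance from the OLS residuals.
# Assumed form: log(resid^2) = g0 + g1 * log(age) + error.
resid = price - X @ beta_ols
Z = np.column_stack([np.ones(n), np.log(age)])
gamma = ols(Z, np.log(resid**2))
var_hat = np.exp(Z @ gamma)

# Step 3: refit with weights inversely proportional to the estimated variance.
beta_fgls = wls(X, price, 1.0 / var_hat)

print("OLS coefficients: ", beta_ols)
print("FGLS coefficients:", beta_fgls)
print("estimated variance exponent on age:", gamma[1])
```

On real data the variance model would need to be checked rather than assumed. If statsmodels is available, its `WLS` and `GLS` classes do the same fitting and also report standard errors.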

In summary, your data can be modelled using regression analysis, but you do need to watch out for heteroscedasticity and also serial correlation. Moreover, you might find that your distribution is not normal (a Q-Q plot is a quick way to check), in which case your analysis might be better served by first transforming your data towards a normal distribution using a Box-Cox transformation.
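The checks mentioned above can be sketched roughly as follows. This is an illustration on synthetic, right-skewed prices, not a recipe for the asker's data: the variable names are hypothetical, the Breusch-Pagan test is written out by hand in its studentized (Koenker) form, and the Durbin-Watson statistic is only meaningful when the rows are ordered by sale date, which the synthetic rows here are not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic right-skewed prices (illustrative stand-in for real sale prices).
n = 1500
area = rng.uniform(50, 250, n)
price = np.exp(11.0 + 0.006 * area + rng.normal(0, 0.35, n))
X = np.column_stack([np.ones(n), area])


def ols_residuals(X, y):
    """Residuals from an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def breusch_pagan(resid, X):
    """Studentized Breusch-Pagan test: LM = n * R^2 of resid^2 regressed on X."""
    u2 = resid**2
    aux_resid = ols_residuals(X, u2)
    r2 = 1.0 - aux_resid.var() / u2.var()
    lm = len(u2) * r2
    p_value = stats.chi2.sf(lm, df=X.shape[1] - 1)
    return lm, p_value


def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest no first-order serial correlation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid**2)


# Diagnostics on the untransformed prices.
resid_raw = ols_residuals(X, price)
lm_raw, p_raw = breusch_pagan(resid_raw, X)

# Box-Cox needs strictly positive data; lambda is chosen by maximum likelihood.
price_bc, lam = stats.boxcox(price)
resid_bc = ols_residuals(X, price_bc)
lm_bc, p_bc = breusch_pagan(resid_bc, X)
dw = durbin_watson(resid_bc)  # with real data, sort rows by sale date first

# Q-Q data against the normal distribution; r close to 1 means the points hug the line.
_, (_, _, r_raw) = stats.probplot(resid_raw, dist="norm")
_, (_, _, r_bc) = stats.probplot(resid_bc, dist="norm")

print(f"Breusch-Pagan p-value, raw prices: {p_raw:.4g}")
print(f"Breusch-Pagan p-value, Box-Cox:    {p_bc:.4g}")
print(f"Box-Cox lambda: {lam:.3f}, Durbin-Watson: {dw:.3f}")
print(f"Q-Q correlation raw: {r_raw:.4f}, Box-Cox: {r_bc:.4f}")
```

A small Breusch-Pagan p-value points to heteroscedasticity, and a Box-Cox lambda near zero suggests a log transform would do much the same job. Note that predictions made on the transformed scale have to be converted back to prices, for example with `scipy.special.inv_boxcox`. statsmodels also ships ready-made versions of both tests (`het_breuschpagan` and `durbin_watson`), if you would rather not hand-roll them.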