Solved – How to aggregate/deal with time series data in multiple markets

multiple regression, time series

I have our company's daily revenue data in multiple markets. The essence of the project is to see how different markets compare to each other based on a few variables, namely the age of the market and its population.

I've created 30-day rolling means for each market and plotted them.

What my boss wants is a way to input the population and the age and get out a predicted revenue number.

What would be the best way to go about doing this? Should I combine every market's data and then fit an ARIMA model using population as an external regressor? Would that work? Or should I create a separate model for each market? Is ARIMA even what I want here? Maybe just a linear regression with both variables, plus a test for interaction?

I'm relatively new to professional statistics, having just graduated college in May, and any help would be greatly appreciated!

edit: the dataset looks like this:

mkt    days.since.launch   date    revenue    mkt.population

For each market there are a few hundred data points.
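For reference, the 30-day rolling mean mentioned above can be sketched on rows of this shape as follows. This is only an illustration: the function name `rolling_means_by_market` and the sample rows are made up, and the column keys are assumed to match the header shown above. Each value in a window gets the same weight, 1/window.

```python
from collections import defaultdict


def rolling_means_by_market(rows, window=30):
    """Return {market: [rolling means of revenue]} using equal weights 1/window.

    rows: iterable of dicts with keys 'mkt', 'days.since.launch', 'revenue'.
    Days without a full window behind them produce no value.
    """
    by_market = defaultdict(list)
    for row in rows:
        by_market[row["mkt"]].append((row["days.since.launch"], row["revenue"]))

    result = {}
    for mkt, points in by_market.items():
        points.sort()  # order each market by days since launch
        revenue = [r for _, r in points]
        means = []
        running = 0.0
        for i, value in enumerate(revenue):
            running += value
            if i >= window:
                running -= revenue[i - window]  # drop the value leaving the window
            if i >= window - 1:
                means.append(running / window)
        result[mkt] = means
    return result


# Made-up illustrative rows, not real data.
rows = [
    {"mkt": "A", "days.since.launch": d, "revenue": float(d)} for d in range(1, 6)
] + [
    {"mkt": "B", "days.since.launch": d, "revenue": 10.0} for d in range(1, 6)
]
print(rolling_means_by_market(rows, window=3))
```

A short window of 3 is used in the example only to keep the toy data small; with the real data the default of 30 would apply.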

Best Answer

You should definitely not consider ordinary linear regression, because you have time series data, which has autocorrelative structure. Your post describes a simplistic approach (30-day rolling means) as an attempt to characterize your series. An ARIMA model is an optimization of "how many data points to use" and "how to weight them". With a 30-day rolling mean you are in effect assuming a model with 30 lags in which every coefficient is identical (1/30).

Now on to your problem. What you have is a parent-to-child problem (items within a class, markets within a business line). The same statistical problem arises when you need to seamlessly integrate hourly forecasts and daily forecasts, which is sometimes called a mixed-frequency problem.

The way we have handled this problem is to create a composite series across markets and form an ARMAX model (that is, ARIMA plus fixed variables) for it, incorporating variables like market population and other user-suggested support series. Then develop an individual ARMAX model for each market, using the total sales across markets as a predictor alongside any predictors specific to that market. Finally, perform a reconciliation, either "parent is boss" or "children are boss", so that the market forecasts and the total forecast agree.

What is very important is that when you form all of the ARMAX models you should be sensitive to pulses, level shifts, seasonal pulses and/or local time trends, and of course any needed ARIMA structure.

We have seen many applications around this kind of data. Unfortunately, the rudimentary tools that are usually available for free are often worth what you pay, so you might need to acquire or develop this functionality. I am suggesting that "state-of-the-art solutions" are available in a number of commercial software offerings. I have suggested elsewhere a reasonable approach to "picking the minds of software developers" to find out how you can do this yourself (or not!): How do I calculate projected figures for the next year based on past performance?
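The reconciliation step can be sketched in a few lines. This is one simple reading of the two options, not necessarily the method the answer has in mind: "parent is boss" is taken to mean the market forecasts are scaled proportionally so they sum to the total forecast, and "children are boss" to mean the total is replaced by the sum of the market forecasts. The function names and all numbers are hypothetical.

```python
def reconcile_parent_is_boss(total_forecast, market_forecasts):
    """Scale market forecasts proportionally so they sum to the total forecast."""
    child_sum = sum(market_forecasts.values())
    if child_sum == 0:
        raise ValueError("cannot scale market forecasts that sum to zero")
    scale = total_forecast / child_sum
    return {mkt: value * scale for mkt, value in market_forecasts.items()}


def reconcile_children_are_boss(market_forecasts):
    """Replace the total forecast with the sum of the market forecasts."""
    return sum(market_forecasts.values())


# Hypothetical one-step-ahead forecasts (made-up numbers).
total = 1000.0
markets = {"A": 300.0, "B": 500.0, "C": 450.0}

adjusted = reconcile_parent_is_boss(total, markets)
print(adjusted)                               # market forecasts rescaled to match the total
print(reconcile_children_are_boss(markets))   # total implied by the market forecasts
```

Proportional scaling keeps each market's share of the total unchanged. Other allocation rules are possible, for example weighting by each model's forecast error, and which side should be "boss" depends on which forecasts you trust more.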
