Here is a simple recipe that may help you get started writing code and testing ideas...
Let's assume you have monthly data recorded over three years, so you have 36 values. Let's also assume that you only care about predicting one month (one value) in advance.
- Exploratory data analysis: Apply some of the traditional time series analysis methods to estimate the lag dependence in the data (e.g. auto-correlation and partial auto-correlation plots, transformations, differencing).
Let's say that you find a given month's value is correlated with the past three months' values but not much beyond that.
- Partition your data into training and validation sets: Take the first 24 points as your training values and the remaining points as the validation set.
- Create the neural network layout: You'll take the past three months' values as inputs, and you want to predict the next month's value. So you need a neural network with an input layer containing three nodes and an output layer containing one node. You should probably also have a hidden layer with at least a couple of nodes. Unfortunately, picking the number of hidden layers, and their respective numbers of nodes, is not something for which there are clear guidelines. I'd start small, like 3:2:1.
- Create the training patterns: Each training pattern will be four values, with the first three corresponding to the input nodes and the last one defining what the correct value is for the output node. For example, if your training data are the values $$x_1, x_2, \dots, x_{24}$$ then $$\text{pattern 1}: x_1, x_2, x_3, x_4$$ $$\text{pattern 2}: x_2, x_3, x_4, x_5$$ $$\dots$$ $$\text{pattern 21}: x_{21}, x_{22}, x_{23}, x_{24}$$
- Train the neural network on these patterns
- Test the network on the validation set (months 25-36): Here you pass in the three values the neural network needs for the input layer and see what the output node gets set to. So, to see how well the trained network predicts month 32's value, you'd pass in the values for months 29, 30, and 31.
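The pattern-construction step in the recipe above can be sketched in a few lines of Python. This is a hypothetical sketch: the function name is my own, and the placeholder numbers 1-24 simply stand in for the training values $x_1,\dots,x_{24}$.

```python
def make_patterns(values, n_lags=3):
    """Slide a window of length n_lags over the series; each pattern
    pairs n_lags consecutive inputs with the value that follows them."""
    patterns = []
    for i in range(len(values) - n_lags):
        inputs = values[i:i + n_lags]   # e.g. x1, x2, x3 for pattern 1
        target = values[i + n_lags]     # e.g. x4 for pattern 1
        patterns.append((inputs, target))
    return patterns

# The 24 training values x_1..x_24 (placeholder numbers here)
training_data = list(range(1, 25))
patterns = make_patterns(training_data)

len(patterns)   # 21 patterns, as in the recipe
patterns[0]     # ([1, 2, 3], 4)
patterns[-1]    # ([21, 22, 23], 24)
```

The same windowing works for any lag structure you uncover in the exploratory step: change `n_lags` and the number of input nodes together.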
This recipe is obviously high level, and you may scratch your head at first when trying to map your context onto different software libraries/programs. But hopefully this sketches out the main point: you need to create training patterns that reasonably capture the correlation structure of the series you are trying to forecast. And whether you do the forecasting with a neural network or an ARIMA model, the exploratory work to determine that structure is often the most time-consuming and difficult part.
In my experience, neural networks can provide great classification and forecasting functionality, but setting them up can be time consuming. In the example above, you may find that 21 training patterns are not enough; that different input transformations lead to better or worse forecasts; that varying the number of hidden layers and hidden-layer nodes greatly affects forecasts; and so on.
I highly recommend looking at the neural_forecasting website, which contains tons of information on neural network forecasting competitions. The Motivations page is especially useful.
There's not nearly enough information to suggest a model, or to judge if a model might be too simple. With so little data, subject knowledge (how that particular kind of data tends to behave) becomes critical.
In an interview, you might take the strategy of suggesting several potential models - "If there's expected to be strong seasonality, and not a strong trend, maybe you could do this; if strong seasonality and strong trend seem likely, maybe do that; if seasonality and trend would be expected to be weak and noise high, perhaps do this ..." and so on.
(Though if it were me, I'd narrow it down finer than that.)
One might then say something like "if we really don't know what it is we're dealing with, and with little data, very simple models tend to forecast better than complex ones; perhaps exponential smoothing or double exponential smoothing might be one choice if we don't have more indication of what kind of model might be suitable."
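To make "exponential smoothing or double exponential smoothing" concrete, here is a minimal from-scratch sketch of both. The function names, the toy data, and the smoothing constants are my own arbitrary choices, not standard values.

```python
def ses(series, alpha=0.3):
    """Simple exponential smoothing: the forecast is a weighted average
    that discounts older observations geometrically. The forecast for
    every future period is the final smoothed level (a flat line)."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def holt(series, alpha=0.3, beta=0.1):
    """Double exponential smoothing (Holt's method): adds a trend term,
    so forecasts follow a straight line rather than staying flat."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return lambda h: level + h * trend  # forecast h steps ahead

data = [10, 12, 13, 15, 16, 18]  # hypothetical short series
ses(data)        # flat forecast: the final smoothed level
holt(data)(1)    # one-step-ahead forecast including the trend
```

With only a handful of points, these two methods have just one or two parameters to estimate, which is exactly why they tend to hold up better than more complex models in that setting.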
(Added later in response to the request in comments)
As support for the claim that "very simple models tend to forecast better than complex ones" (particularly with little data), see for example, Makridakis and Hibon (2000) [1], discussing Makridakis and Hibon (1979) [2]:
The major conclusion of the Makridakis and Hibon study was that simple methods, such as exponential smoothing, outperformed sophisticated ones.
The statement was controversial in 1979, but results from the subsequent M-competitions broadly supported that conclusion (though the statements became somewhat more nuanced); similar sentiments can be found (for example) in the forecasting book by Makridakis, Wheelwright and Hyndman.
More broadly, see (for example) Green and Armstrong (2015) [3]:
Our review of studies comparing simple and complex methods — including those in this special issue — found 97 comparisons in 32 papers. None of the papers provide a balance of evidence that complexity improves forecast accuracy. Complexity increases forecast error by 27 percent on average in the 25 papers with quantitative comparisons. The finding is consistent with prior research to identify valid forecasting methods: all 22 previously identified evidence-based forecasting procedures are simple.
[More recently, averages of forecasts have in many cases been found to perform quite well, but again those model-average forecasts have often tended to average over fairly simple models]
[1] Makridakis, S., & Hibon, M. (2000). "The M3-Competition: results, conclusions and implications". International Journal of Forecasting, 16(4), 451-476.
[2] Makridakis, S., & Hibon, M. (1979). "Accuracy of forecasting: an empirical investigation (with discussion)". Journal of the Royal Statistical Society A, 142, 97-145.
[3] Green, K.C., & Armstrong, J.S. (2015). "Simple versus complex forecasting: The evidence". March 1, 2015 (forthcoming in Journal of Business Research). http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2643534
Best Answer
To answer your questions:
Yes - you can use neural networks, or some other generic ML method, to forecast sales. Neural nets have mixed results on time series, and using other generic methods such as SVMs or XGBoost is not very common.
There are also other methods designed specifically for time series, such as ARIMA and exponential smoothing.
There is a package in R, the forecast package, which implements many of these methods, including ARIMA, exponential smoothing, and some neural network models, and it is very easy to use.
However, 6 data points are not really enough for any sophisticated forecasting approach to forecast 7-18 steps ahead. You're better off just using a naive forecast (use the last month of data as your best guess), or perhaps a seasonal naive forecast (use the current January to forecast next January, the current February to forecast next February, etc.). But even then you need at least 12 months of data, so in your case there would be a gap between months 7 and 12, and you could only forecast months 13 to 18.
Since it is sales data, it is very likely seasonal - so basically, you don't have enough data to forecast anything other than a seasonal naive forecast of months 13-18.
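The two baselines just mentioned are each a one-liner. A minimal sketch, with hypothetical function names and sample numbers:

```python
def naive_forecast(series, horizon):
    """Naive forecast: every future period repeats the last observed value."""
    return [series[-1]] * horizon

def seasonal_naive_forecast(series, horizon, period=12):
    """Seasonal naive: each future month repeats the value from the same
    month one full season earlier. Needs at least one full season of data,
    which is why 6 monthly points are not enough for it."""
    if len(series) < period:
        raise ValueError("need at least one full season of data")
    return [series[-period + (h % period)] for h in range(horizon)]

# Hypothetical 12 months of sales
months = [100, 90, 95, 110, 120, 130, 125, 115, 105, 100, 98, 102]

naive_forecast(months[:6], 3)       # [130, 130, 130]
seasonal_naive_forecast(months, 3)  # [100, 90, 95]
```

With only 6 observed months, `seasonal_naive_forecast` raises an error, which mirrors the gap described above: the first seasonal naive forecasts you can issue are for months 13 onward.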