Solved – Several questions about statistical financial time series models from a “machine-learning person”

arima, finance, machine learning

To explain why I have the stupid questions you'll find below, I have to say that I am more of a machine-learning person. While I worked on problems in bioinformatics, everything was fine. When I heard words like "regression" or "kurtosis and skewness", in the first case I just smiled, and in the second I made some clumsy move with my shoulders, trying to say something like: "yes, I've heard of it, and I even know how to calculate it, but why on Earth would anyone need it?".

The situation changed dramatically when, a year ago, just for fun, I tried to apply my machine-learning knowledge to some financial time series.

I started with the idea of building a Bayesian network from signals provided by "technical" "analysis" "indicators". The idea failed. It was somewhat comforting, though, to find at least two topics on this site with a similar idea (using neural networks instead of Bayesian networks).

Next, after a lot of effort, I was able to build a mixture of kNN and symbolic regression, which I trained on 1-hour data from 2000 to 2006 and tested on data from 2007. This model actually gave a great profit. But when I applied it to the latest data, I realized that its accuracy had dropped dramatically because of the economic crisis. It doesn't work anymore because something changed in the market, and I need more new data, which I can obtain only in 2-5 years.

Well, a lot of stuff was tried later, and although this whole thing started as "just for fun", it was not fun anymore. Until I found the online lectures of Ruey S. Tsay on ARIMA, GARCH, TAR and all the other stuff that was completely new to me.

Basically, I found a whole new world and I really enjoy it. So far I have fitted my first ARIMA model and then tuned it to cut the RMS error in half by looking at the ACF and PACF, playing with seasonality, and so on.

Well, the fun is back; I've had a lot of it and I expect to have even more. But I had some questions and found this great site. I have read almost all the topics about ARIMA and related techniques here, along with many other general topics on similar approaches, and I will certainly be reading more. I am still thinking in a machine-learning frame of mind, which leads to a lot of stupid questions, most of which are answered on this site.

So, after this long introduction, here are my remaining stupid questions:

  1. The machine-learning approach is mostly concerned with finding "patterns" in data, and I find this in contradiction with statistical models for financial time series, which make extensive use of random walk theory (which makes the existence of patterns at least questionable). I realize that this is a very naive and incorrect description, but what I am trying to say is that most machine-learning techniques are in conceptual contradiction with the statistical approach to the problem. I am not saying that one approach is better, just that they contradict each other. Is that correct, and how big is this contradiction?

  2. I really liked the description and idea of the TAR model, which to me looks like a marriage of machine learning and statistics. This is the model I want to try next, after I add GARCH to my ARIMA. But I have some questions about it:

    • TAR definitely uses both statistical and machine-learning approaches. So, keeping my first question in mind, isn't it an error to try to find a pattern for a set of models that are basically built on a theory which excludes patterns? Or is it just a way of combining two models, which study different aspects of the same problem, into one even more powerful model?

    • When you search for the keyword "ARIMA" on this site, you get 15 pages of topics, while for TAR there is only one. Also, why did people stop at applying just AR? Why not extend the idea to more complex models (like ARIMA)? Is it because TAR didn't give the expected improvement over AR?

  3. I know that MCMC methods and other machine-learning stuff are currently being mixed with statistical models. I am personally also a big fan of Hidden Markov Models and Conditional Random Fields. Do you know of any mixtures of these methods with statistical models?

Best Answer

Regarding question 1, time series analysis does not deal mainly with random walks. Stationary time series have a correlation structure that is modelled in, for example, ARMA models. Time series analysis also looks at periodic effects and trend (we call time series with those features nonstationary). Looking for patterns in data is not incompatible with statistics, as long as there is recognition that the data consist of a pattern plus a random component, and that the random component must be considered in the analysis.

Regarding question 2, I don't see why you call TAR a mix of machine learning and statistics. I see it as just a more complicated time series model that includes a threshold parameter and two AR models. I guess I also don't see a big distinction between machine learning and statistics. I view machine learning as part of statistical pattern recognition/classification, which falls under the realm of multivariate analysis. It seems to me that TAR could easily be extended by putting a threshold on an ARMA model. I don't know whether that has been tried or why it might not have been developed. Perhaps someone who works with this type of time series model can answer that question.
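
To make the "threshold parameter and two AR models" description concrete, here is a minimal sketch in plain Python. It is not taken from any lecture or library. It assumes a two-regime TAR with an AR(1) in each regime, no intercepts, a threshold applied to the previous value, and a threshold that is already known when fitting. The function names and coefficient values are made up for illustration.

```python
import random


def simulate_tar(n, threshold=0.0, phi_low=0.6, phi_high=-0.4, sigma=1.0, seed=0):
    """Simulate a two-regime TAR(1): the AR coefficient used at each step
    depends on whether the previous value is at/below or above the threshold."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        phi = phi_low if x[-1] <= threshold else phi_high
        # Pattern (phi * previous value) plus a random component (Gaussian noise).
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def fit_ar1(pairs):
    """Least-squares AR(1) coefficient (no intercept) from (previous, current) pairs."""
    num = sum(prev * cur for prev, cur in pairs)
    den = sum(prev * prev for prev, _ in pairs)
    return num / den if den > 0 else float("nan")


def fit_tar(x, threshold):
    """Fit one AR(1) per regime, treating the threshold as known (an assumption)."""
    pairs = list(zip(x[:-1], x[1:]))
    low = [(prev, cur) for prev, cur in pairs if prev <= threshold]
    high = [(prev, cur) for prev, cur in pairs if prev > threshold]
    return fit_ar1(low), fit_ar1(high)


series = simulate_tar(5000)
phi_low_hat, phi_high_hat = fit_tar(series, threshold=0.0)
print(f"low regime: {phi_low_hat:.2f}, high regime: {phi_high_hat:.2f}")
```

In practice the threshold is usually unknown and has to be estimated as well, for example by searching over candidate values; that step is left out of this sketch.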
