Chaos Theory – Practical Applications in Data Mining

data-mining, fractal, mathematical-statistics, references, self-study

While casually reading some mass-market works on chaos theory over the last few years, I began to wonder how various aspects of it could be applied to data mining and related fields like neural nets, pattern recognition, uncertainty management, etc. To date, I've run into so few examples of such applications in the published research that I wonder (a) whether they've actually been put into practice in known, published experiments and projects, and (b) if not, why they are used so little in these interrelated fields.

Most of the discussions of chaos theory I've seen to date revolve around scientific applications that are entirely useful but have little to do with data mining and related fields like pattern recognition; one of the archetypal examples is the Three-Body Problem from physics. I want to forgo discussion of ordinary scientific applications of this kind and restrict the question solely to applications that are obviously relevant to data mining and related fields, which seem to be few and far between in the literature. The list of potential applications below can be used as a starting point for a search of published research, but I'm only interested in applications that have actually been put into practice, if any. What I'm looking for are known implementations of chaos theory in data mining, in contradistinction to the list of potential applications, which is much broader. Here's a small sampling of off-the-cuff ideas for data mining applications that occurred to me while reading; perhaps none of them are pragmatic, or perhaps some are being put to practical use as we speak but go by terms I'm not yet familiar with:

  1. Identifying self-similar structures in pattern recognition, as Mandelbrot did in a practical way in the case of error bursts in analog telephone lines a few decades ago.
  2. Encountering Feigenbaum's Constant in mining results (perhaps in a manner similar to how string theorists were startled to see Maxwell's Equations pop up in unexpected places in the course of their research).
  3. Identifying the optimal bit depth for neural net weights and various mining tests. I wondered about this one because of the vanishingly small numerical scales at which sensitivity to initial conditions comes into play, which are partially responsible for the unpredictability of chaos-related functions.
  4. Using the notion of fractional dimensions in other ways not necessarily related to fascinating fractal curiosities like Menger Sponges, Koch Curves or Sierpinski Carpets. Perhaps the concept can be applied to the dimensions of mining models in some beneficial way, by treating them as fractional?
  5. Deriving power laws like the ones that come into play in fractals.
  6. Since the functions encountered in fractals are nonlinear, I wonder if there's some practical application to nonlinear regression.
  7. Chaos theory has some tangential (and sometimes overstated) relationships to entropy, so I wonder if there's some way to calculate Shannon's Entropy (or limits upon it and its relatives) from the functions used in chaos theory, or vice versa.
  8. Identifying period-doubling behavior in data.
  9. Identifying the optimal structure for a neural net by intelligently selecting ones that are most likely to "self-organize" in a useful way.
  10. Chaos and fractals etc. are also tangentially related to computational complexity, so I wonder if complexity could be used to identify chaotic structures, or vice-versa.
  11. I first heard of the Lyapunov exponent in the context of chaos theory and have noticed it a few times since then in recipes for specific neural nets and discussions of entropy (a rough sketch of how it can be computed for a simple map follows this list).
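As a concrete point of reference for item 11, here is a minimal sketch of how the largest Lyapunov exponent of a one-dimensional map can be estimated by averaging $\log|f'(x_n)|$ along an orbit; a positive value signals the sensitivity to initial conditions mentioned in item 3. This is purely my own illustration, using the logistic map as a stand-in data generator rather than an application drawn from the data mining literature, and the r values are arbitrary picks.

```python
# Estimate the Lyapunov exponent of a 1-D map by averaging log|f'(x)| along
# a long orbit. The logistic map x -> r*x*(1-x) serves only as a stand-in
# "data generator" here; the r values below are illustrative, not canonical.
# A positive exponent indicates sensitive dependence on initial conditions
# (chaos); a negative one indicates a stable fixed point or cycle.
import math

def lyapunov_logistic(r, x0=0.2, n_transient=1000, n_iter=100_000):
    x = x0
    for _ in range(n_transient):                  # let the orbit settle
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_iter):
        x = r * x * (1 - x)
        total += math.log(abs(r * (1 - 2 * x)))   # |f'(x)| for the logistic map
    return total / n_iter

for r in (2.9, 3.5, 4.0):
    print(r, round(lyapunov_logistic(r), 3))      # negative, negative, about ln 2
```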

There are probably dozens of other relationships I haven't listed here; all of this came off the top of my head. I'm not narrowly interested in specific answers to these particular speculations; I'm just throwing them out there as examples of the type of applications that might exist in the wild. I'd like to see replies that include examples of current research and existing implementations of ideas like this, as long as the applications are specifically applicable to data mining.

There are probably other extant implementations I'm not aware of, even in areas I'm more familiar with (like information theory, fuzzy sets and neural nets), and in others I have even less competence in, like regression, so more input is welcome. My practical purpose here is to determine whether or not to invest more in learning about particular aspects of chaos theory, which I'll put on the back burner if I can't find some obvious utility.

I did a search of CrossValidated but didn't see any topics that directly address the utilitarian applications of chaos theory to data mining etc. The closest I could come was the thread Chaos theory, equation-free modeling and non-parametric statistics, which deals with a specific subset.

Best Answer

Data mining (DM) as a practical approach appears to be almost complementary to mathematical modeling (MM) approaches, and even contradictory to chaos theory (CT). I'll first talk about DM and general MM, then focus on CT.

Mathematical modeling

In economic modeling, DM until very recently was considered almost taboo, a hack to fish for correlations instead of learning about causation and relationships; see this post on the SAS blog. The attitude is changing, but there are many pitfalls related to spurious relationships, data dredging, p-hacking, etc.

In some cases, DM appears to be a legitimate approach even in fields with established MM practices. For instance, DM can be used to search for particle interactions in physical experiments that generate a lot of data; think of particle smashers. In this case physicists may have an idea of what the particles look like and search for those patterns in the datasets.

Chaos Theory

Chaotic systems are probably particularly resistant to analysis with DM techniques. Consider the familiar linear congruential method (LCG) used in common pseudo-random number generators. It is essentially a chaotic system, which is why it's used to "fake" random numbers. A good generator will be indistinguishable from a truly random number sequence, which means you will not be able to determine whether it's random by using statistical methods, and I'll include data mining here too. Try to find a pattern in a RAND()-generated sequence with data mining! Yet, as you know, it's a completely deterministic sequence, and its equations are extremely simple.
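To make this concrete, here is a minimal LCG sketch; the multiplier, increment, and modulus are common textbook constants chosen purely for illustration, not tied to any particular RAND() implementation:

```python
# Minimal linear congruential generator (LCG) sketch.
# Parameters are common textbook constants used for illustration only;
# they are not claimed to match any specific RAND() implementation.
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield an endless stream of pseudo-random values in [0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m   # completely deterministic update
        yield x / m           # rescale to the unit interval

gen = lcg(seed=42)
print([round(next(gen), 6) for _ in range(5)])  # looks "random", yet replays
                                                # identically for the same seed
```

Run it twice with the same seed and you get the same "random-looking" sequence, which is exactly the point being made above.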

Chaos theory is not about randomly looking for similarity patterns. Chaos theory involves learning about processes and dynamic relationships such that small disturbances amplify in the system, creating unstable behavior, while somehow, amid this chaos, stable patterns emerge. All this cool stuff happens due to the properties of the equations themselves. Researchers then study these equations and the systems they describe. This is very different from the mindset of applied data mining.

For instance, you can talk about self-similarity patterns while studying chaotic systems, and notice that data miners talk about searching for patterns too. However, the two camps handle the "pattern" concept very differently. A chaos theorist would generate these patterns from the equations; they may try to come up with their set of equations by observing actual systems, but they always deal with equations at some point. Data miners come from the other side: not knowing or guessing much about the internal structure of the system, they try to look for patterns. I don't think these two groups ever look at the same actual systems or data sets.

Another example is the logistic map, the simple system Feigenbaum worked with in his famous study of period-doubling bifurcations.

[Figure: period-doubling bifurcation diagram of the logistic map]

The equation is ridiculously simple: $$x_{n+1} = r x_n (1 - x_n)$$ Yet I don't see how one would discover it with data mining techniques.
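For what it's worth, here is a minimal sketch of what iterating that equation looks like; the growth rates r = 2.9, 3.2 and 3.9 are my own illustrative picks, chosen to show a stable fixed point, a period-2 cycle, and chaotic wandering respectively:

```python
# Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n) for a few growth
# rates r. The r values are illustrative: 2.9 settles to a fixed point,
# 3.2 to a period-2 cycle, and 3.9 wanders chaotically.
def logistic_orbit(r, x0=0.2, n_transient=500, n_keep=8):
    x = x0
    for _ in range(n_transient):      # discard the transient
        x = r * x * (1 - x)
    orbit = []
    for _ in range(n_keep):           # record the long-run behaviour
        x = r * x * (1 - x)
        orbit.append(round(x, 4))
    return orbit

for r in (2.9, 3.2, 3.9):
    print(r, logistic_orbit(r))
```

Seen only as a table of (r, x) values, nothing about the output advertises the one-line generating rule behind it, which is the point being made here.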