Solved – RNNs for Non-sequential Data

machine learning, neural networks, recurrent neural network

From what I know, recurrent NNs perform very well on sequential data. However, I have also read in many places that they can be used for non-sequential data as well. For instance, in the article 'The Unreasonable Effectiveness of Recurrent Neural Networks' by Andrej Karpathy:

Sequential processing in absence of sequences
You might be thinking that having sequences as inputs or outputs could be relatively rare, but an important point to realize is that even if your inputs/outputs are fixed vectors, it is still possible to use this powerful formalism to process them in a sequential manner.

From what I can make out, we model our data in a sequential form. So, if the input is a data point, the output would be the data point after t time steps, and we train our model on that data.

I came across a Kaggle problem and was wondering if RNNs can be applied to it. The problem is to classify credit card fraud. The only thing which can be modelled as a sequence, I believe, is the class of the output, but then wouldn't the model just be learning the sequence of zeros and ones based on the other features? There is nothing that identifies a previous data point and could be used to learn a sequence from. So, does that mean RNNs cannot be applied to this problem because it lacks the ability to be converted into a sequential form, or am I missing something?

Best Answer

1) Talking about the output: fraud detection is a subclass of anomaly detection, so classification is a correct, but not the most appropriate, view. In this case we basically learn not just the zeros and ones but the whole distribution underlying the input data points. It becomes a classification problem only after applying some threshold to the output probability in order to catch the rare event, so the output can be used either to analyse anomalies directly or to feed some supervised learning algorithm.
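To make point 1 concrete, here is a minimal sketch (assumed scikit-learn/NumPy; y_true and scores are placeholder data, not results on the Kaggle set) of keeping the model output as a probability and turning it into a classification only at the very end, by choosing a threshold that catches the rare positive class:

# Placeholder labels and model scores for a rare event (~1% positives).
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.01, size=10_000)
scores = np.clip(0.6 * y_true + rng.normal(0.2, 0.15, size=10_000), 0.0, 1.0)

# Scan candidate thresholds and pick the one maximising F1 on the rare class.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
best = thresholds[np.argmax(f1[:-1])]

y_pred = (scores >= best).astype(int)  # classification happens only at this step
print(f"chosen threshold: {best:.2f}, flagged: {y_pred.sum()} of {len(y_pred)}")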

2) The representation of the output doesn't really matter much here. Imagine a classifier that takes a word in a text and outputs whether it represents an animal:

hello => 0
kitty => 1
cartoon => 0
minny => 0

It could be represented as an RNN that takes the whole text as a stream:

hello kitty cartoon minny => 0 1 0 0

And in this case it could use the preceding word "hello" as a clue that "kitty" is more likely to come next.
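This many-to-many setup can be written down directly. Below is a minimal sketch (assumed PyTorch; the toy vocabulary and layer sizes are made up) of an RNN that reads the text as a stream and emits an "is this word an animal?" probability at every step, so the earlier word "hello" can influence the prediction for "kitty":

import torch
import torch.nn as nn

vocab = {"hello": 0, "kitty": 1, "cartoon": 2, "minny": 3}  # toy vocabulary

class WordTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)   # one "animal?" logit per time step

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        h, _ = self.rnn(self.embed(token_ids))  # h: (batch, seq_len, hidden_dim)
        return torch.sigmoid(self.head(h)).squeeze(-1)

model = WordTagger(len(vocab))
stream = torch.tensor([[vocab[w] for w in "hello kitty cartoon minny".split()]])
print(model(stream))  # untrained per-word probabilities; the target would be 0 1 0 0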

3) Talking about input and RNNs, there is actually a correspondence between RNNs and CNNs (in image processing): a CNN's convolution/cross-correlation can be viewed as a particular case of an RNN. This is one of the more popular ways of applying RNNs to non-sequential data.
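One common way to exploit this on a fixed-size input is simply to impose a scan order on it. Below is a minimal sketch (assumed PyTorch; the 28x28 input size and layer widths are arbitrary choices of mine, not anything from the answer) that reads an image one row at a time and classifies it from the final hidden state:

import torch
import torch.nn as nn

class RowRNNClassifier(nn.Module):
    def __init__(self, row_dim=28, hidden_dim=64, num_classes=10):
        super().__init__()
        self.rnn = nn.LSTM(row_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, images):            # images: (batch, 28, 28), rows act as "time" steps
        _, (h_n, _) = self.rnn(images)    # h_n: (1, batch, hidden_dim), state after the last row
        return self.head(h_n.squeeze(0))  # (batch, num_classes) logits

model = RowRNNClassifier()
fake_batch = torch.randn(4, 28, 28)       # stand-in for real image data
print(model(fake_batch).shape)            # torch.Size([4, 10])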

4) Returning to the credit card case: if the input points are not actually i.i.d. (independent and identically distributed), as in the "hello kitty" case, then an RNN could help by identifying this connection:

ok, attacker1.attack1, ok, ok, attacker1.attack2, ok, attacker2.attack

An RNN could identify a connection (correlation) between attacker1.attack1 and attacker1.attack2 (for instance, a credit card number brute-force) faster than a regular network, because those attacks, even though they could come from different IPs, would still be close in time and have something in common (say, some digits repeat or follow some underlying generating function). Without recurrence, a feedforward network would have to use the features of only one point at a time.
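If the data did carry such a grouping key (the anonymised Kaggle features may not), the modelling step would look roughly like this minimal sketch (assumed PyTorch; card_a/card_b, the feature dimension and the per-card grouping are all hypothetical): transactions are grouped per card, ordered in time, and an RNN scores each transaction using the card's earlier transactions as context:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

class FraudRNN(nn.Module):
    def __init__(self, feat_dim, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, padded_seqs):        # padded_seqs: (batch, max_len, feat_dim)
        h, _ = self.rnn(padded_seqs)       # one hidden state per transaction
        return torch.sigmoid(self.head(h)).squeeze(-1)  # per-transaction fraud probability

# Hypothetical data: two cards with different numbers of time-ordered transactions.
card_a = torch.randn(5, 8)                 # 5 transactions, 8 features each
card_b = torch.randn(3, 8)
batch = pad_sequence([card_a, card_b], batch_first=True)  # (2, 5, 8), zero-padded

model = FraudRNN(feat_dim=8)
print(model(batch).shape)                  # torch.Size([2, 5]) fraud scores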

TL;DR: RNNs (especially LSTMs, Long Short-Term Memory networks) allow you to specify a prior belief about relationships between events in "time", so you can benefit from them if those events are actually coupled (grouped) in some way, even if they are not strictly sequential. Basically, an RNN provides a powerful way to apply linear/non-linear cross-"correlation" (much like a CNN, but more general).
