Solved – RNNs for Non-sequential Data

machine learning, neural networks, recurrent neural network

From what I know, recurrent NNs perform very well on sequential data. However, I have also read in many places that they can be used for non-sequential data as well. For instance, in the article 'The Unreasonable Effectiveness of Recurrent Neural Networks' by Andrej Karpathy:

Sequential processing in absence of sequences
You might be thinking that having sequences as inputs or outputs could be relatively rare, but an important point to realize is that even if your inputs/outputs are fixed vectors, it is still possible to use this powerful formalism to process them in a sequential manner.

From what I can make out, we model our data in a sequential form. So, if the input is a data point, the output would be the data point after t time steps, and we train our model on that data.

I came across a Kaggle problem and was wondering if RNNs can be applied to it. The problem is to classify credit card fraud. The only thing which can be modelled as a sequence, I believe, is the class of the output, but then wouldn't the model just be learning the sequence of zeros and ones based on the other features? There is nothing that identifies a previous data point and could be used to learn a sequence from. So, does that mean RNNs cannot be applied to this problem because it lacks the ability to be converted into a sequential form, or am I missing something?

Best Answer

1) Talking about the output: fraud detection is a subclass of anomaly detection, so classification is a correct, but not the most appropriate, view. In this case we basically learn not just the zeros and ones but the whole distribution underlying the input data points. It becomes a classification problem only after applying some threshold to the output probability in order to catch the rare event, so the output can be used either to analyse anomalies directly or to feed some supervised learning algorithm.
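To make point 1 concrete, here is a minimal sketch (assumed scikit-learn/NumPy; y_true and scores are placeholder data, not results on the Kaggle set) of keeping the model output as a probability and turning it into a classification only at the very end, by choosing a threshold that catches the rare positive class:

# Placeholder labels and model scores for a rare event (~1% positives).
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.01, size=10_000)
scores = np.clip(0.6 * y_true + rng.normal(0.2, 0.15, size=10_000), 0.0, 1.0)

# Scan candidate thresholds and pick the one maximising F1 on the rare class.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
best = thresholds[np.argmax(f1[:-1])]

y_pred = (scores >= best).astype(int)  # classification happens only at this step
print(f"chosen threshold: {best:.2f}, flagged: {y_pred.sum()} of {len(y_pred)}")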

2) The representation of the output doesn't really matter much here. Imagine a classifier that takes a word in a text and outputs whether it represents an animal:

hello => 0
kitty => 1
cartoon => 0
minny => 0

It could be represented as an RNN that takes the whole text as a stream:

hello kitty cartoon minny => 0 1 0 0

And in this case it could use the preceding word "hello" as a clue that "kitty" is more likely to come next.
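This many-to-many setup can be written down directly. Below is a minimal sketch (assumed PyTorch; the toy vocabulary and layer sizes are made up) of an RNN that reads the text as a stream and emits an "is this word an animal?" probability at every step, so the earlier word "hello" can influence the prediction for "kitty":

import torch
import torch.nn as nn

vocab = {"hello": 0, "kitty": 1, "cartoon": 2, "minny": 3}  # toy vocabulary

class WordTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)   # one "animal?" logit per time step

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        h, _ = self.rnn(self.embed(token_ids))  # h: (batch, seq_len, hidden_dim)
        return torch.sigmoid(self.head(h)).squeeze(-1)

model = WordTagger(len(vocab))
stream = torch.tensor([[vocab[w] for w in "hello kitty cartoon minny".split()]])
print(model(stream))  # untrained per-word probabilities; the target would be 0 1 0 0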

3) Talking about input and RNNs, there is actually a correspondence between RNNs and CNNs (in image processing): a CNN's convolution/cross-correlation can be viewed as a particular case of an RNN. This is one of the more popular ways of applying RNNs to non-sequential data.
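One common way to exploit this on a fixed-size input is simply to impose a scan order on it. Below is a minimal sketch (assumed PyTorch; the 28x28 input size and layer widths are arbitrary choices of mine, not anything from the answer) that reads an image one row at a time and classifies it from the final hidden state:

import torch
import torch.nn as nn

class RowRNNClassifier(nn.Module):
    def __init__(self, row_dim=28, hidden_dim=64, num_classes=10):
        super().__init__()
        self.rnn = nn.LSTM(row_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, images):            # images: (batch, 28, 28), rows act as "time" steps
        _, (h_n, _) = self.rnn(images)    # h_n: (1, batch, hidden_dim), state after the last row
        return self.head(h_n.squeeze(0))  # (batch, num_classes) logits

model = RowRNNClassifier()
fake_batch = torch.randn(4, 28, 28)       # stand-in for real image data
print(model(fake_batch).shape)            # torch.Size([4, 10])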

4) Returning to the credit card case: if the input points are not actually i.i.d. (independent and identically distributed), as in the "hello kitty" case, then an RNN could help by identifying this connection:

ok, attacker1.attack1, ok, ok, attacker1.attack2, ok, attacker2.attack

An RNN could identify a connection (correlation) between attacker1.attack1 and attacker1.attack2 (for instance, a credit card number brute-force) faster than a regular network, because those attacks, even though they could come from different IPs, would still be close in time and have something in common (say, some digits repeat or follow some underlying generating function). Without recurrence, a feedforward network would have to use the features of only one point at a time.
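If the data did carry such a grouping key (the anonymised Kaggle features may not), the modelling step would look roughly like this minimal sketch (assumed PyTorch; card_a/card_b, the feature dimension and the per-card grouping are all hypothetical): transactions are grouped per card, ordered in time, and an RNN scores each transaction using the card's earlier transactions as context:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

class FraudRNN(nn.Module):
    def __init__(self, feat_dim, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, padded_seqs):        # padded_seqs: (batch, max_len, feat_dim)
        h, _ = self.rnn(padded_seqs)       # one hidden state per transaction
        return torch.sigmoid(self.head(h)).squeeze(-1)  # per-transaction fraud probability

# Hypothetical data: two cards with different numbers of time-ordered transactions.
card_a = torch.randn(5, 8)                 # 5 transactions, 8 features each
card_b = torch.randn(3, 8)
batch = pad_sequence([card_a, card_b], batch_first=True)  # (2, 5, 8), zero-padded

model = FraudRNN(feat_dim=8)
print(model(batch).shape)                  # torch.Size([2, 5]) fraud scores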

TL;DR: RNNs (especially LSTMs, Long Short-Term Memory networks) allow you to specify a prior belief about relationships between events in "time", so you can benefit from them if those events are actually coupled (grouped) in some way, even if they are not strictly sequential. Basically, an RNN provides a powerful way to apply linear/non-linear cross-"correlation" (much like a CNN, but more general).
