Solved – Verifying neural network model performance

machine learning, model-evaluation, neural networks, validation

I'm doing some experiments with neural networks and I wanted to ask for some support to verify my methodology and my results.

My setup: I have separated my data into time slots of 5 seconds, i.e. all timestamps within a slot fall within 5 seconds. Each slot contains approx. 1000 samples, and each sample has 7 features, which are mean normalized: 4 numerical and 3 boolean (0/1). My network topology is 7-15-1; the training algorithm is resilient backpropagation (Jordan recurrent network) with a sigmoid activation function and an atan error function. I use 5-fold cross-validation to check the network. The target feature of each sample is a boolean value.
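To make the setup concrete, here is a minimal sketch of that preprocessing, assuming the usual (x - mean) / (max - min) form of mean normalization; the array and its layout are placeholders rather than my real data:

```python
import numpy as np

def mean_normalize(X):
    """Mean normalization: (x - mean) / (max - min), roughly scaling each column to [-1, 1]."""
    mean = X.mean(axis=0)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # guard against constant columns
    return (X - mean) / span

# Placeholder slot: ~1000 samples x 7 features (columns 0-3 numerical, 4-6 boolean).
slot = np.random.rand(1000, 7)
slot[:, 4:] = (slot[:, 4:] > 0.5).astype(float)
slot[:, :4] = mean_normalize(slot[:, :4])   # only the numerical features are rescaled
```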

My goal is to train the network with the data of one time slot (time slot n) and use it to rank the samples of the next time slot (n+1) on a scale from 0 to 1, hence the single output node. I'm using neural networks because I want to compare their performance with that of other models such as SVM and logistic regression.
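As an illustration of the intended workflow (not my actual code), here is a sketch using scikit-learn's plain feed-forward MLPClassifier as a stand-in for the Jordan network, with random placeholder data; the positive-class probability from the single output serves as the 0-1 score:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 7))             # features of time slot n
y_train = (rng.random(1000) > 0.5).astype(int)   # boolean target of slot n
X_next = rng.normal(size=(1000, 7))              # features of time slot n+1

# 7-15-1 topology approximated by one hidden layer of 15 logistic units.
model = MLPClassifier(hidden_layer_sizes=(15,), activation='logistic',
                      max_iter=500, random_state=0)
model.fit(X_train, y_train)

scores = model.predict_proba(X_next)[:, 1]   # one value in [0, 1] per sample
ranking = np.argsort(-scores)                # slot n+1 samples, highest score first
```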

Here are my questions regarding the setup and the results:

  1. I'm able to collect all samples with all features within the time slot. Do I need to check the probability distribution of the features I'm using, or is mean normalization sufficient?
  2. Is it reasonable to use the output of a single output node as a ranking function?
  3. The training error quickly drops below 0.1 (after 20-50 iterations). With more iterations (up to 10,000) it doesn't improve, and often it gets worse (error up to 0.4). Is it ok that the error drops so quickly, or should I be sceptical?
  4. Are neural networks in general only really usable after, e.g., 50,000 iterations?
  5. Is it reasonable to accept an error rate of 0.1? Of course, the lower the better, but I'm not able to reduce the error any further.
  6. The cross-validation error is always approx. twice the training error. Is this too much? I read that "when the CV error is way larger than the training error, you're suffering from high variance". Is 0.2 already large, or is 0.4?
  7. As I have a target value for each sample, I also do an F-measure evaluation for each classifier. It's always approx. 0.75. I chose 0.3 as the classification error threshold, i.e. when the difference between the ideal value (0 or 1) and the predicted value is within 0.3, the classification counts as correct (see the sketch below). Is this value too large, or would you say one should always go with a classification threshold >0.95?
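To make point 7 concrete, here is a sketch of how such a tolerance-based F-measure could be computed (placeholder data; mapping out-of-tolerance predictions to the wrong class is just one possible reading of the rule):

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
y_true = (rng.random(1000) > 0.5).astype(int)                     # boolean targets
scores = np.clip(y_true + rng.normal(0.0, 0.3, 1000), 0.0, 1.0)   # network outputs in [0, 1]

def f_measure_with_tolerance(y_true, scores, tol):
    # A score within `tol` of the true label keeps that label;
    # anything farther away is counted as the opposite (wrong) class.
    y_pred = np.where(np.abs(y_true - scores) <= tol, y_true, 1 - y_true)
    return f1_score(y_true, y_pred)

print(f_measure_with_tolerance(y_true, scores, tol=0.3))    # lenient threshold from the question
print(f_measure_with_tolerance(y_true, scores, tol=0.05))   # much stricter threshold
```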

Best Answer

Just to make sure we are on the same page: you have a sequence of 1000 samples with 7 features each per time slot, and there is a sequential pattern in there, which is why you process them with an RNN. To your questions:

  1. It depends. It might get better if you use a different normalization; it's hard to tell.
  2. To me it just sounds like classification. I am not sure what you mean by ranking exactly.
  3. No reason to be skeptical. Normally the training error drops like that: extremely quickly for the first few iterations, very slowly afterwards.
  4. No, absolutely not. For some tasks, less than 100 iterations (= passes over the training set) suffice.
  5. You are the one who has to say whether the error is small enough. :) We can't tell you without knowing what you are using the network for.
  6. Hard to tell. You should use early stopping instead: train the network until the error on a held-out validation set starts to rise; that is the point from which on you only overfit. Use the weights found at that point to evaluate on a test set. (That makes three sets: training, validation, and test.) A sketch of this setup follows below.
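Here is a minimal sketch of that early-stopping loop with the three sets, again using scikit-learn's MLPClassifier as a stand-in network; the split sizes and patience value are arbitrary:

```python
import copy
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)  # expected with max_iter=1

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 7))
y = (X[:, 0] + rng.normal(0, 0.5, 1000) > 0).astype(int)

# Three sets: training (60%), validation (20%), test (20%).
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# warm_start + max_iter=1 means every fit() call is one more pass over the training set.
model = MLPClassifier(hidden_layer_sizes=(15,), activation='logistic',
                      max_iter=1, warm_start=True, random_state=0)

best_val, best_model, patience, bad_epochs = np.inf, None, 10, 0
for epoch in range(500):
    model.fit(X_tr, y_tr)
    val = log_loss(y_val, model.predict_proba(X_val))
    if val < best_val:
        best_val, best_model, bad_epochs = val, copy.deepcopy(model), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # validation error keeps rising: stop, we are only overfitting
            break

# Report generalization only once, on the held-out test set, using the best weights.
test_acc = best_model.score(X_te, y_te)
```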

Here are some tips that I can give: