Solved – Reference to learn how to interpret learning curves of deep convolutional neural networks

conv-neural-network, deep-learning, machine-learning, neural-networks

Question

What are good reference articles/blogs/tutorials to learn how to interpret learning curves for deep convolutional neural networks?

Background
I am trying to apply convolutional neural networks (CNNs) to vessel segmentation (specifically, to determine whether or not the center pixel of an image patch is on a vessel) using Caffe.

I have about 225,000 training images and 225,000 testing/validation images, each set roughly 50% positive.

My input images are 65×65 pixels. I have four convolutional layers (48 filters each, with kernel sizes 6×6, 5×5, 4×4, and 2×2), each followed by a 2×2 max-pooling layer, then one fully connected layer of 50 neurons and a final scoring layer with 2 neurons. My training batch size is 256 and my testing batch size is 100.
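As a sanity check, the feature-map sizes work out as follows (a back-of-the-envelope sketch; it assumes stride-1, unpadded convolutions and stride-2 pooling, which may not match the actual prototxt):

```python
# Hypothetical feature-map size check for the architecture above
# (assumes stride-1 convolutions with no padding, 2x2/stride-2 pooling).
def conv(size, kernel):
    return size - kernel + 1   # "valid" convolution, stride 1

def pool(size):
    return size // 2           # 2x2 max pooling, stride 2

size = 65                      # 65 x 65 input patch
for kernel in (6, 5, 4, 2):    # the four convolutional kernel sizes
    size = pool(conv(size, kernel))
    print(size)                # prints 30, 13, 5, 2
```

So under these assumptions the fully connected layer sees 48 feature maps of size 2×2.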

I am using the stochastic gradient descent (SGD) optimizer with an inverse decay learning rate policy. Below are my Caffe solver parameters (a sketch of the resulting learning rate schedule follows the list):

  • type: "SGD"
  • base_lr: 0.01
  • lr_policy: "inv"
  • gamma: 0.1
  • power: 0.75
  • momentum: 0.9
  • weight_decay: 0.0005
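Caffe's "inv" policy sets the effective learning rate to base_lr * (1 + gamma * iter)^(-power). Here is a minimal sketch of what that schedule looks like with the parameters above (the printed iterations are just illustrative):

```python
# Caffe's "inv" learning rate policy:
#   lr = base_lr * (1 + gamma * iter) ** (-power)
base_lr, gamma, power = 0.01, 0.1, 0.75

def inv_lr(iteration):
    return base_lr * (1 + gamma * iteration) ** (-power)

for it in (0, 100, 1000, 10000):
    print(f"iter {it:>5}: lr = {inv_lr(it):.2e}")
# With gamma = 0.1 the rate decays quickly: by iteration 1000
# it is already about 3e-4, i.e. ~30x smaller than base_lr.
```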

Below is the learning curve I am getting:

[learning curve: training loss, test loss, and test accuracy versus training iteration]

I am using the cross-entropy classification loss, also known as the multinomial logistic loss.
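For concreteness, this loss is the mean negative log-probability of the true class under the softmax of the final-layer scores. A minimal NumPy sketch (names and example values are illustrative):

```python
import numpy as np

def multinomial_logistic_loss(scores, labels):
    """Mean negative log-probability of the true class.
    scores: (N, C) raw scores from the final scoring layer
    labels: (N,) integer class labels
    """
    # softmax with max-subtraction for numerical stability
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

# two patches, two classes (vessel vs. background)
scores = np.array([[2.0, -1.0], [0.5, 1.5]])
labels = np.array([0, 1])
print(multinomial_logistic_loss(scores, labels))  # ~0.18
```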

I would like to hear how people interpret this learning curve and what parameters they would change to try to improve the test accuracy.

  • The training loss is decreasing, but very slowly. Could this mean that my learning rate is too low?

  • In contrast, I see that the test loss decreases quickly at first and then slows down. Could this mean that my learning rate was too high and the optimization got stuck in a local minimum?

  • Also, the test accuracy has stabilized and stopped increasing too soon. Could this mean that I have to increase my model capacity or decrease my regularization?

In general, what would help me, and probably others, is a reference article/book/blog post that delves deeply into the interpretation of such learning curves, with many example cases.

I found this blog post, which was very helpful, but there is not much in it about the interpretation of learning curves (at least not to my satisfaction).

Best Answer

Two things:

  1. You should probably switch your 50/50 train/validation split to something like 80% training and 20% validation. In most cases this will improve the classifier's overall performance (more training data generally means better performance).
  2. If you have never heard of "early stopping", you should look it up; it is an important concept in the neural network domain: https://en.wikipedia.org/wiki/Early_stopping. To summarize, the idea behind early stopping is to stop training once the validation loss starts plateauing, because when that happens it almost always means you are starting to overfit your classifier. The training loss value in itself is not something you should trust, because it will continue to decrease even while you are overfitting. A minimal sketch of the idea follows this list.
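To make point 2 concrete, here is a minimal, patience-based early-stopping sketch. The validation-loss trajectory is made up purely to demonstrate the logic; in practice the values would come from evaluating your model each epoch:

```python
# Patience-based early stopping on a made-up validation-loss trajectory.
val_losses = [0.9, 0.7, 0.55, 0.50, 0.49, 0.49, 0.50, 0.52, 0.55, 0.60]

best_loss = float("inf")
patience, bad_epochs = 3, 0
best_epoch = None

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_loss:        # validation improved
        best_loss, best_epoch = val_loss, epoch
        bad_epochs = 0              # (save a checkpoint of the model here)
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # plateau: likely starting to overfit
            break

print(f"stopped at epoch {epoch}, best model from epoch {best_epoch}")
# -> stopped at epoch 7, best model from epoch 4
```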

I hope I was clear enough. Good luck with your work :)