Solved – What are the state-of-the-art methods to determine (hyper)parameters in CNNs, RNNs, or other deep learning models

conv-neural-network, deep-learning, hyperparameter, machine-learning, neural-networks

The question is: "How do we determine the (hyper)parameters in deep learning models, such as CNNs and RNNs?"

This is a difficult question for which, as far as I know, there is no solid solution yet, so I want to bring it up in a more specific manner.

(A similar question was asked two years ago, and I think it is a good time to bring it up again: Guideline to select the hyperparameters in Deep Learning.)

The list of parameters/hyperparameters includes, for example (a concrete sketch of these as a search space follows the list):

  1. Input patch size (e.g., 64×64)
  2. Number of layers
  3. Number of filters in each convolutional layer (if it is a CNN)
  4. Learning rate of the network
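
To make the search space concrete, here is a minimal sketch in Python of how these four knobs might be written down before any search begins. The ranges are illustrative assumptions, not recommendations:

```python
# Illustrative search space for the four hyperparameters listed above.
# Every range here is an assumption chosen for the sake of example.
search_space = {
    "patch_size":    [32, 64, 128],        # input patch side length (pixels)
    "num_layers":    [2, 4, 8, 16],        # network depth
    "num_filters":   [16, 32, 64],         # filters per convolutional layer
    "learning_rate": [1e-4, 1e-3, 1e-2],   # optimizer step size
}
```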

Here are some common suggestions from different papers, along with my concerns:

  1. Typically, people suggest that the deeper the network, the better. But how deep is deep enough? We cannot simply declare that 10 layers are enough, or that 100 are, especially in biological science research (when you publish a paper, reviewers will judge you on this).
  2. Building on No. 1, people suggest using (a) grid search or (b) random search to find the best combination of hyperparameters. The problem is that running all of these combinations (say, 200) takes a lot of time; it is often infeasible. (A minimal random-search sketch follows this list.)
  3. Building on Nos. 1 and 2, researchers suggest starting from an already successful model. For example, people nowadays often start with the hyperparameters of the 2012 ImageNet winner. However, this does not apply to every case. In MRI image analysis, for instance, people do pixel-wise classification for segmentation, and there a 256×256 input patch simply does not make sense (since you are predicting the class of the patch's center pixel).
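
As mentioned in No. 2, an exhaustive grid is usually too expensive, and random search is the common cheaper alternative. Below is a minimal, self-contained sketch of random search in Python; train_and_evaluate is a hypothetical placeholder that you would replace with a real training and validation run:

```python
import random

def train_and_evaluate(config):
    # Hypothetical stand-in for an expensive training run that returns a
    # validation score. Here it returns a random number so the sketch runs.
    return random.random()

def random_search(search_space, n_trials=20, seed=0):
    """Sample n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in search_space.items()}
        score = train_and_evaluate(config)  # the expensive step
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

space = {"patch_size": [32, 64, 128], "num_layers": [2, 4, 8, 16],
         "num_filters": [16, 32, 64], "learning_rate": [1e-4, 1e-3, 1e-2]}
best, score = random_search(space, n_trials=20)
```

Sampling 20 configurations costs a tenth of a 200-point grid, which is the whole appeal; the trade-off is that nothing guarantees the sampled points include the best region.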

Best Answer

Typically, people suggest that the deeper the network the better

At some point, adding more layers hurts performance.

Example 1: Larsson, Gustav, Michael Maire, and Gregory Shakhnarovich. "FractalNet: Ultra-Deep Neural Networks without Residuals." arXiv preprint arXiv:1605.07648 (2016). https://arxiv.org/abs/1605.07648


Example 2: He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015). https://arxiv.org/abs/1512.03385


Practitioners simply try and see (and in between buy more GPUs).
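
In practice, "try and see" can be as simple as a depth sweep: train otherwise-identical models of increasing depth and watch where validation error stops improving. Here is a minimal sketch in PyTorch; the architecture and the (commented-out) validate step are placeholder assumptions, not something prescribed by the papers above:

```python
import torch.nn as nn

def make_cnn(depth, num_filters=32, num_classes=10, in_channels=1):
    """Build a plain CNN whose depth is a single knob (illustrative only)."""
    layers, channels = [], in_channels
    for _ in range(depth):
        layers += [nn.Conv2d(channels, num_filters, kernel_size=3, padding=1),
                   nn.ReLU()]
        channels = num_filters
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
               nn.Linear(channels, num_classes)]
    return nn.Sequential(*layers)

# "Try and see": sweep depth and compare validation error after training.
for depth in [2, 4, 8, 16, 32]:
    model = make_cnn(depth)
    # val_error = validate(model)  # hypothetical training + evaluation loop;
    #                              # expect it to plateau or worsen past some depth
```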

(A similar question was asked two years ago, and I think it is a good time to bring it up again: Guideline to select the hyperparameters in Deep Learning.)

As far as I know, not much progress has been made since then. There have been some new papers, but they mostly apply known hyperparameter optimization techniques (e.g., grid search, random search, or Bayesian optimization).

There is one original piece of work, though, that one of my lab mates made me aware of:

Miconi, Thomas. "Neural networks with differentiable structure." arXiv (2016): the author introduces the concept of a differentiable network structure so that it can be included in the objective function.
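
To make "structure in the objective function" concrete, here is a toy sketch in PyTorch. This is not Miconi's actual method, only an illustration of the general idea: give each unit a learned gate, so that the effective width of the layer becomes differentiable and can be penalized directly in the loss:

```python
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose units can be smoothly switched off by learned gates,
    making the effective layer width part of the differentiable objective."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.gates = nn.Parameter(torch.ones(out_features))  # one gate per unit

    def forward(self, x):
        return self.gates * torch.relu(self.linear(x))

model = GatedLinear(100, 256)
x, y = torch.randn(32, 100), torch.randn(32, 256)
# Task loss plus an L1 penalty on the gates: the optimizer now trades accuracy
# against effective network size within a single objective.
loss = nn.functional.mse_loss(model(x), y) + 1e-3 * model.gates.abs().sum()
loss.backward()
```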
