Hi all,
I’m fairly new to ANNs and I have a question about using k-fold cross-validation to search for the optimal number of hidden neurons.
I have already read the very useful ANN FAQ (relevant extract: http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-12.html ) and searched the archive of this site, but I would like to make sure the following procedure is correct! My questions are at the bottom of the post…
Problem description: a mapping problem (input -> output, all doubles) using an MLP with one hidden layer, trained with the Levenberg-Marquardt (LM) backpropagation algorithm (the default).
1. Load the data: 2 input vectors (“input1” and “input2”) and 1 output vector (“output1”), each containing 600 values;
2. Divide the data set into a training and a “testing” set for the cross-validation:
k = 10;
cv = cvpartition(length(input1), 'kfold', k);
for i = 1:k
    trainIdxs{i} = find(training(cv, i));
    testIdxs{i}  = find(test(cv, i));
    trainMatrix{i} = [input1(trainIdxs{i}) input2(trainIdxs{i}) output1(trainIdxs{i})];
    validMatrix{i} = [input1(testIdxs{i}) input2(testIdxs{i}) output1(testIdxs{i})];
end
3. I create a feedforwardnet with nr_hidden_nodes hidden nodes and change some training parameters to prevent early stopping (the values are chosen so that they cannot be reached in this case). There is probably a more elegant way to do this, but I didn’t find it quickly in the documentation.
net = feedforwardnet(nr_hidden_nodes);
net.divideFcn = '';
net.trainParam.epochs = 30;
net.trainParam.max_fail = 500;
net.trainParam.min_grad = 1e-15;
4. The network is trained, and the performance (MSE) on the training and testing sets is calculated inside a loop over the folds:
for i = 1:k
    [net, tr] = train(net, trainMatrix{i}(:,1:2)', trainMatrix{i}(:,3)');
    % (Removed the simple code that calculates the MSE.)
end
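For completeness, this is roughly what the removed MSE calculation looks like (just a sketch; “trainMSE” and “testMSE” are illustrative names, not from my actual code):

```matlab
% Inside the fold loop, after training:
trainOut = net(trainMatrix{i}(:,1:2)');   % network output on the training fold
testOut  = net(validMatrix{i}(:,1:2)');   % network output on the held-out fold
trainMSE(i) = mean((trainOut - trainMatrix{i}(:,3)').^2);  % training-set MSE
testMSE(i)  = mean((testOut  - validMatrix{i}(:,3)').^2);  % testing-set MSE
```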
5. Steps 3 and 4 are repeated for several values of nr_hidden_nodes. The division into training and testing folds remains exactly the same.
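To make step 5 concrete, here is a sketch of the outer loop I have in mind (“hiddenSizes”, “foldMSE” and “cvMSE” are illustrative names; whether init belongs inside the fold loop is exactly my question (a) below):

```matlab
hiddenSizes = 1:20;                       % candidate numbers of hidden neurons (illustrative)
cvMSE = zeros(size(hiddenSizes));
for h = 1:numel(hiddenSizes)
    net = feedforwardnet(hiddenSizes(h));
    net.divideFcn = '';                   % disable the internal train/val/test split
    foldMSE = zeros(1, k);
    for i = 1:k
        net = init(net);                  % re-initialize weights (see question (a))
        [net, tr] = train(net, trainMatrix{i}(:,1:2)', trainMatrix{i}(:,3)');
        testOut = net(validMatrix{i}(:,1:2)');
        foldMSE(i) = mean((testOut - validMatrix{i}(:,3)').^2);
    end
    cvMSE(h) = mean(foldMSE);             % average test-fold MSE for this network size
end
[~, bestIdx] = min(cvMSE);
best_nr_hidden_nodes = hiddenSizes(bestIdx);
```

Here I select on the mean test-fold MSE alone; whether to use MSE(training set) + MSE(testing set) instead is my question (b).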
Now I have the following questions:
a. Should I re-initialize the network “net” after each training, and how can this be done in MATLAB? The command “init” sets it to random weights…
b. Is the procedure described above correct in general? I intend to use the value of nr_hidden_nodes that minimizes MSE(training set) + MSE(testing set).
c. Since the data is divided randomly into training and testing sets (cf. “cvpartition”), should I repeat the whole experiment multiple times?
Thanks in advance!