Solved – Multiple neural networks with single output neuron vs. single NN with multiple output neurons

caretneural networksr

Main Question

Given multiple output parameters that are independent of each other, would multiple ANNs with a single output neuron give better prediction results than a single ANN with multiple outputs? Is there a benefit for each case?

Specific description:

I am using the 'caret' package in R to come up with an optimized artificial neural network (ANN). The method I am using to train the network is within 'nnet' package, which by itself allows multiple output neurons:

For instance, for the given dataset:

dataset
x1    x2    x3    y1    y2    y3
1.4    5    6.1   7.9   8.5   3.5
...   ...   ...   ...   ...   ...

I use

nnet(dataset[,InputIndices],dataset[,OutputIndices],size=HN, decay=wd, rang=rg) 

But when using the 'nnet' method within 'caret', only single output is possible.

Within Caret documentation:

Arguments

x: an object where samples are in rows and features are in columns. This could be a simple matrix, data frame or other type (e.g. sparse matrix).

y: a numeric or factor vector containing the outcome for each sample.

form: A formula of the form y ~ x1 + x2 + …

Therefore the code I use to train the network is:

for (n in 1:NumOutputNeurons) {
    train(traindata[,InputIndices],traindata[,OutputIndices][,n], tuneGrid=param.grid, 
        maxit = 1e4, 
        method = "nnet", linout=F, trace=F, na.rm = TRUE,
        trControl = tc)
}

From a statistical point of view, is each method better than the other?

(As an extra question, do you know a way to allow multiple outputs within Caret package?)

Best Answer

The approach of using one network predicting multiple variables is called Multitask learning and it is a great way to improve the performance of a network. See Rich Caruana, 1997: Multitask learning for the details.

In short, it is conjectured that while learning every "task" (prediction of a single variable), the network learns some of the features of the input space which are helpful for that task. Each task will learn different features, but they can be also helpful for other tasks. Learning multiple tasks at once allows to discover richer sets of features which help improve the overall performance. Also, it is a great way to prevent overfitting.