Solved – Multiple neural networks with single output neuron vs. single NN with multiple output neurons

caret, neural networks, r

Main Question

Given multiple output parameters that are independent of each other, would multiple ANNs with a single output neuron give better prediction results than a single ANN with multiple outputs? Is there a benefit for each case?

Specific description:

I am using the 'caret' package in R to tune an artificial neural network (ANN). The training method I am using comes from the 'nnet' package, which by itself allows multiple output neurons.

For instance, for the given dataset:

dataset
x1    x2    x3    y1    y2    y3
1.4    5    6.1   7.9   8.5   3.5
...   ...   ...   ...   ...   ...

I use

nnet(dataset[,InputIndices],dataset[,OutputIndices],size=HN, decay=wd, rang=rg)

But when the 'nnet' method is used through 'caret', only a single output is possible.

From the caret documentation:
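As a self-contained illustration of the multi-output call, here is a minimal sketch on synthetic data. The data, the target functions, and the values chosen for size, decay and maxit are assumptions made only for this example; they are not taken from the question.

```r
library(nnet)

set.seed(1)
n <- 200

# Hypothetical inputs x1..x3 and three numeric targets y1..y3
x <- matrix(runif(n * 3), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
y <- cbind(y1 = x[, 1] + x[, 2],
           y2 = x[, 2] * x[, 3],
           y3 = sin(2 * x[, 1]))

# A matrix response gives one output neuron per column;
# linout = TRUE requests linear output units for regression
fit <- nnet(x, y, size = 5, decay = 1e-3, linout = TRUE,
            maxit = 500, trace = FALSE)

# Predictions come back as a matrix with one column per output
pred <- predict(fit, x)
print(dim(pred))
print(fit$n)  # number of input, hidden and output units
```

The fitted object reports its layer sizes in fit$n, which is a quick way to confirm that all three outputs share one hidden layer.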

Arguments

x: an object where samples are in rows and features are in columns. This could be a simple matrix, data frame or other type (e.g. sparse matrix).

y: a numeric or factor vector containing the outcome for each sample.

form: A formula of the form y ~ x1 + x2 + …

Therefore the code I use to train the network is:

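models <- vector("list", NumOutputNeurons)  # one fitted model per output column
for (n in seq_len(NumOutputNeurons)) {
    # linout=FALSE gives a logistic output unit; use TRUE unless the targets are scaled to [0, 1]
    models[[n]] <- train(traindata[,InputIndices], traindata[,OutputIndices][,n], tuneGrid=param.grid,
        maxit = 1e4,
        method = "nnet", linout=FALSE, trace=FALSE, na.rm = TRUE,
        trControl = tc)
}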

From a statistical point of view, is either approach better than the other?

(As an extra question, do you know a way to allow multiple outputs within the caret package?)

Best Answer

The approach of using one network to predict multiple variables is called multitask learning, and it can be a great way to improve the performance of a network. See Rich Caruana, 1997: Multitask learning for the details.

In short, it is conjectured that while learning each "task" (the prediction of a single variable), the network learns features of the input space that are helpful for that task. Each task leads to different features, but these can also be helpful for the other tasks. Learning multiple tasks at once therefore lets the network discover a richer set of features, which can improve overall performance. Because the shared hidden layer has to serve every task, it also acts as a regularizer that helps prevent overfitting.
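One way to check this on your own data is to compare held-out error for the two setups directly. The following is a minimal sketch using nnet alone, on synthetic data in which two of the targets share a feature (sin(x1)); the data, the train/test split, and the network settings are all assumptions for illustration, and which setup wins will depend on the problem.

```r
library(nnet)

set.seed(42)
n <- 300
x <- matrix(runif(n * 3, -1, 1), ncol = 3)

# Hypothetical targets: y1 and y2 both depend on sin(x1)
y <- cbind(y1 = sin(x[, 1]) + x[, 2],
           y2 = sin(x[, 1]) - x[, 3],
           y3 = x[, 2] * x[, 3]) +
  matrix(rnorm(n * 3, sd = 0.1), ncol = 3)

train_idx <- 1:200
test_idx  <- 201:300

# One network with three output neurons (shared hidden layer)
shared <- nnet(x[train_idx, ], y[train_idx, ], size = 6, decay = 1e-3,
               linout = TRUE, maxit = 1000, trace = FALSE)
err_shared <- y[test_idx, ] - predict(shared, x[test_idx, ])
rmse_shared <- sqrt(colMeans(err_shared^2))

# Three networks with one output neuron each
rmse_separate <- sapply(seq_len(ncol(y)), function(j) {
  fit <- nnet(x[train_idx, ], y[train_idx, j], size = 6, decay = 1e-3,
              linout = TRUE, maxit = 1000, trace = FALSE)
  err <- y[test_idx, j] - as.vector(predict(fit, x[test_idx, ]))
  sqrt(mean(err^2))
})

# Held-out RMSE per output, one row per setup
print(rbind(shared = rmse_shared, separate = rmse_separate))
```

A single split like this is noisy, so repeating it over several seeds or using cross-validation would give a fairer comparison.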

Related Solutions

Solved – Help requested with using custom model in caret() package

getModelInfo shows you the code for built-in models. grnn is not wrapped by this package, so you won't find code there.

There are a lot of avoidable problems. First, you have your data mixed up:

x <- rep(1:100); y <- x^2+x*rnorm(100,0,1); tr <- data.frame(y=y,x=x)

tr[,-1] is x so y=tr[,-1] is wrong.
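The column mix-up is easy to verify in base R. This short sketch rebuilds the same data frame (the seed is added here only for reproducibility) and shows which columns each index selects:

```r
set.seed(1)
x <- 1:100
y <- x^2 + x * rnorm(100, 0, 1)
tr <- data.frame(y = y, x = x)

# Dropping column 1 removes y, so what remains is the predictor x
predictors <- tr[, -1, drop = FALSE]
print(names(predictors))

# The outcome is the first column
outcome <- tr$y  # equivalently tr[, 1]
print(head(outcome))
```

So the outcome should be passed as tr[, 1] (or tr$y) and the predictors as tr[, -1].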

For your code, there are a few things:

The grid module should be a function instead of a data frame. That is where the "attempt to apply non-function" error comes from. However:
The pred and fit modules are missing most of the required arguments listed on the help page.

For this particular package:

You might have to do something like this:

grnnFit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
  dat <- x
  dat$.outcome <- y
  smooth(learn(dat, variable.column = ncol(dat)),
         sigma = param$sigma)
}

Also, for this package, you might have to use guess inside of apply.

My impression is that you should slow down and read the documentation (it really looks like you did not). There are some weird things about grnn (to me) and it has almost no documentation. That should be the hard part, so read the caret web page and get the easy parts down.

Max

Update: As Max alluded to, the guess() function in grnn can only compute a prediction for a single vector, so it had to be wrapped in a loop.

The new working code:

#Using caret() to determine the optimum value for grnn() smooth parameter    
grnnFit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
  #use argument names EXACTLY as here in all functions
  library(grnn)
  dat <- data.frame(y, x)
  s <- smooth(learn(dat), sigma=param$sigma)
  return(s)
}

grnnPred <- function(modelFit, newdata, preProc=NULL, submodels=NULL) {
  library(grnn)
  library(foreach)
  xlst <- split(newdata, 1:nrow(newdata))
  pred <- foreach(i = xlst, .combine = rbind) %do% {
    #guess() can only compute a prediction for one sample at a time
    guess(modelFit, as.matrix(i)) #provide x values as matrix
  }
  #return the predictions explicitly as a plain numeric vector
  as.vector(pred)
}

grnnSort <- function(x) {
  #return the tuning grid ordered by sigma; drop=FALSE keeps it a data frame
  x[order(x$sigma), , drop=FALSE]
}

grnnGrid <- function(x, y, len=NULL) {
  #only one tuning parameter sigma
  data.frame(sigma=seq(1,4,.05)) #search range
}

grnnLev <- function(x) {
  #regression model, so there are no class levels to return
  NULL
}

#list of params/functions
lpgrnn <- list(
  library="grnn",
  type="Regression",
  parameters=data.frame(parameter="sigma", class="numeric", label="Sigma"),
  grid=grnnGrid,
  fit=grnnFit,
  predict=grnnPred,
  prob=NULL,
  levels=grnnLev,
  sort=grnnSort)

library(caret)
set.seed(123)
x1 <- rep(1:100) + rnorm(100,0,1)
x2 <- rep(1:100) + rnorm(100,0,1)
tr <- data.frame(y=x1*x2, x1, x2)
set.seed(998)
fitControl <- trainControl(method="repeatedcv", repeats=5)
set.seed(825)
res <- train(y~., data=tr, method=lpgrnn, metric="RMSE", trControl = fitControl)
print(res)
print(res$finalModel$sigma)
plot(res)

[Figure: sigma versus RMSE]

Solved – Training nnet and avNNet models with caret when the output has negatives

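Is the question related to the scale on which nnet predicts? With the default linout = FALSE the output unit is logistic, so predictions are confined to the interval (0, 1) and can never be negative. Since Y is roughly between -1 and 1, you should also use linout = TRUE in your nnet and train calls.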
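The effect of linout on the prediction range can be checked directly with nnet. This is a minimal sketch on a synthetic target (a sine curve, chosen here only because it spans roughly -1 to 1); the network size, decay and iteration count are assumptions for the example.

```r
library(nnet)

set.seed(1)
x <- matrix(seq(-3, 3, length.out = 200), ncol = 1)
y <- sin(x[, 1])  # hypothetical outcome ranging over about [-1, 1]

# Logistic output unit: predictions are squashed into (0, 1)
fit_logistic <- nnet(x, y, size = 4, decay = 1e-3, linout = FALSE,
                     maxit = 500, trace = FALSE)
pred_logistic <- as.vector(predict(fit_logistic, x))

# Linear output unit: predictions can take any real value
fit_linear <- nnet(x, y, size = 4, decay = 1e-3, linout = TRUE,
                   maxit = 500, trace = FALSE)
pred_linear <- as.vector(predict(fit_linear, x))

print(range(pred_logistic))
print(range(pred_linear))
```

When going through caret, linout = TRUE is passed to train() the same way as any other argument forwarded to nnet.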