Solved – Combine several softmax output probabilities

Tags: machine-learning, neural-networks, probability, time-series

I would like to combine the outputs of five neural networks, each with a three-class softmax output layer. A typical example output is shown below:
[Figure: five panels, one per model, showing each model's three softmax outputs over time]
where Figure 1 is the output of model 1, Figure 2 of model 2, etc.; the y-axis shows the output values and the x-axis is the time stamp of the (financial) time series. The blue line represents a "buy," the red line a "sell," and the yellow line a "do nothing." Since the outputs of each softmax layer sum to one, the value for each "signal" can be treated as a probability, so my question is: how can I combine these five separate probabilistic outputs into one "global" probability output?

This is perhaps not straightforward, as the five outputs are not independent:

  1. Each model takes as input very similar features calculated from the same moving window on the underlying time series, although the model assumptions and targets differ in the NN training
  2. There is obviously (and unsurprisingly) correlation between the different model outputs
  3. There is auto-correlation within each model's output and, due to 2 above, this auto-correlation is correlated across models.

I had thought of taking a simple average across the models (summing and dividing by five) for each distinct output (blue, red or yellow), but on a gut level that just doesn't seem to be the right thing to do.
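For concreteness, the simple average described above is an equal-weight linear pooling of the five distributions; a minimal sketch with made-up (hypothetical) softmax outputs at a single time stamp:

```python
import numpy as np

# Hypothetical softmax outputs of the 5 models over the 3 classes
# (buy, sell, do-nothing) at one time stamp; each row sums to 1.
probs = np.array([
    [0.6, 0.1, 0.3],
    [0.5, 0.2, 0.3],
    [0.7, 0.1, 0.2],
    [0.4, 0.3, 0.3],
    [0.6, 0.2, 0.2],
])

# Equal-weight average: sum across models and divide by 5.
combined = probs.mean(axis=0)
# combined is itself a valid distribution: it sums to 1.
```

Averaging valid distributions always yields a valid distribution, so the result stays interpretable as class probabilities even though it ignores the correlations listed above.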

Another useful piece of information I have available is the output of a sixth neural net whose softmax output layer has five classes, where each class output is the "probability" that the corresponding one of the above five NN models is the "real" model, given the input features to this sixth NN. This suggests to me that I should be looking at something like maximum likelihood or Bayesian updating, but I'm not sure where to go from here.
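One natural way to use that sixth network, sketched here as an assumption rather than a known-correct method, is to treat its five-class softmax as mixture weights over the five models' distributions (a mixture-of-experts style weighted pooling):

```python
import numpy as np

# Hypothetical softmax outputs of the 5 models (rows) over 3 classes.
model_probs = np.array([
    [0.6, 0.1, 0.3],
    [0.5, 0.2, 0.3],
    [0.7, 0.1, 0.2],
    [0.4, 0.3, 0.3],
    [0.6, 0.2, 0.2],
])

# Hypothetical output of the sixth ("gating") network: P(model i is
# the "real" model | features). Sums to 1 by construction (softmax).
gate = np.array([0.4, 0.1, 0.2, 0.2, 0.1])

# Weighted average of the class distributions, weights from the gate.
combined = gate @ model_probs
```

Because the gate weights sum to one, `combined` is again a valid distribution; the equal-weight average above is just the special case `gate = [0.2] * 5`.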

Best Answer

I think I may have found what I was looking for when I originally asked this question. A bit of google-fu has led me to Linear Opinion Pools and variations thereof. Several papers are available here, here, here, here, here and finally here.
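For readers landing here: the weighted averages discussed in the question are exactly the linear opinion pool, and one common variation is the logarithmic opinion pool, which takes a weighted geometric mean and renormalizes. A hedged sketch of the latter (weights and inputs are illustrative):

```python
import numpy as np

def log_opinion_pool(probs, weights):
    """Logarithmic opinion pool: weighted geometric mean, renormalized.

    probs   -- array of shape (n_models, n_classes), rows sum to 1
    weights -- array of shape (n_models,), non-negative, sums to 1
    """
    log_p = np.sum(weights[:, None] * np.log(probs), axis=0)
    p = np.exp(log_p)
    return p / p.sum()

# Illustrative inputs: 5 models, 3 classes, equal weights.
probs = np.array([
    [0.6, 0.1, 0.3],
    [0.5, 0.2, 0.3],
    [0.7, 0.1, 0.2],
    [0.4, 0.3, 0.3],
    [0.6, 0.2, 0.2],
])
pooled = log_opinion_pool(probs, np.full(5, 0.2))
```

Unlike the linear pool, the logarithmic pool sharpens toward classes on which the models agree and heavily penalizes any class a model assigns near-zero probability.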

If any forum members have anything else to add, it would be appreciated.
