Solved – Neural network: two output vectors

classification, conv-neural-network, distributions, multiple-comparisons, neural-networks

Architecture:

I have a CNN which does some classification for me. The output layer is a vector $\vec{y}$ of dimension $(1, 1000)$, so it has 1000 neurons in total (the weight matrix $W_{out}^{5}$ between the last fully-connected layer (layer 5) and the output layer is of dimension $(500, 1000)$, since layer 5 has 500 neurons).

Instead of a one-hot encoding, my teacher signal is a Gaussian distribution. So, e.g., when the real value is class 500 (the center of the vector $\vec{y_{real}}$), a Gaussian with mean/location 500 is fit into $\vec{y_{real}}$ and fed to the network as the teacher signal. Plotted, the teacher signal looks like this: [figure: Gaussian distribution for a real value of class 500].
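For concreteness, here is a minimal NumPy sketch of how such a teacher signal could be built; the width `sigma` and the normalisation to sum to 1 are my assumptions, since the question does not specify them:

```python
import numpy as np

def gaussian_teacher_signal(true_class, n_classes=1000, sigma=5.0):
    """Soft target vector: a Gaussian bump centred on the true class.

    true_class : index of the ground-truth class (e.g. 500)
    n_classes  : length of the output vector (1000 in the question)
    sigma      : assumed width of the bump (not given in the question)
    """
    positions = np.arange(n_classes)
    target = np.exp(-0.5 * ((positions - true_class) / sigma) ** 2)
    return target / target.sum()  # normalise so the targets sum to 1

# Example: the teacher signal for class 500 peaks at index 500
y_real = gaussian_teacher_signal(500)
```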

This works quite well and makes sense, since all 1000 classes are related to each other: they share one "abstract category". Additionally, I want to get information about the noise in the current input, so I interpret the shape and variance of the distribution in my output layer ($\vec{y}$) as information about my process' noise.

To give another example: if the real value is 500, I am perfectly happy if the network's output $\vec{y}$ looks roughly like a Gaussian distribution with its global maximum somewhere in the range 490 to 510.


Scenario: Two different outputs?

However, my input data also contains information about a second "abstract category" (which has nothing in common with the first one). This leads to my current problem: I want the network to predict both categories, each classified by outputting (optimally) a Gaussian distribution.

What would be an appropriate solution for this scenario?

My first thought was to alter my output layer to be of dimension $(2, 1000)$… but I am not sure whether 1000 classes are appropriate for the second category, and I also do not know whether it makes sense to connect the last fully-connected layer to an output of dimension $(2, 1000)$, especially because the two categories (and therefore the two distributions I want) have nothing in common semantically.

My second idea was to have two different output vectors, $\vec{y_1}$ for the first category and $\vec{y_2}$ for the second… but what would the cost function look like then? I guess it would not make sense to calculate two separate errors for categories 1 and 2 and then train the network with both errors in each epoch?

Any ideas on this topic?

Best Answer

Sure, this is entirely possible, and you already suggested the answer: you have two output vectors, each fully connected to the previous layer of the network. What the cost function looks like depends on which loss you use. If it is the squared loss, then the total cost is simply the loss for output 1 plus the loss for output 2. Backpropagation does not change much in this case either: you simply add up the derivatives of both "sub-loss" functions.
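A minimal PyTorch sketch of this setup, assuming the convolutional part and the 500-unit layer 5 already exist (the names `TwoHeadNet`, `head1` and `head2` are mine, not from the question):

```python
import torch
import torch.nn as nn

# Only the last layers are sketched: a shared 500-unit representation
# (layer 5 in the question) feeding two separate output heads.
class TwoHeadNet(nn.Module):
    def __init__(self, in_features=500, n_classes_1=1000, n_classes_2=1000):
        super().__init__()
        self.head1 = nn.Linear(in_features, n_classes_1)  # first category
        self.head2 = nn.Linear(in_features, n_classes_2)  # second category

    def forward(self, h):  # h: activations of layer 5, shape (batch, 500)
        return self.head1(h), self.head2(h)

model = TwoHeadNet()
criterion = nn.MSELoss()  # squared loss, as discussed above

def combined_loss(pred1, pred2, target1, target2):
    # Total cost = loss of head 1 + loss of head 2; autograd adds up the
    # gradients of both "sub-losses" automatically during backward().
    return criterion(pred1, target1) + criterion(pred2, target2)
```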

If good predictions for one output are harder to achieve than for the other, you should balance your loss function (e.g. 30% for the easy one, 70% for the harder one). These weights then get absorbed into the derivatives as well.
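Continuing the sketch above, the weighted combination could look like this; the weights `w1` and `w2` are hyperparameters you would have to tune (the 30%/70% split is just an example):

```python
import torch.nn as nn

criterion = nn.MSELoss()

def weighted_loss(pred1, pred2, target1, target2, w1=0.3, w2=0.7):
    # Weight the harder output more heavily so it is not neglected in training.
    return w1 * criterion(pred1, target1) + w2 * criterion(pred2, target2)
```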