Solved – Neural network non-binary output

classification, machine learning, neural networks

So I'm trying to make a neural network that learns a pattern and outputs another number from the sequence. For example: my first test was with factorials. I had an array of numbers as my input, with labels that were the factorials of those numbers. I was working from a tutorial's code, but I realised that tutorial was for classification with binary output. What kind of neural network supports non-binary classification?

Best Answer

Neural networks can learn to solve $c$-class classification problems, where $c$ is the number of classes (categories) to be discriminated.

The general goal is to categorize a set of patterns or feature vectors into one of $c$ classes. The true class membership of each pattern is considered uncertain. Feed-forward neural networks learn to perform statistical classification for the different classes, even where the feature distributions overlap. If the number of classes is three, $c=3$, you train with indicator vectors (Target = [1 0 0]', Target = [0 1 0]' and Target = [0 0 1]', where "'" indicates vector transpose) for patterns belonging to each of the three categories. The neural network learns the probabilities of the three classes, $P(\omega_i \mid {\boldsymbol x})$, $i=1,\ldots,c$.
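A minimal sketch of how those indicator (one-hot) target vectors can be built, assuming NumPy and integer class labels coded 0..c-1 (the labels below are hypothetical):

```python
import numpy as np

c = 3
labels = np.array([0, 2, 1, 0])      # hypothetical training labels for c = 3 classes
targets = np.eye(c)[labels]          # each row is an indicator vector: [1 0 0], [0 0 1], ...
print(targets)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]
```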

The prior class distribution, ${\hat P}(\omega_i)$, $i=1,\ldots,c$, is estimated from the training set as the fraction of training patterns belonging to each category.
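For completeness, a tiny sketch of that empirical prior estimate, reusing the hypothetical integer-coded labels from the previous snippet:

```python
import numpy as np

labels = np.array([0, 2, 1, 0])
# P_hat(omega_i): fraction of training patterns in each class
priors = np.bincount(labels, minlength=3) / labels.size
print(priors)   # [0.5  0.25 0.25]
```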

In the annotation of Duda & Hart [Duda R.O. & Hart P.E. (1973) Pattern Classification and Scene Analysis, Wiley], define the feature distributions provided as input vector to the feed-forward neural network by $P({\boldsymbol x}\,\mid\,\omega_i)$, where for example the data vector equals ${\boldsymbol x}=(0.2,10.2,0,2)$, for a classification task with 4 real-valued feature variables. The index $i$ indicates the possible $c$ classes, $i \in \{1,\ldots,c\}$, and $\omega_1,\omega_2,\ldots,\omega_c$.

The feed-forward neural network classifier learns the posterior probabilities, ${\hat P}(\omega_i\,\mid\,{\boldsymbol x})$, when trained by gradient descent. This is the major result proved by Richard & Lippmann in 1991. The hat over the posterior probability indicates the uncertainty, as the probabilities are estimated (learned): $$ {\hat P}(\omega_i\,\mid\,{\boldsymbol x}) = \frac{{\hat P}(\omega_i) \; {\hat P}({\boldsymbol x}\,\mid\,\omega_i)}{\sum_{j=1}^c {\hat P}(\omega_j) \; {\hat P}({\boldsymbol x}\,\mid\,\omega_j)} $$
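A sketch illustrating that result, under stated assumptions: scikit-learn's MLPClassifier (not mentioned in the original answer) and two overlapping 1-D Gaussian classes with equal priors, so the true Bayes posterior is known in closed form and can be compared with the network's learned posterior:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 5000
y = rng.integers(0, 2, size=n)                        # equal priors: P(omega_0) = P(omega_1) = 0.5
X = rng.normal(loc=np.where(y == 0, -1.0, 1.0), scale=1.0).reshape(-1, 1)

# Feed-forward net trained by gradient descent with a cross-entropy loss.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

# Compare the learned posterior P_hat(omega_1 | x) with the exact Bayes posterior,
# which for these two unit-variance Gaussians (means -1 and +1) is 1 / (1 + exp(-2x)).
x_grid = np.linspace(-3, 3, 7).reshape(-1, 1)
learned = net.predict_proba(x_grid)[:, 1]
exact = 1.0 / (1.0 + np.exp(-2.0 * x_grid.ravel()))
print(np.round(learned, 2))
print(np.round(exact, 2))
```

The two printed rows should agree closely, which is the practical content of the Richard & Lippmann result: the softmax outputs of a well-trained network estimate ${\hat P}(\omega_i\mid{\boldsymbol x})$.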

Reference:

Michael D. Richard and Richard P. Lippmann. "Neural Network Classifiers Estimate Bayesian a posteriori Probabilities," Neural Computation, Vol. 3, No. 4, pp. 461-483, 1991.