Solved – How to Implement a Neural Network for **Multidimensional** Input Classification

classification · machine learning · neural networks

I'm trying to build a neural network that can classify inputs with multiple dimensions. This is different from problems like image classification, where the input is multidimensional but the input matrix may simply be flattened and fed to the network. In this problem, each dimension of the input describes a different property.

For example, suppose the inputs to the network are Gaussian distributions, where each input can be described by a (mean, variance) tuple. In contrast to image classification, where all the elements of the matrix are pixel intensities, here each element of the tuple describes a different dimension/property of the problem that is unrelated to the others. Can these properties be mixed/flattened and fed to a single neural network? Can neural networks solve problems like this? If yes, how?

Another example:
A common solution to MNIST handwritten digit classification is to flatten the input image into a vector of 784 points and feed it to a neural network. Assume the problem is slightly modified so that, instead of the intensity of each pixel, its mean and variance are provided. The resulting flattened input will be a 2 x 784 matrix instead of a vector, but the output is still a single digit from 0 to 9. How can I feed this input matrix to a neural network, given that each dimension describes a different property?

Best Answer

Well, the whole idea of machine learning is that you let the algorithm decide which inputs are important and which aren't. At times, we might want to restrict the algorithm to combine inputs in certain ways. An example for images is convolutional neural nets, where we exploit the fact that patterns in images are based on the values of nearby pixels.
One option is to feed each dimension of the input into its own sub-net and then combine the outputs of those sub-nets in a final net: for example, feed all the means into one net and all the variances into another, then merge their outputs.
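A minimal PyTorch sketch of this first option, assuming the modified-MNIST setup from the question (a 2 x 784 input of means and variances); the class name and layer sizes are illustrative choices, not anything prescribed by the answer:

```python
import torch
import torch.nn as nn

class PerPropertyNet(nn.Module):
    """One sub-net per property (means, variances), merged by a final net."""
    def __init__(self, n_pixels=784, hidden=128, n_classes=10):
        super().__init__()
        # Sub-net that sees only the 784 means.
        self.mean_net = nn.Sequential(nn.Linear(n_pixels, hidden), nn.ReLU())
        # Sub-net that sees only the 784 variances.
        self.var_net = nn.Sequential(nn.Linear(n_pixels, hidden), nn.ReLU())
        # Final net that combines the two sub-net outputs.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):  # x: (batch, 2, 784), row 0 = means, row 1 = variances
        means, variances = x[:, 0, :], x[:, 1, :]
        merged = torch.cat([self.mean_net(means), self.var_net(variances)], dim=1)
        return self.head(merged)  # logits over the 10 digits

logits = PerPropertyNet()(torch.randn(32, 2, 784))  # batch of 32 dummy inputs
```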
However, you could also feed all the properties of each pixel into a sub-net and then feed the per-pixel outputs into a bigger net, as in the sketch below. It isn't obvious which of the two would give better results, or why.
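A sketch of this second option under the same assumptions; sharing one small sub-net across all pixels is my own simplification here (each pixel could equally get its own weights):

```python
import torch
import torch.nn as nn

class PerPixelNet(nn.Module):
    """One shared sub-net per pixel's (mean, variance) pair, then a bigger net."""
    def __init__(self, n_pixels=784, pixel_features=8, hidden=128, n_classes=10):
        super().__init__()
        # Shared sub-net applied to every pixel's (mean, variance) tuple.
        self.pixel_net = nn.Sequential(nn.Linear(2, pixel_features), nn.ReLU())
        # Bigger net that sees all per-pixel outputs at once.
        self.head = nn.Sequential(
            nn.Linear(n_pixels * pixel_features, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):                   # x: (batch, 2, 784)
        pairs = x.transpose(1, 2)           # (batch, 784, 2): one row per pixel
        feats = self.pixel_net(pairs)       # (batch, 784, pixel_features)
        return self.head(feats.flatten(1))  # logits over the 10 digits

logits = PerPixelNet()(torch.randn(32, 2, 784))
```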
So my advice would be: if you have obvious structure, as in images, you might exploit it the way convolutional neural nets do. If it isn't obvious, just let the network figure it out; that is what it's for.
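For completeness, the "let the network figure it out" baseline is simply to flatten the 2 x 784 matrix into a 1568-dimensional vector and feed it to a plain MLP; again, the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# Baseline: flatten everything and let a plain MLP learn how to
# combine the two properties on its own.
flat_net = nn.Sequential(
    nn.Flatten(),                         # (batch, 2, 784) -> (batch, 1568)
    nn.Linear(2 * 784, 128), nn.ReLU(),
    nn.Linear(128, 10),                   # logits over the 10 digits
)

logits = flat_net(torch.randn(32, 2, 784))  # batch of 32 dummy inputs
```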
