Solved – Can a Neural Network take multidimensional inputs?

conv-neural-network, machine learning, natural language, neural networks, recurrent neural network

I recently learned about RNNs and noticed a feature they share with CNNs: both use either an LSTM cell or a convolution to process a single multidimensional input (such as an image or a word-embedding matrix of size m*n) into a compact fixed-size representation (assuming batch size = 1), which is then passed through a softmax or fed into a plain neural network to get the final result. So I searched for examples of how a pure neural network can process multidimensional data such as an image or a sentence. A classical way to handle images with a neural network is to first flatten the 2D input into a vector (for a 64*64 image, the vector has 4096 entries) and feed that vector into the network, which means that at this point a single input becomes a flat vector instead of a 2D matrix.
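To make the flattening approach concrete, here is a minimal sketch (PyTorch is assumed, and the layer sizes and class count are made-up examples, not anything from the original post):

```python
import torch
import torch.nn as nn

# A single 64x64 grayscale image, batch size 1.
image = torch.randn(1, 64, 64)

# Classic fully connected approach: flatten the 64*64 = 4096 values into one
# vector, then pass it through dense layers. The 2D neighborhood structure
# of the pixels is discarded at the Flatten step.
mlp = nn.Sequential(
    nn.Flatten(),            # (1, 64, 64) -> (1, 4096)
    nn.Linear(64 * 64, 128),
    nn.ReLU(),
    nn.Linear(128, 10),      # e.g. 10 output classes
)

logits = mlp(image)
print(logits.shape)  # torch.Size([1, 10])
```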

So my question is: what if we don't flatten the input at the beginning and instead feed it directly into a fully connected neural network? Is that possible? Would the operation between the weights and the inputs then become element-wise, like it is in an RNN?

The reason I ask is that, from my point of view, the position of each element in a 2D matrix may carry information that you sometimes don't want to throw away; for example, no one flattens a word-embedding matrix and feeds it directly into a neural network for text classification tasks.
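For the word-embedding case, a hedged sketch of the usual alternative to flattening: a 1D convolution slides over the token positions while treating the embedding dimension as channels, so word order is preserved (PyTorch is assumed; the sentence length, embedding size, and class count below are arbitrary examples):

```python
import torch
import torch.nn as nn

# One sentence of 20 tokens, each a 300-dimensional embedding (batch size 1).
embeddings = torch.randn(1, 20, 300)

# Conv1d expects (batch, channels, length), so the embedding dim becomes channels.
x = embeddings.transpose(1, 2)               # (1, 300, 20)

conv = nn.Conv1d(in_channels=300, out_channels=64, kernel_size=3, padding=1)
features = torch.relu(conv(x))               # (1, 64, 20): one feature vector per position
pooled = features.max(dim=2).values          # (1, 64): pool over positions
classifier = nn.Linear(64, 2)                # e.g. binary sentiment labels
print(classifier(pooled).shape)              # torch.Size([1, 2])
```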

Thank you!

Best Answer

Yes, preserving this spatial structure is exactly what makes convolutional neural networks so good at their job.

Would the operation between the weights and the inputs then become element-wise, like it is in an RNN?

I'm not clear on exactly what you mean by element-wise in this context, but the weights used in a CNN are shared across spatial locations of the input, inducing some sort of translation-invariance/equivariance prior.
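As a rough illustration of that weight sharing (again assuming PyTorch; the channel counts are arbitrary), a single Conv2d layer reuses the same small kernel at every spatial location of the unflattened image, which is where the translation-equivariance comes from and why it needs far fewer parameters than a dense layer on the flattened input:

```python
import torch
import torch.nn as nn

# The same 64x64 image, kept as a 2D grid: (batch, channels, height, width).
image = torch.randn(1, 1, 64, 64)

# One 3x3 kernel per output channel, applied at every position of the image.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

out = conv(image)
print(out.shape)                                   # torch.Size([1, 8, 64, 64]): spatial layout preserved
print(sum(p.numel() for p in conv.parameters()))   # 8*1*3*3 + 8 = 80 parameters

# Compare with a fully connected layer on the flattened image:
fc = nn.Linear(64 * 64, 8)
print(sum(p.numel() for p in fc.parameters()))     # 4096*8 + 8 = 32776 parameters
```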