Solved – How to use images as input in machine learning

image-processing, machine-learning, regression

There are different machine learning algorithms (neural networks, decision trees, support vector machines and so on). Usually we assume that there is a mapping from vectors to nominal or numeric values, and we try to use the above-mentioned algorithms to learn this unknown mapping (so we speak about classification and regression).

Now assume that we need to use images as input. That means we need to learn a model that classifies images or assigns a numeric value to each image. What intermediate step should we take in order to use images as inputs in a meaningful way?

Of course we can easily represent an image as a vector (the red, green and blue values of each pixel). However, in this case we lose information about the vicinity of pixels. For example, pixels that are just one or two steps from each other in the image could be very distant in the vector. Another problem is that we get very large input vectors (the number of components is height x width x 3). So we can end up with a problem in which we have 1000 observations whose input vectors have 1 000 000 components each.
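As a concrete illustration of both problems, here is a minimal sketch (image size chosen arbitrarily) showing how flattening a modest RGB image already yields millions of components, and how vertically adjacent pixels land far apart in the flattened vector:

```python
import numpy as np

# A hypothetical 1000 x 1000 RGB image (height x width x 3 channels).
height, width = 1000, 1000
image = np.zeros((height, width, 3), dtype=np.uint8)

# Flattening turns it into a single vector of 3,000,000 components.
vector = image.reshape(-1)
print(vector.shape)  # (3000000,)

# With row-major flattening, channel 0 of pixel (r, c) lands at
# index (r * width + c) * 3.
idx_a = (10 * width + 10) * 3   # pixel (10, 10)
idx_b = (11 * width + 10) * 3   # pixel (11, 10), directly below it
print(idx_b - idx_a)  # 3000 positions apart, despite being neighbours
```

So two pixels that touch in the image are separated by thousands of vector components, and nothing in the vector representation tells the learner they were adjacent.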

So, to summarize: what are the techniques for transforming images into meaningful inputs (feature vectors, in machine learning language)? Or, in other words, how can we apply machine learning techniques when images are used as input?

Best Answer

For computer vision problems, ConvNets (convolutional neural networks) have produced remarkable success in recent years. A ConvNet is a type of neural network, and it doesn't unroll images into vectors as you mention. Hence, it can capture spatial relations between pixels better than a fully connected neural network. It also reduces the number of parameters by using the same kernels (groups of weights) across the whole input space. An early and influential ConvNet architecture is LeNet, developed by Yann LeCun; if you are interested, check his ConvNet papers. Also, Alex Krizhevsky's 2012 paper on ImageNet and subsequent papers describe how a ConvNet learns and what it learns.
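The parameter reduction from weight sharing is easy to see in code. This is a minimal sketch using PyTorch (layer sizes chosen arbitrarily for illustration), comparing a fully connected layer on a flattened 32x32 RGB image against a convolutional layer with the same number of output channels:

```python
import torch.nn as nn

# Fully connected layer mapping a flattened 32x32 RGB image to 64 units:
# every output unit has its own weight for every input component.
fc = nn.Linear(32 * 32 * 3, 64)
fc_params = sum(p.numel() for p in fc.parameters())

# Convolutional layer with 64 kernels of size 3x3 over the same image:
# each kernel's weights are shared across all spatial positions.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
conv_params = sum(p.numel() for p in conv.parameters())

print(fc_params)    # 196672 (32*32*3*64 weights + 64 biases)
print(conv_params)  # 1792   (3*3*3*64 kernel weights + 64 biases)
```

The convolutional layer needs roughly 100x fewer parameters, and unlike the fully connected layer it keeps the 2D neighbourhood structure of the pixels.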
