[GIS] Distinction between pixel-based and object-based classification

Tags: classification, machine learning, remote sensing

I am struggling to clearly understand the distinction between pixel-based and object-based classification in the remote sensing domain and am hoping someone from this community can provide insight.

Based on the information I have so far, my current understanding is along these lines:

Pixel-based classification:
Classification is done at the per-pixel level, using only the spectral information available for that individual pixel (i.e. the values of neighbouring pixels are ignored). In this sense each pixel would represent a training example for a classification algorithm, and this training example would be in the form of an n-dimensional vector, where n is the number of spectral bands in the image data. Accordingly, the trained classification algorithm would output a class prediction for each individual pixel in an image.
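That formulation maps directly onto standard tooling. As a minimal sketch using scikit-learn (the image, labels, shapes, and class count below are all placeholders, not values from any particular dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical multispectral image: height x width x n_bands
image = np.random.rand(100, 100, 6)           # placeholder data
labels = np.random.randint(0, 3, (100, 100))  # placeholder per-pixel labels

# Flatten to one n-dimensional feature vector per pixel
X = image.reshape(-1, image.shape[-1])  # shape: (n_pixels, n_bands)
y = labels.ravel()                      # one class label per pixel

clf = RandomForestClassifier(n_estimators=100).fit(X, y)

# The trained model predicts a class for every individual pixel
predicted = clf.predict(X).reshape(labels.shape)
```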

Object-based classification:
Classification is done on a localized group of pixels, taking into account the spatial relationships between neighbouring pixels. In this sense a training example for a classification algorithm would consist of a group of pixels, and the trained classification algorithm would accordingly output class predictions on a per-group basis. As a crude example, an image might be partitioned into n segments of equal size, and each segment would then be assigned a class (e.g. contains object / does not contain object).
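In practice, segmentation algorithms such as SLIC are commonly used instead of equal-size tiles, so that segments follow spectral boundaries. A minimal sketch with scikit-image and scikit-learn (all data and parameter values are placeholders):

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestClassifier

image = np.random.rand(100, 100, 6)  # placeholder multiband image

# Partition the image into spatially coherent segments ("objects");
# channel_axis marks the band dimension (scikit-image >= 0.19)
segments = slic(image, n_segments=200, compactness=10.0, channel_axis=-1)

# One training example per segment: here, the mean value of each band
seg_ids = np.unique(segments)
X = np.array([image[segments == s].mean(axis=0) for s in seg_ids])

# Placeholder segment-level labels; in practice these come from training data
y = np.random.randint(0, 3, len(seg_ids))

clf = RandomForestClassifier(n_estimators=100).fit(X, y)

# Each prediction applies to a whole group of pixels at once
segment_classes = clf.predict(X)
```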

Is this thinking accurate regarding the meaning of these terms, or is there something that I have missed?

Best Answer

As far as pixel-based classification is concerned, you are spot on. Each pixel is an n-dimensional vector and will be assigned to some class according to some metric, whether using support vector machines, maximum-likelihood estimation (MLE), some kind of k-nearest-neighbours (k-NN) classifier, etc.
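To make the maximum-likelihood case concrete, the classic approach fits a Gaussian to each class's training pixel vectors and assigns every pixel to the most likely class. A minimal sketch with NumPy/SciPy (data and shapes are placeholders):

```python
import numpy as np
from scipy.stats import multivariate_normal

# X: (n_pixels, n_bands) pixel vectors; y: per-pixel class labels
X = np.random.rand(1000, 6)
y = np.random.randint(0, 3, 1000)

# Fit one multivariate Gaussian per class from its training pixels
models = {c: multivariate_normal(X[y == c].mean(axis=0),
                                 np.cov(X[y == c], rowvar=False))
          for c in np.unique(y)}

# Assign each pixel to the class with the highest log-likelihood
log_likes = np.column_stack([m.logpdf(X) for m in models.values()])
predicted = np.array(list(models))[log_likes.argmax(axis=1)]
```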

As far as region-based classifiers are concerned, though, there have been huge developments in the last few years, driven by a combination of GPUs, vast amounts of data, the cloud, and the wide availability of algorithms thanks to the growth of open source (facilitated by GitHub). One of the biggest developments in computer vision/classification has been convolutional neural networks (CNNs). The convolutional layers "learn" features which might be based on colour, as with traditional pixel-based classifiers, but they also act as edge detectors and all kinds of other feature extractors over a region of pixels (hence the convolutional part) that you could never extract from a pixel-based classification. This means they are less likely to misclassify a pixel in the middle of an area of pixels of some other type -- if you have ever run a classification and got ice in the middle of the Amazon, you will understand this problem.
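To illustrate the neighbourhood point, even a fixed convolution kernel extracts information that no single pixel's spectral vector can express. A short sketch with SciPy (placeholder data), using a Sobel kernel as the edge detector:

```python
import numpy as np
from scipy import ndimage

band = np.random.rand(100, 100)  # placeholder single-band image

# Sobel convolutions respond to intensity gradients in a neighbourhood;
# combining the x and y responses gives an edge-magnitude image
edges = np.hypot(ndimage.sobel(band, axis=0),
                 ndimage.sobel(band, axis=1))
```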

You then apply a fully connected neural net to the "features" learnt via the convolutions to actually do the classification. Another advantage of CNNs is their robustness to shifts in position: intermediate pooling layers between the convolution layers and the classification layers generalize the learned features, making them tolerant of translation and of small changes in scale and orientation (strict scale and rotation invariance usually requires data augmentation as well), while dropout helps to avoid overfitting.
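As a rough sketch of such an architecture in Keras -- the patch size, band count, and class count below are illustrative assumptions, not values from any particular dataset:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Minimal CNN classifying 32x32-pixel, 6-band image patches into 3 classes
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 6)),
    # Convolutional layers learn neighbourhood features (edges, textures)
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),  # pooling generalizes the learned features
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),   # dropout reduces overfitting
    layers.Flatten(),
    # Fully connected layers do the actual classification
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```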

There are numerous resources on convolutional neural networks, although the best has to be the Stanford class (CS231n) taught by Andrej Karpathy, one of the pioneers of this field; the entire lecture series is available on YouTube.

Sure, there are other ways of dealing with pixel- versus area-based classification, but this is currently the state-of-the-art approach, and it has many applications beyond remote sensing classification, such as machine translation and self-driving cars.

Here is another example of region-based classification, using OpenStreetMap for tagged training data, including instructions for setting up TensorFlow and running on AWS.

Here is an example using Google Earth Engine of a classifier based on edge detection, in this case for pivot irrigation -- using nothing more than a Gaussian kernel and convolutions, but again showing the power of region/edge-based approaches.
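The underlying idea translates to a few lines of Python. A hedged sketch of a difference-of-Gaussians edge response (placeholder data and threshold; the Earth Engine example itself is written against its own API):

```python
import numpy as np
from scipy import ndimage

band = np.random.rand(500, 500)  # placeholder single-band image

# Difference of Gaussians: subtracting a wide blur from a narrow one
# leaves band-pass structure whose strong responses trace field edges
dog = (ndimage.gaussian_filter(band, sigma=1)
       - ndimage.gaussian_filter(band, sigma=3))
edges = np.abs(dog) > dog.std()  # crude threshold on the response
```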


While the superiority of object-based over pixel-based classification is fairly widely accepted, here is an interesting article in Remote Sensing Letters assessing the performance of object-based classification.

Finally, an amusing example, just to show that even with regional/convolutional classifiers, computer vision is still really hard -- fortunately, the smartest people at Google, Facebook, etc. are working on algorithms to be able to tell the difference between dogs, cats, and different breeds of dogs and cats. So, those of us interested in remote sensing can sleep easy at night :D

