[GIS] Using supervised vs unsupervised classification in identification of region of interest

land-cover, machine learning, remote sensing

I was introduced to machine learning and remote sensing recently.
My task was to classify satellite images into vegetation and non-vegetation.
We were introduced to two approaches.

  1. Supervised learning – where we had WKT or GeoJSON files made from ground truth. These files had polygons which were used to train the model.
    Satellite images from the WorldView-3 satellite sensor
  2. Unsupervised classification – where the pixels were classified based on NDVI values using clustering models such as K-means and Fuzzy C-means.
    Satellite images from Landsat 8

All of this was virtually spoon-fed, and I took the code samples from here and there, so I still fail to understand which method is used where, specifically in the context of crop forecasting.

What is the advantage of collecting ground truth when we can use unsupervised learning to classify the images?

If it is about accuracy, are there any specific examples of how ground truth improves accuracy in crop forecasting?

Best Answer

Both supervised and unsupervised classification methods require some degree of knowledge of the area of interest. Most important are 1) the quality of the spectral data on which the classification algorithm is to be run and 2) the level of class detail required.

Unsupervised classification algorithms require the analyst to assign labels and combine classes after the fact into useful information classes (e.g. forest, agricultural, water, etc.). In many cases, this after-the-fact assignment of spectral clusters is difficult or impossible because the clusters contain assemblages of mixed land cover types. Generally speaking, unsupervised classification is useful for quickly assigning labels to uncomplicated, broad land cover classes (such as water, vegetation/non-vegetation, forested/non-forested, etc.). Furthermore, unsupervised classification may reduce analyst bias.
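As a rough illustration of that workflow, the Python sketch below clusters pixels on NDVI and only afterwards attaches information-class names to the clusters. The reflectance values, the choice of two clusters, and the helper names (`ndvi`, `kmeans_1d`) are assumptions made for the example; they do not come from any particular dataset or library.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def kmeans_1d(values, k=2, iterations=20):
    """Plain k-means on scalar values (k >= 2).

    Returns (centers, cluster index for each value).
    """
    ordered = sorted(values)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]

    def nearest(value):
        return min(range(k), key=lambda c: abs(value - centers[c]))

    for _ in range(iterations):
        assignment = [nearest(v) for v in values]
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, [nearest(v) for v in values]


# Hypothetical (NIR, red) reflectance pairs: three leafy pixels, three bare ones.
pixels = [(0.50, 0.08), (0.45, 0.10), (0.55, 0.07),
          (0.20, 0.18), (0.15, 0.14), (0.25, 0.22)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]

centers, clusters = kmeans_1d(ndvi_values, k=2)

# The algorithm only returns anonymous clusters. Naming them is the analyst's
# after-the-fact step; here the cluster with the higher mean NDVI is assumed
# to be vegetation.
vegetation_cluster = max(range(len(centers)), key=lambda c: centers[c])
labels = ["vegetation" if c == vegetation_cluster else "non-vegetation"
          for c in clusters]
print(labels)
```

With a broad vegetation/non-vegetation split the naming step is easy. With many clusters over mixed land cover, that last step is where the difficulty described above shows up.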

Supervised classification allows the analyst to fine-tune the information classes, often to much finer subcategories such as species-level classes. Training data is collected in the field with high-accuracy GPS devices or expertly selected on the computer. Consider, for example, classifying percent crop damage in corn fields. A supervised approach would be highly suited to this type of problem because you could directly measure the percent damage in the field and use these data to train the classification algorithm. Applying the same field data to label the result of an unsupervised classification would likely yield more error, because the spectral clusters would contain more mixed pixels than the classes produced by the supervised approach. Similarly, collecting crop species training data in the field is preferable to expertly selecting pixels on screen, as it is often very difficult to determine visually which crops are growing.
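To make the role of training data concrete, here is a minimal Python sketch of a supervised classifier. It uses a minimum-distance-to-means rule, chosen only because it fits in a few lines; it is not necessarily what you would use in practice. The class names and the (red, NIR) reflectance values are hypothetical stand-ins for field-collected samples.

```python
from collections import defaultdict
from math import dist


def train_class_means(samples):
    """Compute one mean feature vector per class.

    samples: list of (feature_vector, label) pairs from ground truth.
    """
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: [sum(band) / len(rows) for band in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify(pixel, class_means):
    """Assign the pixel to the class whose mean is closest (Euclidean)."""
    return min(class_means, key=lambda label: dist(pixel, class_means[label]))


# Hypothetical field samples: (red, NIR) reflectance with a surveyed label.
training = [
    ((0.05, 0.50), "healthy corn"),
    ((0.06, 0.48), "healthy corn"),
    ((0.04, 0.52), "healthy corn"),
    ((0.12, 0.30), "damaged corn"),
    ((0.14, 0.28), "damaged corn"),
    ((0.13, 0.32), "damaged corn"),
    ((0.20, 0.25), "bare soil"),
    ((0.22, 0.24), "bare soil"),
]

class_means = train_class_means(training)

for pixel in [(0.05, 0.49), (0.13, 0.29), (0.21, 0.26)]:
    print(pixel, "->", classify(pixel, class_means))
```

The classes here are defined by what was measured on the ground, so "healthy corn" and "damaged corn" can be separated even though both would probably fall into one "vegetation" cluster in an NDVI-only clustering.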

I highly recommend reviewing research from Dr. Russell Congalton, who has produced many landmark accuracy assessment papers pertaining to remote sensing classification approaches. Here are some references to get you started:

  1. Congalton, R. G. (1991). A review of assessing the accuracy of classifications of remotely sensed data. Remote Sensing of Environment, 37(1), 35-46.
  2. Congalton, R. G., & Green, K. (2008). Assessing the accuracy of remotely sensed data: principles and practices. CRC Press.
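Accuracy assessment of this kind is usually built on an error (confusion) matrix that compares ground-truth reference labels with the classified map. The Python sketch below computes the matrix, overall accuracy, and Cohen's kappa for a small made-up sample; the labels and function names are assumptions for illustration only.

```python
from collections import Counter


def error_matrix(reference, predicted):
    """Count each (reference label, predicted label) pair."""
    return Counter(zip(reference, predicted))


def overall_accuracy(reference, predicted):
    """Fraction of samples where the map agrees with the ground truth."""
    agree = sum(r == p for r, p in zip(reference, predicted))
    return agree / len(reference)


def kappa(reference, predicted):
    """Cohen's kappa: agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement is exactly 1,
    e.g. when every sample has the same label in both lists.
    """
    n = len(reference)
    observed = overall_accuracy(reference, predicted)
    ref_totals = Counter(reference)
    pred_totals = Counter(predicted)
    chance = sum(ref_totals[c] * pred_totals[c] for c in ref_totals) / n ** 2
    return (observed - chance) / (1 - chance)


# Hypothetical check sites: field-surveyed label vs. label on the classified map.
reference = ["veg"] * 5 + ["non"] * 5
predicted = ["veg", "veg", "veg", "veg", "non",
             "non", "non", "non", "non", "veg"]

matrix = error_matrix(reference, predicted)
print("veg mapped as veg:", matrix[("veg", "veg")])
print("veg mapped as non:", matrix[("veg", "non")])
print("overall accuracy:", overall_accuracy(reference, predicted))
print("kappa:", round(kappa(reference, predicted), 3))
```

Without ground truth there is no reference column, so none of these measures can be computed. That is one practical reason to collect it even when the classification itself is unsupervised.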