[GIS] Random Forest land cover classification in ArcMap

arcgis-desktop, classification, digital image processing, random forest, remote sensing

I am an undergraduate student working with a data set that is very large (to me). I have NO experience with R, but I am fairly comfortable with ArcMap.

I have about 60 GB of 8-band, 1 m satellite imagery of African rainforest, and a shapefile of about 90 GPS points taken by park rangers on the ground. The GPS shapefile contains a column with the land-cover type at each point. There are 9 different land-cover types, which are not easily distinguished by eye from the imagery. My goal is to create a map showing the distribution of these cover types over the entire scene.

Can anyone provide me with near-step-by-step instructions on how to use Duke's Marine Geospatial Ecology Tools (MGET), or potentially any other free Arc add-in, to run a Random Forest classification? Note that I have absolutely no idea how to use R, which seems to be the most-referenced way to process this information.

I know that a simpler form of classification would be easier to run, but since this is not just a typical arbitrary undergraduate assignment (it will actually end up being used by several NGOs working with wildlife in the area), I would like to produce the most accurate product that I can, and a Random Forest model seems to be the most appropriate for this goal.

Best Answer

I am the main developer of MGET.

The first step in your problem is to obtain values of the covariates that you will use to fit the model to your 90 GPS points. It sounds like you want to use the 8 bands as your covariates. You need to add 8 fields to your shapefile (one for each band) and populate them using a tool such as Extract Multi Values to Points from recent versions of ArcGIS or Interpolate Raster Values at Points from MGET (equivalent to what Arc provides but developed before the Arc tool existed).
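Conceptually, extracting band values at points amounts to converting each GPS coordinate into a row/column index of the raster and reading the pixel there. The ArcGIS and MGET tools do this for you; as a minimal illustration of the idea (the array, geotransform, and point coordinates below are made up), a sketch in Python:

```python
import numpy as np

def extract_at_points(bands, origin_x, origin_y, cell_size, points):
    """Sample a (n_bands, rows, cols) array at map-coordinate points.

    bands      : 3-D array, one 2-D layer per band
    origin_x/y : map coordinates of the raster's top-left corner
    cell_size  : pixel size in map units (square pixels assumed)
    points     : iterable of (x, y) map coordinates
    Returns one row of band values per point.
    """
    values = []
    for x, y in points:
        col = int((x - origin_x) / cell_size)
        row = int((origin_y - y) / cell_size)   # y decreases going down the raster
        values.append(bands[:, row, col])
    return np.array(values)

# Toy 2-band, 4x4 raster with 1 m pixels; top-left corner at map coords (100, 200)
bands = np.arange(32).reshape(2, 4, 4)
pts = [(100.5, 199.5), (103.5, 196.5)]          # centers of two pixels
print(extract_at_points(bands, 100, 200, 1.0, pts))
# → [[ 0 16]
#    [15 31]]
```

Each returned row would become the 8 field values for one of your 90 points.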

After that, you need to fit a classification model to the GPS points, using the field containing the known cover type as the response variable and the 8 band fields as the covariates (a.k.a. predictor variables). Once the model is fitted, you can obtain some performance statistics for it and then predict it on rasters representing the covariates.
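In R, which MGET drives under the hood, this step is a call to randomForest(); the same fit-then-evaluate pattern can be sketched in Python with scikit-learn (the band values and class labels below are random stand-ins, purely for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((90, 8))             # 90 points x 8 band values (fake data)
y = rng.integers(0, 9, size=90)     # 9 land-cover classes (fake labels)

# oob_score=True gives random forest's built-in "out-of-bag" accuracy estimate,
# computed from the training samples each tree did not see
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)

print("OOB accuracy:", rf.oob_score_)
print("Predicted class of first point:", rf.predict(X[:1])[0])
```

With real band values the out-of-bag accuracy is a quick first check of whether the 8 bands actually separate your 9 cover types.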

You can see a basic overview of MGET's modeling workflow for this here. The example is somewhat dated--not all of the tool parameters will look exactly like what you see there--but the basic workflow is the same: fit the model to a table of data, predict it against the table to get some performance statistics, and predict it on a stack of rasters. In MGET, the procedure is the same regardless of which modeling framework you use--MGET currently provides GLM, GAM, trees (a.k.a. CARTs), and random forest--so you can try different kinds of models with very similar workflows.
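The last step of that workflow (predicting on a stack of rasters) is conceptually just reshaping the band stack into a table with one row per pixel, predicting each pixel's class, and reshaping the result back into a map. A rough sketch with made-up data, showing the idea rather than MGET's actual code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X_train = rng.random((90, 8))                  # fake training table
y_train = rng.integers(0, 9, size=90)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Fake 8-band scene, 50 x 60 pixels
stack = rng.random((8, 50, 60))

# bands-first stack -> one row per pixel, one column per band
n_bands, rows, cols = stack.shape
table = stack.reshape(n_bands, rows * cols).T

cover_map = rf.predict(table).reshape(rows, cols)
print(cover_map.shape)                         # (50, 60): one class per pixel
```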

I'm sorry I don't have detailed instructions about this workflow written up. So far, we have not had funding to develop a complete manual. All MGET tools have documentation within ArcGIS, so be sure you click the Show Help >> button on the tool dialogs if you have not done so already.

Regarding Jeffrey Evans' speculation that MGET does not utilize the R raster package: that is correct. The code in MGET that performs raster predictions was developed before the raster package was released to CRAN (R's distribution system for R packages), so it does not rely on that package. But it is not correct that MGET will crash due to memory limitations. MGET's raster prediction code was written specifically to handle the situation you're facing, by performing predictions in blocks, similar to how the raster package does it. Before the raster package was developed, MGET was one of the only readily-available tools that could handle prediction of large rasters. MGET users have done this, for example, with large bathymetry rasters at 5 m resolution.
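Block-wise prediction simply means applying the per-pixel prediction to one window of rows at a time, so only a small slice of the raster needs to be in memory at once. A hedged sketch of the idea (again with random stand-in data, not MGET's code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(rng.random((90, 8)), rng.integers(0, 9, size=90))

stack = rng.random((8, 200, 120))    # stand-in for a raster too big for RAM
n_bands, rows, cols = stack.shape
out = np.empty((rows, cols), dtype=np.int64)

block = 64                           # rows per block; tune to available memory
for r0 in range(0, rows, block):
    r1 = min(r0 + block, rows)
    chunk = stack[:, r0:r1, :]                   # only this window is "read"
    table = chunk.reshape(n_bands, -1).T
    out[r0:r1, :] = rf.predict(table).reshape(r1 - r0, cols)

print(out.shape)                                 # (200, 120)
```

In real use, the loop would read each block from disk rather than slicing an in-memory array, so peak memory is bounded by the block size, not the scene size.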

All of that said, if you believe you will be performing a lot of modeling as your career progresses, I encourage you to learn how to do it in R directly, and about modeling and statistics more generally, independent of software. In a sense, MGET's modeling tools are a "gateway drug" to R. MGET's tools are just as robust as R--they utilize R to perform the actual model fitting and prediction--but they expose only a limited subset of what is possible in R itself. As you continue to do more modeling projects, eventually you may face a situation in which MGET is not enough and you need the full flexibility of R.
