How to Plot Decision Boundary of a k-Nearest Neighbor Classifier

data visualization, k nearest neighbour, r

I want to reproduce a plot from the book "The Elements of Statistical Learning: Data Mining,
Inference, and Prediction, Second Edition" by Trevor Hastie, Robert Tibshirani, and Jerome
Friedman (accompanied by the R package ElemStatLearn). The plot is:

[Figure: k-nearest-neighbour decision boundary plot from the book]

How can I produce this exact graph in R? I am particularly interested in the grid graphics and the calculation used to show the boundary.

Best Answer

To reproduce this figure, you need to have the ElemStatLearn package installed on your system. The artificial dataset ships with the package as the mixture.example object, as pointed out by @StasK.

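library(ElemStatLearn)  # provides the mixture.example data
library(class)          # provides knn()

# Training points, class labels, and the lattice of evaluation points
x <- mixture.example$x
g <- mixture.example$y
xnew <- mixture.example$xnew

# Classify each lattice point with 15-NN; "prob" is the winning class's vote share
mod15 <- knn(x, xnew, g, k=15, prob=TRUE)
prob <- attr(mod15, "prob")
prob <- ifelse(mod15=="1", prob, 1-prob)  # convert to the vote share for class 1

# Arrange the class-1 vote shares on the px1 x px2 lattice
px1 <- mixture.example$px1
px2 <- mixture.example$px2
prob15 <- matrix(prob, length(px1), length(px2))

# Draw the 0.5 contour (the decision boundary), then the training points
par(mar=rep(2,4))
contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="", main=
        "15-nearest neighbour", axes=FALSE)
points(x, col=ifelse(g==1, "coral", "cornflowerblue"))

# Shade the lattice by predicted class and frame the plot
gd <- expand.grid(x=px1, y=px2)
points(gd, pch=".", cex=1.2, col=ifelse(prob15>0.5, "coral", "cornflowerblue"))
box()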

All but the last three commands come from the online help for mixture.example. Note that expand.grid arranges its output with x varying fastest, which matches the column-by-column order in which R stores the prob15 matrix (of dimension 69x99). This lets us index the colors directly by prob15, which holds, for each lattice coordinate (px1, px2), the proportion of votes for class 1 (the knn "prob" attribute gives the winning class's share, which the ifelse step converts).
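The ordering argument can be checked on a toy lattice. This is only a small sketch: the vectors a and b are made-up stand-ins for px1 and px2, and the matrix m is a hypothetical surface used purely to compare orderings.

```r
# Tiny stand-ins for px1 and px2 (hypothetical values, for illustration only)
a <- c(1, 2, 3)
b <- c(10, 20)

gd <- expand.grid(x = a, y = b)  # x varies fastest, y slowest
m  <- outer(a, b, "+")           # 3 x 2 matrix with m[i, j] = a[i] + b[j]

# Flattening m column by column visits the cells in the same order as the rows of gd
same_order <- all(as.vector(m) == gd$x + gd$y)
print(gd)
print(same_order)
```

Because the two orderings agree, a matrix laid out on the lattice can be passed directly as a per-point color index, as is done with prob15 above.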

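The same recipe should carry over to data that does not come with a precomputed lattice. What follows is a rough sketch under stated assumptions, not the book's procedure: the training data are simulated here, the names train, cl, gx1, gx2, and pmat are hypothetical, and the 60-point grid resolution is an arbitrary choice. The 1 - p_win step assumes exactly two classes.

```r
library(class)  # provides knn()

set.seed(1)
# Simulated two-class training data (hypothetical, not the book's mixture data)
n <- 100
train <- rbind(matrix(rnorm(2 * n, mean = 0), ncol = 2),
               matrix(rnorm(2 * n, mean = 1.5), ncol = 2))
cl <- factor(rep(c(0, 1), each = n))

# Evaluation lattice covering the range of the training data
gx1 <- seq(min(train[, 1]), max(train[, 1]), length.out = 60)
gx2 <- seq(min(train[, 2]), max(train[, 2]), length.out = 60)
grid <- as.matrix(expand.grid(x1 = gx1, x2 = gx2))

# Classify every lattice point; "prob" is the vote share of the winning class
fit <- knn(train, grid, cl, k = 15, prob = TRUE)
p_win <- attr(fit, "prob")
p1 <- ifelse(fit == "1", p_win, 1 - p_win)  # vote share for class "1" (two classes only)
pmat <- matrix(p1, length(gx1), length(gx2))

# The 0.5 contour of the class-1 vote share is the decision boundary
contour(gx1, gx2, pmat, levels = 0.5, drawlabels = FALSE, axes = FALSE,
        main = "15-nearest neighbour (simulated data)")
points(train, col = ifelse(cl == "1", "coral", "cornflowerblue"))
points(grid, pch = ".", cex = 1.2,
       col = ifelse(pmat > 0.5, "coral", "cornflowerblue"))
box()
```

A finer grid gives a smoother boundary at the cost of more knn evaluations, since every lattice point is classified separately.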