Is there an advantage to using higher dimensions (2D, 3D, etc.), or should you just build x-1 single-dimension classifiers and aggregate their predictions in some way?
This depends on whether your features are informative. Do you suspect that some features will not be useful in your classification task? To get a better idea of your data, you can also compute the pairwise correlation or mutual information between the response variable and each of your features.
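For instance, here is a minimal R sketch that ranks the iris measurements with a crude correlation-based score (coding the class as an integer just to get a rough ordering; estimating proper mutual information would require discretizing the features first):

```r
## Crude informativeness ranking: absolute correlation of each feature
## with an integer coding of the class. A rough screen, not a substitute
## for a proper mutual information estimate.
y <- as.integer(iris$Species)
sort(sapply(iris[, 1:4], function(f) abs(cor(f, y))), decreasing = TRUE)
```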
To combine all (or a subset) of your features, you can try computing the L1 (Manhattan) or L2 (Euclidean) distance between the query point and each 'training' point as a starting point.
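A minimal sketch of both distances, using the iris measurements as training data and (arbitrarily) their first row as the query point:

```r
## L1 (Manhattan) and L2 (Euclidean) distances from one query point
## to every training point.
train <- as.matrix(iris[, 1:4])
query <- train[1, ]                              # arbitrary query for illustration
d_l1  <- rowSums(abs(sweep(train, 2, query)))    # Manhattan
d_l2  <- sqrt(rowSums(sweep(train, 2, query)^2)) # Euclidean
head(order(d_l2))                                # indices of the nearest neighbours
```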
Building all of these classifiers from all potential combinations of the variables would be computationally expensive. How could I optimize this search to find the best kNN classifiers from that set?
This is the problem of feature subset selection. There is a lot of academic work in this area (see Guyon, I., & Elisseeff, A. (2003), An Introduction to Variable and Feature Selection, Journal of Machine Learning Research, 3, 1157-1182, for a good overview).
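One common heuristic from that literature is greedy forward selection: add one feature at a time and keep it only if cross-validated accuracy improves. A sketch on iris, scored with leave-one-out kNN from the class package (the choice of k = 5 is arbitrary):

```r
## Greedy forward selection scored by leave-one-out kNN accuracy.
## Far cheaper than the 2^p - 1 exhaustive search, but only a heuristic.
library(class)
X <- iris[, 1:4]
y <- iris$Species
selected  <- character(0)
remaining <- names(X)
best_acc  <- 0
while (length(remaining) > 0) {
  accs <- sapply(remaining, function(f)
    mean(knn.cv(X[, c(selected, f), drop = FALSE], y, k = 5) == y))
  if (max(accs) <= best_acc) break               # no further improvement
  best_acc  <- max(accs)
  selected  <- c(selected, names(which.max(accs)))
  remaining <- setdiff(remaining, selected)
}
selected
best_acc
```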
And, once I find a series of classifiers, what's the best way to combine their outputs into a single prediction?
This will depend on whether the selected features are independent. If the features are independent, you can weight each feature by its mutual information (or some other measure of informativeness) with the response variable (whatever you are classifying on). If some features are dependent, then a single classification model that uses all of them will probably work best.
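As an illustration of the independent case, here is a sketch that lets each single-feature kNN classifier vote, weighting each vote by a crude correlation score standing in for mutual information:

```r
## Weighted vote over single-feature kNN classifiers; each feature's vote
## counts in proportion to its informativeness score.
library(class)
X <- iris[, 1:4]
y <- iris$Species
w <- sapply(X, function(f) abs(cor(f, as.integer(y))))   # crude weights
votes <- sapply(names(X), function(f)
  as.character(knn.cv(X[, f, drop = FALSE], y, k = 5)))
combined <- apply(votes, 1, function(v)
  names(which.max(tapply(w, v, sum))))                   # weighted majority
mean(combined == y)                                      # combined accuracy
```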
How do most implementations apply kNN to more general learning problems?
By allowing the user to specify their own distance metric (or a precomputed distance matrix) over the set of points. kNN works well when an appropriate distance metric is used.
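For example, a sketch of kNN computed directly from an arbitrary precomputed distance matrix, so any metric can be plugged in (here Manhattan distance on iris, with k = 5 chosen arbitrarily):

```r
## kNN classification from a precomputed distance matrix D: for each point,
## take the k nearest other points and vote by majority class.
D <- as.matrix(dist(iris[, 1:4], method = "manhattan"))
y <- iris$Species
knn_from_dist <- function(i, D, y, k = 5) {
  nn <- order(D[i, ])[2:(k + 1)]        # skip index 1: the point itself
  names(which.max(table(y[nn])))
}
pred <- sapply(seq_len(nrow(D)), knn_from_dist, D = D, y = y)
mean(pred == y)                          # leave-one-out accuracy
```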
The clustering itself has no problem with the p > n situation; however, the visualization internally uses princomp (which cannot handle p > n) to plot the similarity-space projection. You can't fix that directly; at most, you can try to reproduce a similar graph by obtaining the similarity-space projection with cmdscale(dist(...)) and coloring the points by cluster, as in the sketch below.
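A sketch of that workaround, where X stands for your data matrix and cl for the cluster labels returned by your clustering run (both names are placeholders):

```r
## Classical MDS (cmdscale) on the pairwise distances, colored by cluster.
## X and cl are placeholders for your data and cluster assignments.
proj <- cmdscale(dist(X), k = 2)
plot(proj, col = cl, pch = 19,
     xlab = "MDS dimension 1", ylab = "MDS dimension 2")
```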
Best Answer
I think your question has little to do with kNN. Really, the question is how best to visualize multi-dimensional data. The best ways to do that depend on your data. Are the variables continuous or categorical? How many data points are there? Are there particular variables you want to understand? I think the main bearing of kNN on your question is this last point: you particularly want to understand the predicted variable as it relates to the predictors.
I will give some examples for the case where you are predicting a categorical variable from several continuous variables, using the iris data set. I hope that these may help with your data.
Scatterplot matrix - shows all pairs of predictor dimensions with the category represented by the color of the points. Obviously, when you have many predictor variables, this can be difficult to see, but with a modest number of variables you can get a decent overview.
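For iris, a one-liner along these lines:

```r
## Scatterplot matrix of the four predictors, colored by species.
pairs(iris[, 1:4], col = as.integer(iris$Species), pch = 19)
```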
Parallel Coordinates plot - shows all predictor dimensions simultaneously with category represented by color. With many data points, these can get pretty messy.
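For example, using parcoord from the MASS package:

```r
## Parallel coordinates plot of the four predictors, colored by species.
library(MASS)
parcoord(iris[, 1:4], col = as.integer(iris$Species))
```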
Projection: There are many ways to project the data onto a "good" two-dimensional space, for example PCA, t-SNE, and Isomap. I will just illustrate with PCA.
Principal Components Analysis - A biplot projects the data onto the first two principal components, with points colored by category. This is a linear projection onto the two directions that show the most variation in the data.
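A sketch of that projection on iris:

```r
## Project onto the first two principal components and color by species.
pca <- prcomp(iris[, 1:4], scale. = TRUE)
plot(pca$x[, 1:2], col = as.integer(iris$Species), pch = 19,
     xlab = "PC1", ylab = "PC2")
```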