Solved – How to implement KNN in R with missing values

categorical data, data mining, r

I have this data set from https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.names, which gives a good summary of the attributes I'm using. Some of the observations have missing values, and I have already recoded the last (target) column from +/- to 0s and 1s. I am not sure how to proceed with KNN from here, given the missing values and some attributes that have 10-15 different categories. I don't think KNN works with missing values, because I keep getting errors about NAs being introduced by coercion. Should I remove these attributes, as well as trying to impute the missing numerical values?

Best Answer

You should convert your categorical attributes to a one-hot encoding so that KNN can compute distances over them; a custom distance metric is another option. Regarding the NAs: yes, you should impute the missing values. A common approach is to replace each missing value with the mean or median of its column. You can also build your own imputer based on the insights you have drawn from your data; tree-based imputation is also common.
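As a rough illustration of the steps above, here is a minimal sketch in R. It uses a small made-up data frame rather than the real credit data, so the column names (`age`, `job`, `target`) and the helpers `impute_median` and `impute_mode` are hypothetical. It assumes median imputation for numeric columns, most-frequent-level imputation for factors, one-hot encoding via `model.matrix`, and `knn` from the `class` package. If the raw file marks missing entries with a placeholder such as `?`, passing `na.strings = "?"` to `read.csv` should turn them into proper NAs instead of triggering coercion warnings later.

```r
library(class)  # provides knn(); a recommended package shipped with R

# Toy stand-in for the credit data (hypothetical column names).
df <- data.frame(
  age    = c(22, 35, NA, 48, 51, 29, 40, 33),
  job    = factor(c("a", "b", "a", NA, "c", "b", "a", "c")),
  target = factor(c(0, 1, 0, 1, 1, 0, 1, 0))
)

# Replace missing numeric values with the column median.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Replace missing factor values with the most frequent level.
impute_mode <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  x[is.na(x)] <- names(counts)[which.max(counts)]
  x
}

predictors <- df[, c("age", "job")]
is_num <- sapply(predictors, is.numeric)
is_fac <- sapply(predictors, is.factor)
predictors[is_num] <- lapply(predictors[is_num], impute_median)
predictors[is_fac] <- lapply(predictors[is_fac], impute_mode)

# One-hot encode: contrasts = FALSE keeps one indicator column per level
# for every factor, and "- 1" drops the intercept column.
X <- model.matrix(
  ~ . - 1,
  data = predictors,
  contrasts.arg = lapply(predictors[is_fac], contrasts, contrasts = FALSE)
)

# Rescale the numeric column to [0, 1] so it does not dominate the
# 0/1 indicator columns in the Euclidean distance used by knn().
rng <- range(X[, "age"])
X[, "age"] <- (X[, "age"] - rng[1]) / (rng[2] - rng[1])

# Arbitrary split for illustration: first six rows train, last two test.
train_idx <- 1:6
set.seed(1)  # knn() breaks ties at random
pred <- knn(
  train = X[train_idx, ],
  test  = X[-train_idx, , drop = FALSE],
  cl    = df$target[train_idx],
  k     = 3
)
```

In a real analysis the imputation values (medians, most frequent levels) and the scaling range would normally be computed on the training rows only and then applied to the test rows; the sketch skips that for brevity.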
