Solved – Is it effective to use one hot encoding of categorical data as input to PCA for anomaly detection

anomaly detection, categorical-encoding, correspondence-analysis, dimensionality reduction, pca

I have a mixture of numeric and categorical inputs. The categorical inputs have relatively low cardinality (perhaps 10-15).

I want to use PCA for anomaly detection, but am not sure how best to encode the categorical attributes.

Will one hot encoding work, and if not, what should I try?

Best Answer

I would not try PCA, but rather correspondence analysis (or one of its generalizations to mixed categorical/continuous data). See "Can principal component analysis be applied to datasets containing a mix of continuous and categorical variables?" for ideas and links to generalizations such as homogeneity analysis.
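To make the suggestion concrete, here is a minimal sketch of correspondence analysis applied to a one-hot (indicator) matrix, using only NumPy. It is an illustration under stated assumptions rather than a recommended implementation: the helper names `one_hot`, `ca_row_coordinates` and `ca_anomaly_scores`, the toy data, and the scoring rule (squared distance from the centroid in the first `k` dimensions) are all my own choices, and the sketch covers only the categorical columns, not the mixed categorical/continuous case.

```python
import numpy as np

def one_hot(columns):
    """Build an indicator matrix from a list of 1-D categorical arrays (hypothetical helper)."""
    blocks = []
    for col in columns:
        levels, codes = np.unique(np.asarray(col), return_inverse=True)
        blocks.append(np.eye(len(levels))[codes])
    return np.hstack(blocks)

def ca_row_coordinates(Z):
    """Correspondence analysis of a non-negative matrix Z; returns row principal coordinates."""
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: (P - r c^T) scaled by 1 / sqrt(r_i * c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    F = (U * s) / np.sqrt(r)[:, None]    # row principal coordinates
    return F, s ** 2                     # coordinates and principal inertias

def ca_anomaly_scores(Z, k):
    """Assumed scoring rule: squared distance from the centroid in the first k dimensions."""
    F, _ = ca_row_coordinates(Z)
    return (F[:, :k] ** 2).sum(axis=1)

# Toy data (made up): the last row uses levels that occur nowhere else.
color = ["red"] * 5 + ["blue"] * 5 + ["green"]
size = ["S"] * 5 + ["L"] * 5 + ["XL"]

Z = one_hot([color, size])
scores = ca_anomaly_scores(Z, k=2)
print(np.round(scores, 3))
```

In this sketch rows built from rare category levels land far from the centroid, because correspondence analysis weights each column by the inverse of its frequency (the chi-square metric). Whether that is the right notion of "anomalous" for a given dataset, and how many dimensions to keep, are judgment calls the sketch does not settle.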