I have a mixture of numeric and categorical inputs; the categorical inputs have relatively low cardinality (perhaps 10-15).
I want to use PCA for anomaly detection, but am not sure how best to encode the categorical attributes.
Will one hot encoding work, and if not, what should I try?
Best Answer
I would not try PCA, but rather correspondence analysis (or one of its generalizations to mixed categorical/continuous data). See the question "Can principal component analysis be applied to datasets containing a mix of continuous and categorical variables?" for ideas and links to generalizations such as homogeneity analysis.
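As one illustration (my choice, not something named above), factor analysis of mixed data (FAMD) is a generalization that sits between PCA and multiple correspondence analysis. It also partly answers the one-hot question. The categorical variables *are* one-hot encoded, but each indicator column is divided by the square root of its level's proportion and then centered, while numeric columns are standardized. A plain SVD of that matrix then follows. The sketch below assumes numpy and assumes the anomaly score is the squared reconstruction error after keeping `k` components. The question does not say how PCA would be used for detection, so that scoring rule, the function names `famd_matrix` / `reconstruction_scores`, and the synthetic data are all hypothetical.

```python
import numpy as np

def famd_matrix(numeric, categorical):
    """FAMD-style weighting (sketch): standardized numerics + rescaled one-hot blocks.

    numeric: (n, p) array of continuous variables (assumed non-constant columns).
    categorical: list of length-n sequences, one per categorical variable.
    """
    numeric = np.asarray(numeric, dtype=float)
    # Continuous columns: zero mean, unit variance (divides by zero if a column is constant).
    blocks = [(numeric - numeric.mean(axis=0)) / numeric.std(axis=0)]
    for col in categorical:
        col = np.asarray(col)
        levels = np.unique(col)
        # Plain one-hot coding: one 0/1 indicator column per level.
        indicators = (col[:, None] == levels[None, :]).astype(float)
        p = indicators.mean(axis=0)  # proportion of rows at each level
        # Divide by sqrt(p), then center (the column mean is sqrt(p)); rare levels weigh more.
        blocks.append(indicators / np.sqrt(p) - np.sqrt(p))
    return np.hstack(blocks)

def reconstruction_scores(X, k):
    """Squared residual of each row after a rank-k SVD approximation (hypothetical score)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :k] * s[:k]) @ Vt[:k]
    return ((X - X_hat) ** 2).sum(axis=1)

# Hypothetical data: x2 tracks x1, and the colour is determined by the sign of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
colour = np.where(x1 > 0, "red", "blue")
# Inject one anomaly: x2 contradicts x1, while the colour is still consistent with x1.
x1[-1], x2[-1], colour[-1] = 3.0, -3.0, "red"

X = famd_matrix(np.column_stack([x1, x2]), [colour])
scores = reconstruction_scores(X, k=1)
print(int(np.argmax(scores)))  # row with the largest residual
```

Dropping the `/ np.sqrt(p) - np.sqrt(p)` step and centering the raw indicators gives "one-hot encoding followed by ordinary PCA". That version runs too, but frequent and rare levels then contribute very differently to the total variance. The MCA-style rescaling is meant to correct for this. The number of components `k` and any threshold on the scores would still have to be chosen for the data at hand.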