Kernel Methods – Applying Kernel Methods to Categorical Data

categorical-data, kernel-trick

I have a basic understanding of kernel methods and the kernel trick, and of their advantages, e.g., why they are often preferred over conventional machine learning algorithms. However, I have some trouble actually applying them.
The problems I face are as follows:

1. Can I use a kernel as a metric for (dis)similarity calculation?

2. What steps are needed to apply a kernel method (say, the Gaussian kernel) to a set of categorical (along with numerical) data? Consider the following sample data:

Age  State  Day
12   NJ     Tue
24   NM     Wed
89   CA     Thu
...  ...    ...
The question is: do I need to explicitly encode the categorical values in order to use them in the Gaussian kernel for similarity calculation?

Best Answer

You have several options, the best one being a kernel function tailored to your specific problem. This is also the least intuitive option, so I won't elaborate on it.

Usually, categorical data are encoded using so-called one-hot encoding: we introduce one binary feature for every category. For instance, with three categories $A$, $B$ and $C$: $$ A \rightarrow [1, 0, 0],\quad B \rightarrow [0, 1, 0],\quad C \rightarrow [0, 0, 1]. $$ The crucial point is that all pairwise distances between categories must be equal; otherwise the distance is biased towards some pairs (unless, of course, that is what you want). One-hot encoding is the simplest way to ensure that all pairwise distances between categories are equal.
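As a concrete illustration, here is a minimal sketch of that approach, assuming pandas and a recent scikit-learn (for `OneHotEncoder(sparse_output=False)`); the column names come from the question's toy data, and the `gamma` value is chosen arbitrarily.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.metrics.pairwise import rbf_kernel

# Toy data mirroring the question's example.
df = pd.DataFrame({
    "Age":   [12, 24, 89],
    "State": ["NJ", "NM", "CA"],
    "Day":   ["Tue", "Wed", "Thu"],
})

# One-hot encode the categorical columns; scale the numeric one so it does
# not dominate the squared Euclidean distance inside the Gaussian kernel.
pre = ColumnTransformer([
    ("num", StandardScaler(), ["Age"]),
    ("cat", OneHotEncoder(sparse_output=False), ["State", "Day"]),
])
X = pre.fit_transform(df)

# Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2).
K = rbf_kernel(X, gamma=0.5)
print(K)
```

With this encoding, the squared Euclidean distance between any two distinct categories of the same variable is always 2, so no pair of categories is privileged inside the Gaussian kernel, which is exactly the "equal pairwise distances" property described above.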