Wikipedia provides the following example when describing feature hashing; but the mapping does not seem consistent with the dictionary defined
For example, to
should be converted to 3
according to the dictionary, but it is encoded as 1
instead.
Is there an error in the description? How does feature hashing work?
The texts:
John likes to watch movies. Mary likes too. John also likes to watch football games.
can be converted, using the dictionary
{"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5, "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}
to the matrix
[[1 2 1 1 1 0 0 0 1 1] [1 1 1 1 0 1 1 1 0 0]]
Best Answer
The matrix is constructed in the following way:
and every entry matrix(i,j)=k means:
In line i, the word with index j appears k times.
So
to
is mapped to index 3. It appears exactly one time in line 1. So m(1,3)=1.More examples
likes
is mapped to index 2. It appears exactly two times in the first line. So m(1,2)=2also
is mapped to index 6. It does not appear in line 1, but one time in line 2. So m(1,6)=0 and m(2,6)=1.