Solved – How to build feature vectors from profile data

feature-engineering, machine learning, svm

I want to build feature vectors from the data in my test set, which contains profiles of people.
I always compare two profiles with each other.

Thus my features are:
– Same surname ∈ {undefined, yes, no}
– age delta ∈ {undefined, x | x ∈ Z}
– number of same interests ∈ N
– genders ∈ {(male, female), (male, male), (undefined, male), …}
– number of common friends ∈ N (I think this should be normalized by the total number of friends both profiles have)

I want to use these feature vectors, labeled with 1 or -1, to learn to classify the relation between two profiles with an SVM or k-nearest neighbours. I think I should binarize the feature vectors somehow, but I am not sure what the best way is.

My ideas are:
– Just transform the values into binary representation
– Use One-hot encoding
– Split the gender feature into two features: gender_A, gender_B
– Normalize the common friends value by dividing it by the absolute difference between the two profiles' friend counts, plus one
– Don't normalize the common friends value, just add more features for #friends_A, #friends_B
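
For illustration, the one-hot and split-gender ideas could be combined roughly as follows. This is only a sketch under assumptions: the names `one_hot` and `encode_pair` are hypothetical, "undefined" is treated as its own category, and an undefined age delta is represented by an extra indicator flag plus a placeholder value of 0, which is one possible convention rather than the only one.

```python
# Category sets taken from the feature list above; "undefined" is kept
# as an explicit category (an assumption, not the only option).
SURNAME_VALUES = ("undefined", "yes", "no")
GENDER_VALUES = ("undefined", "male", "female")


def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode_pair(same_surname, age_delta, shared_interests, gender_a, gender_b):
    """Build one numeric feature vector for a pair of profiles.

    `age_delta` may be None when it is undefined; in that case a flag is
    set and the numeric slot is filled with 0.0 as a placeholder.
    """
    age_missing = age_delta is None
    features = []
    features += one_hot(same_surname, SURNAME_VALUES)
    features += [1.0 if age_missing else 0.0,
                 0.0 if age_missing else float(age_delta)]
    features.append(float(shared_interests))
    # Gender split into two separate one-hot blocks (gender_A, gender_B).
    features += one_hot(gender_a, GENDER_VALUES)
    features += one_hot(gender_b, GENDER_VALUES)
    return features


vector = encode_pair("yes", None, 3, "male", "female")
print(len(vector))
```

Since both SVMs and k-nearest neighbours are sensitive to feature scale, the numeric entries (age delta, number of shared interests) would probably need to be rescaled to a range comparable to the 0/1 entries before training.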

What do you think would be the best solution or what could I do instead?
Can anyone help me?

Best Answer

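As for the number of common friends, I suggest using the Jaccard index. It is basically the ratio between the number of friends the two profiles share and the total number of distinct friends across both profiles (intersection over union).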
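
A minimal sketch of that computation in Python, assuming each profile's friends are available as a collection of IDs. The function names are illustrative. When only the counts are known, the size of the union can be obtained as |A| + |B| - |A ∩ B|.

```python
def jaccard_index(friends_a, friends_b):
    """Jaccard index of two friend collections: |A & B| / |A | B|."""
    a, b = set(friends_a), set(friends_b)
    union = a | b
    if not union:
        # Neither profile has friends; 0.0 is an arbitrary convention here.
        return 0.0
    return len(a & b) / len(union)


def jaccard_from_counts(n_common, n_friends_a, n_friends_b):
    """Same value computed from counts only (inclusion-exclusion)."""
    union_size = n_friends_a + n_friends_b - n_common
    if union_size <= 0:
        return 0.0
    return n_common / union_size


score = jaccard_index({"ann", "bob", "cid"}, {"bob", "cid", "dan"})
print(score)  # 2 shared friends out of 4 distinct friends -> 0.5
```

The result always lies between 0 and 1, so it can be used directly as a feature without further normalization.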
