Solved – What data mining/machine learning approach to use for a scoring model

data mining, machine learning

Suppose I have a large data set with lots of features (attributes), and I'm tasked with building some kind of scoring model to rank certain objects using all these features.
How do I go about doing this?

From my understanding so far, I like to think of this as a supervised learning problem. But the problem is that there are no labeled classes (or at least none are apparent).
How can I rank order these objects? The closest thing I can think of is credit scores, but in credit scoring models, one supposedly has labeled classes as to who historically was good and bad.

Should I invent/create some metric based on the list of attributes and use it to produce labeled cases?
For example, if $\text{attribute}_1 > x$ and $\text{attribute}_2 < y$, etc., then the object is considered "good".

I believe they want a numerical ranking (i.e., every object gets a numerical score assigned to it, like a credit score). If that's the case, do I even need machine learning/data mining? Can't I just rank the objects by these attributes once they agree on what the ordering means?
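To make the "just rank by agreed attributes" idea concrete, here is a minimal sketch of a hand-defined score with no learning involved. The attribute names (`income`, `debt`), the weights, and the min-max rescaling are all hypothetical choices for illustration; in practice they would come from whatever the stakeholders agree on.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rule_based_scores(objects, weights):
    """Weighted sum of min-max rescaled attributes (weights are hand-picked)."""
    names = list(objects)
    totals = {name: 0.0 for name in names}
    for attr, weight in weights.items():
        column = min_max([objects[name][attr] for name in names])
        for name, value in zip(names, column):
            totals[name] += weight * value
    return totals


# Hypothetical objects and attributes, invented for this example.
objects = {
    "obj_a": {"income": 50, "debt": 10},
    "obj_b": {"income": 80, "debt": 40},
    "obj_c": {"income": 20, "debt": 0},
}
# Hypothetical agreement: higher income is better, higher debt is worse.
weights = {"income": 1.0, "debt": -0.5}

scores = rule_based_scores(objects, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The ordering here depends entirely on the chosen weights, which is exactly the part that has to be agreed on beforehand.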

Best Answer

If you have neither labels nor ranking examples, I don't know what you could do with your data other than cluster it based on similarity. The ranking function that you are supposed to learn can represent a user's preferences (e.g. when I type "learning" in a search engine I prefer "machine learning" results to "e-learning" ones), a risk score for a bank (i.e. you would be modeling not the clients' preferences but the bank's), etc. That is, $N$ objects can be ordered in $N!$ ways, and no single ordering is universally good.

In ranking, you usually have some examples of ordered objects. The task is to learn a ranking function that can be:

  • point-wise: you learn to score every item based on its attributes. The score is used for the final sort.

  • pair-wise: you learn to order pairs. You have examples like $A \succ B$, and your function learns to make pair-wise decisions. Since putting all the pairs together will probably produce inconsistencies (e.g. $A \succ B, B \succ C, C \succ A$), it is your task to create a final maximal consistent ranking from these pairs.

  • list-wise: you try to learn a ranking function whose output will be a final list.
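As a rough illustration of the point-wise approach, the sketch below fits a linear scoring function to a handful of items that already have example scores, then sorts unseen items by predicted score. The two-feature setup, the training data, and the helper name `fit_two_feature_scorer` are assumptions made up for this example; a real system would typically use a regression or learning-to-rank library rather than hand-written normal equations.

```python
def fit_two_feature_scorer(rows, targets):
    """Least-squares fit of score = w1*x1 + w2*x2 (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s11 = sum(x1 * x1 for x1, _ in rows)
    s12 = sum(x1 * x2 for x1, x2 in rows)
    s22 = sum(x2 * x2 for _, x2 in rows)
    b1 = sum(x1 * y for (x1, _), y in zip(rows, targets))
    b2 = sum(x2 * y for (_, x2), y in zip(rows, targets))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are collinear; cannot fit")
    w1 = (b1 * s22 - b2 * s12) / det
    w2 = (s11 * b2 - s12 * b1) / det
    return w1, w2


# Hypothetical training items: (x1, x2) attributes with example scores.
train_rows = [(1, 0), (0, 1), (1, 1), (2, 1)]
train_scores = [2, -1, 1, 3]

w1, w2 = fit_two_feature_scorer(train_rows, train_scores)

# Score unseen items one at a time, then sort by the predicted score.
new_items = {"a": (3, 1), "b": (1, 3), "c": (2, 2)}
predicted = {name: w1 * x1 + w2 * x2 for name, (x1, x2) in new_items.items()}
ranking = sorted(predicted, key=predicted.get, reverse=True)
print(ranking)
```

Each item is scored independently of the others; the final order only appears when the scores are sorted.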

Point-wise and pair-wise are the most common approaches, since it is easier to rank locally than to rank all items at once (list-wise).
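To show how inconsistent pair-wise judgments can still be turned into one ordering, here is a small sketch that ranks items by net wins (wins minus losses). This is only a simple heuristic chosen for illustration, not a method for finding a maximal consistent ranking, and the preference pairs are invented.

```python
from collections import defaultdict


def rank_by_net_wins(preferences):
    """Order items by (wins - losses) over all (winner, loser) pairs.

    Ties are broken alphabetically so the result is deterministic.
    """
    net = defaultdict(int)
    for winner, loser in preferences:
        net[winner] += 1
        net[loser] -= 1
    return sorted(net, key=lambda item: (-net[item], item))


# Hypothetical judgments; A > B, B > C, C > A form a cycle,
# and (C, A) directly contradicts (A, C).
preferences = [
    ("A", "B"), ("B", "C"), ("C", "A"),
    ("A", "D"), ("B", "D"), ("C", "D"),
    ("A", "C"),
]

ranking = rank_by_net_wins(preferences)
print(ranking)
```

The cycle among A, B and C cannot be satisfied by any single ordering, so the aggregate necessarily violates at least one of the observed pairs.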

The keywords to look for are "Learning to rank" (Information Retrieval) and "Preference Learning".