How to convert multiple ranking scores into a probability distribution

data mining, distributions, ensemble learning, machine learning, probability

I would like to create a topic distribution for a document.

The current model I am trying to implement is: for each sentence in the document, I am getting a topic assignment with a score, e.g. "1st sentence is about Microsoft with a relevance score of 0.4". I repeat this for each sentence, and at the end I have relevance scores with the topics like the following:

1st sentence: microsoft, score: 0.4

2nd sentence: apple, score: 0.1

3rd sentence: android, score: 0.5

Now I would like to combine these scores into a single probability distribution that represents the whole document. Is there a known technique for this? If so, what is the best way to do it?

Note: I know this is a very naive form of topic modelling, but for now I am only interested in combining the scores into a probability distribution.

Best Answer

Are the original scores already probabilities? Then the obvious choice would be Bayes' Rule.

Otherwise, you might want to look at:

While they focus on outlier scores, it should work for other domains, too. They also do some ensemble work there, and that probably is another field of literature where you should look for references. Because essentially, you are doing an ensemble.
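As a point of comparison only, here is a minimal sketch of the simplest possible baseline: sum the scores per topic and divide by the grand total. This is not Bayes' Rule and not the method from the reference above; it assumes the scores are non-negative and roughly comparable across sentences, which may not hold for real relevance scores. The function name `topic_distribution` and the `(topic, score)` input format are made up for illustration.

```python
from collections import defaultdict


def topic_distribution(sentence_scores):
    """Naive baseline (an assumption, not the answer's method):
    sum non-negative scores per topic, then divide by the grand total."""
    totals = defaultdict(float)
    for topic, score in sentence_scores:
        if score < 0:
            raise ValueError("scores must be non-negative")
        totals[topic] += score

    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("at least one score must be positive")

    # Each topic's share of the total score mass; the shares sum to 1.
    return {topic: total / grand_total for topic, total in totals.items()}


# The three sentences from the question, as (topic, score) pairs.
scores = [("microsoft", 0.4), ("apple", 0.1), ("android", 0.5)]
distribution = topic_distribution(scores)
print(distribution)
```

If a topic appears in several sentences, its scores are added before normalizing, so topics mentioned more often (or with higher scores) get more probability mass. Whether that weighting is appropriate depends on what the scores actually measure.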
