My question is: how can I compare language model (LM) scores for two sentences of different lengths?
Probabilities are < 1, and since the LM score of a sentence is a product of bigram or trigram probabilities (depending on whether it's a bigram or trigram model), the scores of longer sentences will mostly be smaller.
So, how should I normalize the scores according to sentence length?
I am pretty sure almost everyone who has read about LMs has had the same doubt, but I couldn't find much on the internet.
I would appreciate any leads on this.
Best Answer
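As you noticed, it's a good idea to use some kind of averaging. Since LM probabilities get multiplied, the geometric mean seems like a good fit: take the N-th root of the sentence probability, where N is the number of n-gram factors in the product.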
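To make the idea concrete, here is a minimal sketch in Python. It assumes you already have the list of per-token n-gram probabilities for each sentence; the function names and the example numbers are made up for illustration and do not come from any particular LM toolkit. It works in log space to avoid underflow on long sentences, and it also shows perplexity, which is just the inverse of the geometric mean.

```python
import math

def sentence_log_prob(ngram_probs):
    """Sum of log-probabilities, i.e. the log of the raw (unnormalized) LM score."""
    return sum(math.log(p) for p in ngram_probs)

def geometric_mean_prob(ngram_probs):
    """Length-normalized score: N-th root of the product of N probabilities."""
    n = len(ngram_probs)
    if n == 0:
        raise ValueError("need at least one n-gram probability")
    return math.exp(sentence_log_prob(ngram_probs) / n)

def perplexity(ngram_probs):
    """Inverse of the geometric mean; lower means the model finds the sentence more likely."""
    return 1.0 / geometric_mean_prob(ngram_probs)

# Hypothetical per-token probabilities; the long sentence repeats the short one twice.
short_sent = [0.2, 0.1, 0.3]
long_sent = [0.2, 0.1, 0.3, 0.2, 0.1, 0.3]

raw_short = math.exp(sentence_log_prob(short_sent))
raw_long = math.exp(sentence_log_prob(long_sent))

print(raw_short, raw_long)  # the raw product is much smaller for the longer sentence
print(geometric_mean_prob(short_sent), geometric_mean_prob(long_sent))  # comparable after normalization
```

In this toy case both sentences get the same normalized score, because the longer one is exactly as "likely per token" as the shorter one; the raw products differ only because of length.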
From Speech and Language Processing