What machine learning algorithm should I choose to fill in blanks from context

Tags: algorithms, machine learning, natural language

I have a project where I need to fill in a missing word given a few words of context. For example, suppose I have the sentence:

I went ____ the store.

I want to be able to deduce that the blank above is "to". I want to train the model on, and run it against, a small corpus of about 40,000 words. Given the small corpus and the nature of the problem, which algorithm is the best choice?

I'm OK with most words not being matched. I'm more interested in finding out how many words such an algorithm can match than in matching them with high precision.

Best Answer

One option is to build a language model from a corpus of example sentences, then, for each word in your vocabulary, compute the perplexity of the sentence with that word placed in the blank. The words that give the lowest perplexity are the best candidates to fill the blank.
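A minimal sketch of this approach in Python follows. The answer does not say which language model it used, so this assumes a bigram model with add-one (Laplace) smoothing; the class `BigramLM` and the functions `fill_blank` and `blank_hit_rate` are hypothetical names for illustration. Perplexity is computed as exp of the negative average log-probability per predicted token. `blank_hit_rate` blanks out each word of the given sentences in turn and reports the fraction whose true word lands in the model's top k, which is one way to measure how many words the method can match.

```python
import math
from collections import Counter


class BigramLM:
    """Hypothetical bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.history = Counter()  # c(prev): how often a token precedes another
        self.bigrams = Counter()  # c(prev, word)
        words = set()
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
            words.update(tokens[1:])  # predictable tokens (excludes "<s>")
        self.vocab = words
        self.V = len(words)

    def prob(self, prev, word):
        # Smoothed P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + self.V)

    def perplexity(self, sent):
        tokens = ["<s>"] + sent + ["</s>"]
        log_p = sum(math.log(self.prob(a, b)) for a, b in zip(tokens, tokens[1:]))
        n = len(tokens) - 1  # number of predicted tokens
        return math.exp(-log_p / n)

    def fill_blank(self, left, right, k=5):
        """Return the k words giving the lowest perplexity for left + [w] + right."""
        candidates = self.vocab - {"</s>"}
        # Tie-break alphabetically so the ranking is deterministic.
        ranked = sorted(
            candidates, key=lambda w: (self.perplexity(left + [w] + right), w)
        )
        return ranked[:k]


def blank_hit_rate(model, sentences, k=1):
    """Fraction of blanked positions whose true word is in the model's top k."""
    hits = total = 0
    for sent in sentences:
        for i, word in enumerate(sent):
            total += 1
            if word in model.fill_blank(sent[:i], sent[i + 1:], k):
                hits += 1
    return hits / total if total else 0.0


# Usage on a toy corpus (illustrative data, not from the question).
corpus = [
    "i went to the store".split(),
    "she went to the park".split(),
    "we walked to the store".split(),
    "i went home".split(),
]
lm = BigramLM(corpus)
print(lm.fill_blank(["i", "went"], ["the", "store"], k=3))  # 'to' ranks first
print(blank_hit_rate(lm, corpus, k=1))
```

Because every candidate yields a sentence of the same length, ranking by lowest perplexity gives the same order as ranking by highest sentence probability. Scoring every vocabulary word at every blank is slow in pure Python once the vocabulary reaches thousands of words; since only the two bigrams touching the blank change between candidates, a faster version could score just those. For an honest estimate of how many words can be matched, evaluate on held-out sentences rather than on the training corpus.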

I needed to do the same thing, and with a large enough corpus this works well for me.
