I have different N gram language models trained on a corpus. I want to show that increasing N really leads to an improvement in modeling the training set.
I don't understand how to generate random sentences from them.
Please explain mathematically, in probability terms, and if possible point me to directions for implementation. I assume that sampling from the learnt distribution/model is a well-studied problem, but I am not finding any resources, just a huge number of demonstrations of randomly generated text.
I know that once I start generating words, I can stop when a stop token such as ? or . is generated.
I have the same problem when generating random sentences with character-level RNNs trained on the same corpus.
Thanks.
Best Answer
First, figure out a way to start, which means having a way to randomly generate the first $N-1$ words (for example, by padding with $N-1$ start-of-sentence tokens, or by sampling a sentence-initial context observed in the corpus).
Then, at each step of the generation: sample the next word $w_t$ from the model's conditional distribution $P(w_t \mid w_{t-N+1}, \dots, w_{t-1})$, append it to the output, and slide the context window forward by one word, stopping when a stop token is drawn.
Some details left as an exercise ;)
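One possible way to fill in those details: a minimal sketch in Python (all names here are hypothetical, and the model uses plain unsmoothed maximum-likelihood count ratios). It pads each training sentence with $N-1$ start tokens `<s>` so the first word has a full context, then repeatedly samples $w_t$ with probability $\text{count}(w_{t-N+1}, \dots, w_{t-1}, w_t) / \text{count}(w_{t-N+1}, \dots, w_{t-1})$ until the end token `</s>` is drawn:

```python
import random
from collections import defaultdict, Counter

def train_ngram(corpus, n):
    """Count n-gram continuations: context of n-1 words -> Counter of next words."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        # Pad with <s> so the first real word has a full (n-1)-word context.
        tokens = ["<s>"] * (n - 1) + sentence + ["</s>"]
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            counts[context][tokens[i + n - 1]] += 1
    return counts

def sample_next(counts, context):
    """Draw word w with probability count(context, w) / count(context)."""
    counter = counts[context]
    return random.choices(list(counter), weights=list(counter.values()), k=1)[0]

def generate(counts, n, max_len=30):
    """Sample a sentence word by word, sliding the context window each step."""
    context = tuple(["<s>"] * (n - 1))
    out = []
    for _ in range(max_len):
        w = sample_next(counts, context)
        if w == "</s>":  # stop token drawn: sentence is complete
            break
        out.append(w)
        context = (context + (w,))[1:]  # drop the oldest word, keep n-1
    return out

# Toy corpus of pre-tokenized sentences.
corpus = [["the", "cat", "sat", "."], ["the", "dog", "ran", "."]]
model = train_ngram(corpus, n=2)
print(" ".join(generate(model, n=2)))
```

For a character-level RNN the loop is conceptually the same: feed the current context through the network, obtain a distribution over the next character, sample from it, and feed the sampled character back in as input for the next step.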