In NLP, what is the difference between corpus and vocabulary?
I often see these two terms used, and they seem to refer to the same thing. Is there a difference between them, or are they interchangeable?
natural language
Best Answer
They are not the same thing.
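It might be easier to explain by example: BERT is an NLP model that was pre-trained on the text of English Wikipedia together with BooksCorpus. The corpus is that collection of text, i.e. the articles and books it was trained on. The vocabulary is the set of distinct tokens the model knows about. Loosely speaking, that is the vocabulary of the English language as it appears in the corpus; in BERT's case it is a fixed list of roughly 30,000 WordPiece tokens (whole words and word fragments). So the corpus is the data, and the vocabulary is the inventory of unique units found in that data.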
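To make the distinction concrete, here is a minimal sketch in Python. The three-sentence corpus and the `build_vocabulary` helper are made up for illustration, and the whitespace tokenization is a simplifying assumption (real models such as BERT use subword tokenizers instead).

```python
from collections import Counter

# Hypothetical toy corpus: a collection of texts (three short "documents").
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog",
]


def build_vocabulary(documents):
    """Return (sorted distinct tokens, token counts) for the documents."""
    counts = Counter()
    for doc in documents:
        # Naive tokenization: lowercase, then split on whitespace.
        counts.update(doc.lower().split())
    return sorted(counts), counts


vocabulary, counts = build_vocabulary(corpus)
total_tokens = sum(counts.values())

print("documents in corpus:", len(corpus))
print("tokens in corpus:", total_tokens)
print("vocabulary size:", len(vocabulary))
print("vocabulary:", vocabulary)
```

The corpus is the full text, repetitions included, while the vocabulary lists each distinct token only once, so it is typically much smaller than the corpus.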