Conceptually, we can view the bag-of-words model as a special case of the n-gram model with n = 1.
TF and TF-IDF belong to the Vector Space Model.
Bag-of-words: for a given document, you extract only the unigram words (aka terms) to create an unordered collection of words. No POS tags, no syntax, no semantics, no positions, no bigrams, no trigrams. Only the unigram words themselves, making a bunch of words that represents the document. Thus: bag-of-words.
We do not consider the order of words in a document. "John is quicker than Mary" and "Mary is quicker than John" are represented the same way. This is called a bag-of-words model.
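A minimal sketch of this in Python (the `bag_of_words` helper is illustrative, not from any particular library): counting unigrams with `collections.Counter` shows that the two sentences above collapse to the same representation once order is discarded.

```python
from collections import Counter

def bag_of_words(text):
    # Lowercase and split on whitespace; no POS tags, no positions,
    # no bigrams or trigrams -- just unigram term counts.
    return Counter(text.lower().split())

a = bag_of_words("John is quicker than Mary")
b = bag_of_words("Mary is quicker than John")
assert a == b  # word order is discarded, so both map to the same bag
```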
The Vector Space Model does not consider word order either.
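To make the connection to TF-IDF concrete, here is a small self-contained sketch (function and variable names are my own, not a standard API): each document becomes a vector over the vocabulary, weighted by term frequency times inverse document frequency. Because the representation ignores order, the two "John/Mary" sentences get identical vectors.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    # Term frequencies: one bag of words per document.
    tfs = [Counter(d.lower().split()) for d in docs]
    vocab = sorted({t for tf in tfs for t in tf})
    n = len(docs)
    # Inverse document frequency: rarer terms get higher weight.
    idf = {t: math.log(n / sum(1 for tf in tfs if t in tf)) for t in vocab}
    # Each document becomes a vector indexed by the sorted vocabulary.
    return [[tf[t] * idf[t] for t in vocab] for tf in tfs], vocab

docs = ["John is quicker than Mary",
        "Mary is quicker than John",
        "John likes Mary"]
vectors, vocab = tf_idf_vectors(docs)
# The first two documents contain the same words, so their
# TF-IDF vectors are identical despite the different word order.
assert vectors[0] == vectors[1]
```

Real implementations (e.g. scikit-learn's `TfidfVectorizer`) add smoothing and normalization on top of this basic scheme.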