Tuesday, February 14, 2017

Bag of Words

Conceptually, we can view the bag-of-words model as a special case of the n-gram model with n = 1.
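To make that connection concrete, here is a minimal sketch (the `ngrams` helper below is illustrative, not from any particular library): a generic n-gram extractor collapses to plain unigram extraction, i.e. the bag of words, when n = 1.

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the quick brown fox jumps over the lazy dog".split()

print(ngrams(tokens, 2))  # bigrams: ('the', 'quick'), ('quick', 'brown'), ...
print(ngrams(tokens, 1))  # unigrams: ('the',), ('quick',), ... -- the bag-of-words case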


TF and TF-IDF are term-weighting schemes that belong to the Vector Space Model.
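As a rough sketch of how those weights are computed (assuming the common tf x log(N/df) variant; real toolkits such as scikit-learn add smoothing and normalization on top of this):

import math
from collections import Counter

# Toy corpus; each document is treated as a bag of words (order already discarded).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the dog chased the cat".split(),
]

N = len(docs)
# Document frequency: in how many documents does each term appear?
df = Counter(term for doc in docs for term in set(doc))

def tf_idf(doc):
    """Map each term in the document to its tf-idf weight."""
    tf = Counter(doc)  # raw term frequency within this document
    return {term: tf[term] * math.log(N / df[term]) for term in tf}

print(tf_idf(docs[0]))
# Terms appearing in every document (e.g. 'the') get weight 0;
# terms unique to one document (e.g. 'mat') get the highest weight.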

Bag-of-words: for a given document, you extract only the unigram words (aka terms) to create an unordered collection of words. No POS tags, no syntax, no semantics, no position, no bigrams, no trigrams. Only the unigram words themselves, making for a bag of words that represents the document. Thus: bag-of-words.
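One way to picture this is as a multiset of terms; a short sketch using Python's Counter:

from collections import Counter

document = "to be or not to be"

# The "bag": just the unigram terms and how often each occurs.
# Position, syntax, and part of speech are all thrown away.
bag = Counter(document.split())
print(bag)  # Counter({'to': 2, 'be': 2, 'or': 1, 'not': 1})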


We do not consider the order of words in a document. For example, "John is quicker than Mary" and "Mary is quicker than John" are represented the same way. This is called a bag-of-words model.
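A quick check of that claim, using the same Counter-style bag as above:

from collections import Counter

a = Counter("John is quicker than Mary".split())
b = Counter("Mary is quicker than John".split())

print(a == b)  # True -- the two sentences have identical bag-of-words representations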

The Vector Space Model does not consider word order either.
