- 
predicting the context given a word $w_t$
 - 
Let $w_{t-1}, \dots, w_{t-m}, w_{t+1}, \dots, w_{t+m}$ be the context (the $m$ words on each side of $w_t$)
 - 
By Bayes' rule: $\Pr(w_t \mid \text{context}) \cdot \Pr(\text{context}) = \Pr(\text{context} \mid w_t) \cdot \Pr(w_t)$
 - 
$\Pr(\text{context})$ and $\Pr(w_t)$ are assumed to be uniform, so both are constants and $\Pr(w_t \mid \text{context}) \propto \Pr(\text{context} \mid w_t)$
 - 
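$\Pr(\text{context} \mid w_t) = \prod_{j} \Pr(w_j \mid w_t)$ over all context positions $j \in \{t-m, \dots, t+m\}$, $j \neq t$, assuming the context words are conditionally independent given $w_t$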
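
The factorization above can be sketched in Python. This is a minimal illustration under stated assumptions: the toy corpus, the count-based estimate of Pr(w_j | w_t), and the names `context_window`, `cooccurrence_counts`, `cond_prob`, and `context_prob` are all hypothetical and not from the notes. A trained model would learn these probabilities rather than read them off raw counts.

```python
from collections import Counter, defaultdict
from math import prod

def context_window(tokens, t, m):
    """Return the up-to-2m words around position t (w_t itself excluded)."""
    left = tokens[max(0, t - m):t]
    right = tokens[t + 1:t + 1 + m]
    return left + right

def cooccurrence_counts(tokens, m):
    """Count how often each word appears in the window of each center word."""
    counts = defaultdict(Counter)
    for t, center in enumerate(tokens):
        counts[center].update(context_window(tokens, t, m))
    return counts

def cond_prob(counts, w_j, w_t):
    """Count-based estimate of Pr(w_j | w_t); an assumption for illustration."""
    total = sum(counts[w_t].values())
    return counts[w_t][w_j] / total if total else 0.0

def context_prob(counts, context, w_t):
    """Pr(context | w_t) as the product of Pr(w_j | w_t) over context words."""
    return prod(cond_prob(counts, w_j, w_t) for w_j in context)

tokens = "the cat sat on the mat".split()
m = 1
counts = cooccurrence_counts(tokens, m)
context = context_window(tokens, 2, m)    # the words around "sat"
p = context_prob(counts, context, "sat")  # product of the two conditionals
```

Because the product multiplies many values below 1, a practical version would likely sum log-probabilities instead to avoid underflow on longer windows.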
 
Word2Vec can be trained as a skip-gram model (its other architecture, CBOW, predicts the word from its context instead)
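
One way to see what a skip-gram model is asked to predict is to list its (center, context word) training pairs. The sketch below is an illustration only, not Word2Vec's actual implementation: the function name `skipgram_pairs` and the toy sentence are hypothetical, and the training optimizations used by real implementations are omitted.

```python
def skipgram_pairs(tokens, m):
    """Yield (center, context_word) pairs for a window of m words per side."""
    for t, center in enumerate(tokens):
        lo = max(0, t - m)                 # clip the window at the start
        hi = min(len(tokens), t + m + 1)   # clip the window at the end
        for j in range(lo, hi):
            if j != t:                     # skip the center word itself
                yield center, tokens[j]

pairs = list(skipgram_pairs(["the", "cat", "sat"], m=1))
```

Each pair corresponds to one factor Pr(w_j | w_t) in the product, so a larger `m` yields more pairs per center word.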