a.k.a. Word-Word matrices or Co-occurrence vectors
Process
- Requires a large volume of data
- Basic preprocessing steps: tokenization, lemmatization, etc.
- Count the number of times word u appears with word v
- The meaning of word u is its vector of counts (its word vector):
- meaning(u) = [count(u, v1), count(u, v2), …]
 
 
We get:
- A matrix X of size n × m, where n = |V| (number of target words) and m = |V_c| (number of context words)
- Usually a square matrix (target and context vocabularies coincide)
- Counts are taken within a context window of ±k words (k words to the left and right of the target); see the sketch below
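A minimal sketch of this construction, assuming a toy tokenized corpus and k = 2 (the corpus, k, and all variable names are illustrative, not from the notes):

```python
from collections import Counter

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
k = 2  # context window: k words to the left and right of the target

counts = Counter()
for sent in corpus:
    for i, u in enumerate(sent):
        lo, hi = max(0, i - k), min(len(sent), i + k + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(u, sent[j])] += 1  # count(u, v)

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: n for n, w in enumerate(vocab)}
X = [[counts[(u, v)] for v in vocab] for u in vocab]  # n x m; square here

print(vocab)
print(X[idx["cat"]])  # meaning("cat") as a vector of counts
```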
 
Pros
- Compute similarities between words using cosine similarity (sketched below)
- Easy to visualize words
- Dimensions are meaningful (each dimension is a context word), which supports Explainable AI
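Continuing the sketch above, cosine similarity over rows of X (a standard formula, not specific to these notes):

```python
import numpy as np

def cosine(u_vec, v_vec):
    """Cosine similarity between two count vectors (rows of X)."""
    u_vec = np.asarray(u_vec, dtype=float)
    v_vec = np.asarray(v_vec, dtype=float)
    denom = np.linalg.norm(u_vec) * np.linalg.norm(v_vec)
    return float(u_vec @ v_vec / denom) if denom else 0.0

# "cat" and "dog" share contexts in the toy corpus, so this should be high:
print(cosine(X[idx["cat"]], X[idx["dog"]]))
```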
 
Cons
- Cannot capture semantics beyond individual words (e.g., phrases or word order)
- Distributional semantics may not capture the entire meaning of a word
- Vectors are sparse and high dimensional
- Mitigation: use dimensionality reduction techniques such as Latent Semantic Analysis (LSA); see the sketch below
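A minimal LSA-style sketch via truncated SVD on the matrix X built above (plain numpy; the choice of d = 2 is an assumption for illustration):

```python
import numpy as np

X_arr = np.asarray(X, dtype=float)  # the sparse count matrix from above
U, S, Vt = np.linalg.svd(X_arr, full_matrices=False)
d = 2  # target dimensionality; an arbitrary illustrative choice
X_reduced = U[:, :d] * S[:d]  # each row: a dense d-dimensional word vector
print(X_reduced.shape)
```

Note that after reduction the dimensions no longer correspond to individual context words, so the explainability listed under Pros is traded for compactness.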