- Word2Vec
- Skip-gram
- CBOW
- GloVe
- Vector semantics is used to measure the similarity of words
- It's a mathematical model that represents a universe of entities and a measure of similarity between those entities
- Applications: plagiarism detectors
- Two words are similar if they occur in similar word contexts
- Sparse vector representations
    - Mutual-information weighted word co-occurrence matrices
- Dense vector representations
    - Singular Value Decomposition
    - Neural network models (skip-gram and CBOW)
    - Brown clusters
- Term-document matrix: the number of times a word occurs in each document
- Term-term matrix: the number of times a word co-occurs with another word
- Instead of building vectors over entire documents, they can be built from smaller contexts such as a paragraph, or a window of +-4 words around the target word (see the sketch below)
- But these matrices are very sparse
- The shorter the window, the more syntactic the representation
- The longer the window, the more semantic the representation
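A minimal sketch (not from the notes) of counting term-term co-occurrences with a +-4 word window; the toy corpus and function name are made up for illustration:

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=4):
    """Count how often each word pair co-occurs within +-window positions."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][tokens[j]] += 1
    return counts

corpus = "the cat sat on the mat the dog sat on the rug".split()
counts = cooccurrence_counts(corpus, window=4)
print(dict(counts["sat"]))   # context counts for the word "sat"
```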
- Raw frequency counts are very skewed
- Instead, use a measure that asks whether a context word is particularly informative about the target word (e.g., PMI)
- PMI ranges from -inf to +inf
- PMI is biased towards infrequent words; one fix is to give rare context words slightly higher probability (e.g., by raising context counts to a power such as 0.75)
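A small numpy sketch (not from the notes) that converts a co-occurrence count matrix into PPMI weights, with an alpha-smoothing knob for the rare-word bias mentioned above; the matrix values are toy numbers:

```python
import numpy as np

def ppmi(C, alpha=0.75):
    """PPMI from a co-occurrence count matrix C (rows = target words,
    columns = context words). alpha < 1 smooths the context probabilities,
    which counteracts PMI's bias toward rare context words."""
    total = C.sum()
    p_wc = C / total
    p_w = C.sum(axis=1, keepdims=True) / total
    context_counts = C.sum(axis=0) ** alpha   # smoothed context counts
    p_c = context_counts / context_counts.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_wc / (p_w * p_c))
    return np.maximum(pmi, 0)                 # clip negatives -> PPMI

C = np.array([[0, 2, 1],
              [2, 0, 3],
              [1, 3, 0]], dtype=float)
print(ppmi(C))
```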
- The dot product is a good similarity measure: it gives a higher value when two vectors have high values in the same dimensions
- PPMI vectors are long and sparse
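To make the dot-product point above concrete, a tiny hedged example comparing the raw dot product with its length-normalized version (cosine similarity); the vectors are arbitrary toy values:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: dot product normalized by vector lengths, so long
    (frequent-word) vectors do not automatically look more similar."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 0.0, 2.0, 0.0])
b = np.array([2.0, 0.0, 1.0, 1.0])
print(np.dot(a, b))   # raw dot product: high when the same dimensions are high
print(cosine(a, b))   # length-normalized version
```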
- Alternatively, we could learn vectors which are short and dense
- Shorter vectors are easier to use as features in machine learning models
- Singular Value Decomposition
- Neural network methods (skip-gram, CBOW)
- Brown clustering
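A rough sketch of the SVD route to short, dense vectors, assuming a PPMI (or count) matrix as input; the matrix here is random placeholder data:

```python
import numpy as np

def truncated_svd_embeddings(ppmi_matrix, k=2):
    """Short, dense word vectors: keep only the top-k singular dimensions
    of the (typically PPMI-weighted) co-occurrence matrix."""
    U, S, Vt = np.linalg.svd(ppmi_matrix, full_matrices=False)
    return U[:, :k] * S[:k]          # each row is a k-dimensional word vector

M = np.random.rand(5, 5)             # stand-in for a PPMI matrix
print(truncated_svd_embeddings(M, k=2).shape)   # (5, 2)
```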
- Learning embeddings as part of a word prediction process
- Skip-gram: predict each neighbouring word within a context window of 2C words, from position -C to +C around the current word
- Each word is expressed as a one-hot vector over the vocabulary
- Input embedding v: in the input matrix W, column i is the d-dimensional embedding v for word i in the vocabulary
- Output embedding v': row i of the output matrix W' is the d-dimensional embedding v' for word i in the vocabulary
- Skip-gram predicts the context given the current word
- It works well with small amounts of data and represents even rare words or phrases well
- Embeddings capture relational meaning: vec("king") - vec("man") + vec("woman") ≈ vec("queen")
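Below is a minimal, illustrative skip-gram trainer in numpy using a full softmax. Real word2vec uses negative sampling or hierarchical softmax and a much larger corpus; the toy corpus, dimensions and learning rate here are made-up values:

```python
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
word2id = {w: i for i, w in enumerate(vocab)}
V, d, C, lr = len(vocab), 8, 2, 0.05

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, d))        # input embeddings: row i is v_i
W_out = rng.normal(scale=0.1, size=(d, V))    # output embeddings: column i is v'_i

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for epoch in range(50):
    for i, word in enumerate(corpus):
        center = word2id[word]
        lo, hi = max(0, i - C), min(len(corpus), i + C + 1)
        for j in range(lo, hi):             # each neighbour in the window
            if j == i:
                continue
            context = word2id[corpus[j]]
            h = W[center]                   # hidden layer = input embedding of the center word
            probs = softmax(h @ W_out)      # predicted distribution over context words
            err = probs.copy()
            err[context] -= 1.0             # gradient of cross-entropy w.r.t. the logits
            grad_h = W_out @ err            # gradient w.r.t. the input embedding
            W_out -= lr * np.outer(h, err)  # update output embeddings
            W[center] -= lr * grad_h        # update the input embedding

print(W[word2id["cat"]])   # learned d-dimensional vector for "cat"
```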
- CBOW: the task at hand is to predict the word given its context
- It is several times faster to train than skip-gram, with slightly better accuracy for frequent words
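For contrast, a hedged sketch of the CBOW forward pass: average the input embeddings of the context words and score every vocabulary word as the candidate center word (matrix shapes and word ids are toy values):

```python
import numpy as np

def cbow_forward(context_ids, W, W_out):
    """Average the input embeddings of the context words, then compute a
    softmax over the vocabulary for the center-word prediction."""
    h = W[context_ids].mean(axis=0)        # averaged context representation
    logits = h @ W_out
    e = np.exp(logits - logits.max())
    return e / e.sum()                     # probability of each center word

V, d = 12, 8
rng = np.random.default_rng(1)
W, W_out = rng.normal(size=(V, d)), rng.normal(size=(d, V))
print(cbow_forward([2, 5, 7, 9], W, W_out).argmax())  # predicted center word id
```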