Natural Language Processing (NLP) has transformed the way machines interact with human language. From search engines and chatbots to recommendation systems and virtual assistants, NLP powers many of the intelligent applications we use every day.
However, one fundamental challenge exists: computers do not understand words the way humans do.
When humans read words such as king, queen, doctor, or hospital, we automatically understand their meanings and relationships. Computers, on the other hand, see text as a sequence of characters without any inherent meaning.
To bridge this gap, researchers developed techniques that convert words into numerical representations while preserving their semantic meaning. These representations are known as word embeddings.
In this article, we will explore vector spaces, Word2Vec, and GloVe, three foundational concepts that helped machines move beyond simple text processing toward representing the meaning of language.
Why Computers Struggle with Text
Computers are designed to process numbers, not words.
Consider the following sentence:
I love machine learning.
A human immediately understands its meaning, but to a computer, even after the sentence is split into tokens, it is only a list of strings:
['I', 'love', 'machine', 'learning']
The machine does not understand:
- What “love” means
- That “machine learning” is a field of study
- That similar words may share similar meanings
Traditional machine learning algorithms require numerical input.
This creates an important question:
How can we convert words into numbers while preserving their meaning?
From Words to Numbers: The Need for Word Embeddings
Early NLP techniques attempted to solve this problem using methods such as:
One-Hot Encoding
Example vocabulary:
["cat", "dog", "car"]
Representations:
cat = [1, 0, 0]
dog = [0, 1, 0]
car = [0, 0, 1]
While simple, this approach has major limitations:
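- High dimensionality: each vector is as long as the vocabulary
- Sparse vectors: every entry is zero except one
- No semantic relationships: every pair of words is equally dissimilar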
For example:
cat = [1,0,0]
dog = [0,1,0]
To a computer, “cat” and “dog” appear completely unrelated, even though both are animals.
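The limitation can be checked directly. The following is a minimal sketch in plain Python using the three-word vocabulary above; the helper names `one_hot` and `dot` are illustrative and not taken from any library.

```python
# Build one-hot vectors for the article's three-word vocabulary.
vocabulary = ["cat", "dog", "car"]


def one_hot(word, vocab):
    """Return a list with 1 at the word's index and 0 everywhere else."""
    return [1 if w == word else 0 for w in vocab]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


cat = one_hot("cat", vocabulary)
dog = one_hot("dog", vocabulary)
car = one_hot("car", vocabulary)

print(cat, dog, car)

# The dot product of two different one-hot vectors is always 0,
# so "cat" looks exactly as unrelated to "dog" as it does to "car".
print(dot(cat, dog), dot(cat, car))
```

Because every pair of distinct words has a dot product of 0, one-hot vectors carry no signal about which words are related.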
What Are Word Embeddings?
Word embeddings are dense numerical vectors that capture semantic meaning.
Instead of representing words as isolated entities, embeddings place similar words close together in a mathematical space.
Example:
King → [0.72, 0.14, 0.89]
Queen → [0.70, 0.16, 0.87]
Man → [0.60, 0.22, 0.75]
Woman → [0.58, 0.24, 0.73]
Notice how related words receive similar vector representations.
This allows machines to learn relationships between words rather than simply memorizing them.
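To make "similar vector representations" concrete, the sketch below measures the straight-line (Euclidean) distance between the illustrative vectors above. The numbers are the toy values from this article, not output from a trained model, and the `distance` helper is a name chosen for this example.

```python
import math

# Toy 3-dimensional vectors copied from the example above (not real model output).
embeddings = {
    "king": [0.72, 0.14, 0.89],
    "queen": [0.70, 0.16, 0.87],
    "man": [0.60, 0.22, 0.75],
    "woman": [0.58, 0.24, 0.73],
}


def distance(a, b):
    """Euclidean distance between the vectors of words a and b."""
    return math.dist(embeddings[a], embeddings[b])


# Smaller distance means the two words sit closer together in the space.
for other in ("queen", "man", "woman"):
    print(f"king -> {other}: {distance('king', other):.3f}")
```

With these toy values, "queen" is by far the closest word to "king" (a distance of roughly 0.035, against roughly 0.201 for "man" and 0.235 for "woman").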
Understanding Vector Spaces
A vector space is a mathematical space in which each word is represented as a point, that is, as a list of coordinates.
Instead of viewing words as text, we represent them as vectors.
Imagine a simple 2-dimensional space:
[Diagram: King, Queen, and Woman plotted as points in a 2-dimensional space]
In reality, modern embeddings use hundreds of dimensions.
Common dimensions include:
- 100
- 200
- 300
- 768
- 1024
Each dimension captures some learned characteristic of how words are used, although individual dimensions are usually not directly interpretable.
The most important idea is:
Words with similar meanings appear closer together in vector space.
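Relationships between words are captured too. Word pairs that share a relationship, such as capital and country or male and female, tend to be separated by similar offsets: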
Paris → France
Berlin → Germany
King → Queen
Man → Woman
The relationships between these words can be learned mathematically.
Word2Vec
What is Word2Vec?
Word2Vec is one of the most influential word embedding techniques. It was introduced in 2013 by Tomas Mikolov and colleagues at Google.
Instead of manually defining relationships, Word2Vec learns word representations automatically by analyzing large amounts of text.
The core idea is simple:
Words appearing in similar contexts tend to have similar meanings.
For example:
The cat sits on the mat.
The dog sits on the mat.
Because “cat” and “dog” appear in similar contexts, Word2Vec learns that they are semantically related.
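The idea of "similar contexts" can be illustrated without any model at all. This sketch simply collects the words that appear near "cat" and near "dog" in the two sentences above; the function name `context_words` and the window size of 2 are assumptions made for this example.

```python
sentences = [
    "the cat sits on the mat".split(),
    "the dog sits on the mat".split(),
]


def context_words(target, sentences, window=2):
    """Collect the words that occur within `window` positions of `target`."""
    contexts = set()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            # Words to the left and to the right of the target, excluding the target.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            contexts.update(neighbours)
    return contexts


cat_contexts = context_words("cat", sentences)
dog_contexts = context_words("dog", sentences)

print(sorted(cat_contexts))
print(sorted(cat_contexts & dog_contexts))  # contexts shared by both words
```

Both words end up with the same context set, which is the signal Word2Vec exploits when it pushes their vectors toward each other.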
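A minimal training example with the gensim library:

from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of lowercase tokens.
sentences = [
    ["i", "love", "machine", "learning"],
    ["machine", "learning", "is", "fun"],
    ["i", "love", "python"],
]

# sg=0 (the default) trains CBOW; passing sg=1 would train Skip-Gram instead.
model = Word2Vec(
    sentences,
    vector_size=100,  # dimensionality of each word vector
    window=5,         # maximum distance between target and context word
    min_count=1,      # keep every word, even those seen only once
    workers=4,        # number of training threads
)

# Look up the learned 100-dimensional vector for "machine".
print(model.wv["machine"])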
CBOW (Continuous Bag of Words)
CBOW predicts a target word using surrounding words.
Example:
The ____ sits on the mat
Context words:
The, sits, on, the, mat
Target word:
cat
How CBOW Works
- Takes surrounding words as input.
- Learns context patterns.
- Predicts the missing word.
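The steps above can be sketched as a function that turns a sentence into CBOW training examples. This shows only how the (context, target) pairs are formed, not the neural network that learns from them; `cbow_pairs` and the window size of 2 are illustrative choices.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) training examples for CBOW."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after
        pairs.append((left + right, target))
    return pairs


tokens = "the cat sits on the mat".split()
for context, target in cbow_pairs(tokens):
    print(context, "->", target)
```

For the target "cat" this yields the context `['the', 'sits', 'on']`. In the actual model, the vectors of the context words are combined (typically averaged) and used to predict the target word.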
Advantages of CBOW
- Faster training
- Efficient for large datasets
- Works well for common words
Limitations of CBOW
- Less effective for rare words
- May lose detailed contextual information
Skip-Gram
Skip-Gram works in the opposite direction to CBOW.
Instead of predicting the center word, it predicts surrounding words.
Example:
Target Word:
cat
Predicted Context:
The
sits
on
mat
How Skip-Gram Works
- Takes one word as input.
- Predicts neighboring words.
- Learns richer semantic relationships.
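As a rough sketch of these steps, the function below generates Skip-Gram training pairs from a sentence: each word is paired with every neighbour inside a window. The name `skipgram_pairs` and the window size of 2 are assumptions for illustration, and the network that learns from the pairs is omitted.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target_word, context_word) training pairs for Skip-Gram."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


tokens = "the cat sits on the mat".split()
cat_pairs = [pair for pair in skipgram_pairs(tokens) if pair[0] == "cat"]
print(cat_pairs)
```

Each target word produces several separate training pairs, one reason Skip-Gram trains more slowly than CBOW. In gensim, the `sg` parameter of `Word2Vec` selects the variant: `sg=0` (the default) trains CBOW and `sg=1` trains Skip-Gram.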
Advantages of Skip-Gram
- Better for rare words
- Captures semantic relationships effectively
- Produces high-quality embeddings
Limitations of Skip-Gram
- Slower training
- Requires more computation
Advantages of Word2Vec
- Learns semantic relationships automatically
- Produces dense vector representations
- Efficient training process
- Captures meaningful patterns in language
Example:
King - Man + Woman ≈ Queen
This famous example demonstrates how Word2Vec learns relationships mathematically.
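The arithmetic can be tried on the small illustrative vectors used earlier in this article. These are toy values chosen by hand, not trained Word2Vec output, and the `analogy` helper is a hypothetical name for this sketch.

```python
import math

# Toy 3-dimensional vectors from the earlier example (not real model output).
embeddings = {
    "king": [0.72, 0.14, 0.89],
    "queen": [0.70, 0.16, 0.87],
    "man": [0.60, 0.22, 0.75],
    "woman": [0.58, 0.24, 0.73],
}


def analogy(a, b, c):
    """Return the word whose vector is nearest to vec(a) - vec(b) + vec(c)."""
    target = [
        x - y + z
        for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    return min(embeddings, key=lambda word: math.dist(embeddings[word], target))


print(analogy("king", "man", "woman"))
```

With these values the nearest word is "queen". Real implementations usually exclude the three query words from the candidates and search over a much larger vocabulary, where the match is approximate rather than exact.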
Limitations of Word2Vec
- One vector per word, so it cannot handle multiple meanings effectively. For example, Apple (fruit) and Apple (company) receive the same vector.
- Context awareness is limited
- Requires large training datasets
These limitations motivated the development of improved embedding techniques.
GloVe (Global Vectors for Word Representation)
What is GloVe?
GloVe (Global Vectors for Word Representation) is a word embedding model introduced in 2014 by Jeffrey Pennington, Richard Socher, and Christopher Manning at Stanford University.
Unlike Word2Vec, which primarily learns from local context, GloVe combines both:
- Local context information
- Global word co-occurrence statistics
This helps generate richer word representations.
How GloVe Works
GloVe analyzes how frequently words appear together across an entire corpus.
Example:
ice appears frequently with:
cold
winter
snow
freeze
steam appears frequently with:
hot
water
heat
boil
By examining these relationships globally, GloVe learns meaningful vector representations.
Instead of focusing only on neighboring words, it studies broader language patterns.
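The counting step can be sketched in a few lines. The corpus below is made up for this example, and the function counts plain co-occurrences inside a window of 2; it illustrates only the statistics GloVe starts from, not GloVe's training procedure.

```python
from collections import Counter

# Made-up toy corpus, used only to illustrate the counting step.
corpus = [
    "ice is cold".split(),
    "snow and ice are cold".split(),
    "steam is hot".split(),
    "water and steam are hot".split(),
]


def cooccurrence_counts(corpus, window=2):
    """Count how often each ordered word pair appears within `window` tokens."""
    counts = Counter()
    for tokens in corpus:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


counts = cooccurrence_counts(corpus)
print(counts[("ice", "cold")], counts[("ice", "hot")])
print(counts[("steam", "hot")], counts[("steam", "cold")])
```

In this toy corpus "ice" co-occurs with "cold" but never with "hot", and the reverse holds for "steam". GloVe then fits word vectors so that the dot product of two vectors approximates the logarithm of their co-occurrence count.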
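Pretrained GloVe vectors can be loaded through gensim's downloader:

import gensim.downloader as api

# Downloads the vectors on first use, then loads them from the local cache.
glove = api.load("glove-wiki-gigaword-100")

# Print the first 10 of the 100 components of the vector for "king".
print(glove["king"][:10])

A nearest-neighbour query such as glove.most_similar(word, topn=3) returns a list of (word, similarity) pairs of the following shape (the values here are illustrative):

[
    ('machines', 0.82),
    ('computer', 0.79),
    ('technology', 0.75)
]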
Word2Vec vs GloVe
| Feature | Word2Vec | GloVe |
|---|---|---|
| Learning Method | Predictive (predicts words from their context) | Count-based (fits co-occurrence counts) |
| Context Usage | Local context windows | Global co-occurrence statistics |
| Training Speed | Fast | Fast |
| Semantic Relationships | Strong | Strong |
| Statistical Information | Limited | Extensive |
Conclusion
The development of Word2Vec and GloVe marked a significant milestone in Natural Language Processing. By transforming words into meaningful vector representations, these techniques enabled machines to move beyond simple keyword matching and begin understanding relationships between words.
Vector spaces provide a powerful framework where semantic meaning can be represented mathematically, allowing algorithms to identify similarities, analogies, and contextual relationships. Although newer transformer-based models have emerged, Word2Vec and GloVe remain fundamental concepts for understanding how modern NLP evolved.
As we continue our NLP journey, the next step is learning how these embeddings are compared and utilized in real-world systems. This leads naturally to concepts such as Cosine Similarity, Semantic Similarity, and Vector Space Semantics, which form the foundation of modern semantic search and retrieval systems.