A deep dive into how large language models utilise numerical representations to derive meaning and enhance AI-driven applications.
Understanding the Mathematics Behind Large Language Models
In the rapidly evolving world of artificial intelligence (AI), one question underpins the development of the powerful tools known as large language models (LLMs): how many numbers is a word worth? This seemingly trivial question opens the door to the intricate mechanics of LLMs and the AI-driven applications built on top of them.
Every LLM provides its own numerical representation of words. Meta’s open-source Llama 3 model, for example, represents words using 4,096 numbers, while a version of OpenAI’s GPT-3 uses 12,288. These lists of numbers, known as “embeddings,” look like incomprehensible chains of digits. However, they play a critical role in forming mathematical relationships between words, crafting what seems to be linguistic meaning.
The concept of word embeddings, where each word is mapped to a unique list of numbers, is not new. To model language computationally, researchers start by listing essential features for every word in a dictionary. These features could be akin to attributes in a “20 Questions” game, such as “animal,” “vegetable,” or “object.” Numerical values are then assigned to these features. For instance, “dog” may score high on “furry” but low on “metallic.” The resulting embeddings encapsulate each word’s semantic associations and its relationships with other words.
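To make the “20 Questions” analogy concrete, here is a minimal Python sketch of hand-crafted embeddings. The feature names and scores are entirely invented for illustration; real models use thousands of dimensions and, as described below, learn their features automatically.

```python
# Toy hand-crafted embeddings: each word becomes a short list of feature scores.
# The features ("furry", "metallic", "animal", "object") and the values are
# invented purely for illustration.
hand_crafted_embeddings = {
    #         furry  metallic  animal  object
    "dog":   [0.90,  0.00,     1.00,   0.00],
    "cat":   [0.95,  0.00,     1.00,   0.00],
    "chair": [0.00,  0.30,     0.00,   1.00],
}

for word, features in hand_crafted_embeddings.items():
    print(f"{word:>5}: {features}")
```

The point is only that each word becomes a point in a shared feature space; no one writes such tables by hand any more.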
Initially, researchers manually specified these embeddings, but current methods leverage automation. Neural networks are trained to group words or text fragments (tokens) based on features that the network determines for itself. One feature might distinguish nouns from verbs, while another might separate words that follow a period from those that do not, as explained by Ellie Pavlick, a computer scientist at Brown University and Google DeepMind.
This automation provides efficiency but also reduces interpretability. Unlike a straightforward game of “20 Questions,” many features encoded in neural network-generated embeddings are not easily decipherable. “It seems to be a grab bag of stuff,” notes Pavlick. The neural network can invent features that facilitate task performance, often in ways that are opaque to humans.
When a neural network is trained for language modelling—predicting the next word in a sequence—the learned embeddings are far from arbitrary. These numerical values align such that words with similar contexts have mathematically similar embeddings. For instance, the embeddings for “dog” and “cat” are closer than those for “dog” and “chair.”
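As a rough sketch of what “mathematically similar” means, the snippet below compares toy embeddings with cosine similarity, a standard measure of how closely two vectors point in the same direction. The four-dimensional vectors are invented for illustration; learned embeddings behave analogously in thousands of dimensions.

```python
import numpy as np

# Invented 4-dimensional embeddings; real models use thousands of dimensions.
embeddings = {
    "dog":   np.array([0.90, 0.10, 0.80, 0.00]),
    "cat":   np.array([0.85, 0.15, 0.75, 0.05]),
    "chair": np.array([0.00, 0.90, 0.10, 0.80]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))    # ~1.0: very similar
print(cosine_similarity(embeddings["dog"], embeddings["chair"]))  # ~0.1: dissimilar
```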
This resemblance in embeddings has dazzled researchers and fuelled perceptions of AI’s “magic.” Famous cases of “word arithmetic”—like “king” minus “man” plus “woman” equalling “queen”—exemplify how LLMs appear to derive meaning from numbers, creating an illusion of knowledge akin to human understanding.
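The word-arithmetic trick itself can be sketched in a few lines. The three-dimensional vectors below are deliberately contrived so that the analogy comes out cleanly; in a trained model the same nearest-neighbour search runs over learned, high-dimensional embeddings.

```python
import numpy as np

# Contrived 3-dimensional embeddings (roughly: royalty, maleness, a filler feature),
# invented so that the king/queen analogy works out exactly.
vocab = {
    "king":  np.array([0.9, 0.9, 0.1]),
    "queen": np.array([0.9, 0.1, 0.1]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.1]),
}

# "king" minus "man" plus "woman" ...
target = vocab["king"] - vocab["man"] + vocab["woman"]

# ... lands closest (by Euclidean distance) to "queen".
closest = min(vocab, key=lambda word: np.linalg.norm(vocab[word] - target))
print(closest)  # queen
```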
However, this knowledge diverges from traditional dictionary definitions. Instead, it resembles a high-dimensional map where embeddings act as coordinates. Words with similar meanings form clusters, akin to suburbs surrounding a city. Yet, unlike geographical coordinates linked to actual locations, these embeddings only refer to one another. In essence, they are coordinates in an abstract space, devoid of external reference points.
The reason embeddings for “dog” and “cat” are similar lies in linguistic statistics: words frequently used together often share meanings. For instance, in the phrase “I hired a pet sitter to feed my ____,” likely completions include “dog” or “cat” rather than “chair.” This statistical similarity, rather than direct definitions, enables an LLM to make accurate next-word predictions.
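A crude way to see where those statistics come from is to count completions in a corpus. The tiny invented corpus below is only a stand-in for the web-scale text an LLM is trained on, and real models make predictions with neural networks rather than raw counts.

```python
from collections import Counter

# A tiny invented corpus standing in for web-scale training text.
corpus = [
    "i hired a pet sitter to feed my dog",
    "i hired a pet sitter to feed my cat",
    "she asked the pet sitter to feed my dog",
    "he bought a new chair for the office",
]

# Count which words follow the two-word prefix "feed my".
prefix = ("feed", "my")
completions = Counter()
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 2):
        if tuple(words[i:i + 2]) == prefix:
            completions[words[i + 2]] += 1

print(completions.most_common())  # [('dog', 2), ('cat', 1)] -- "chair" never appears
```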
Embeddings offer a proxy for semantic meaning that has proven highly effective, and this effectiveness is a major reason why LLMs now feature prominently in AI development. When the mathematical constructs align with human expectations, they resemble intelligence; when they do not, the outputs are deemed “hallucinations.” For the LLM, there is no distinction: either way, they are simply numerical lists operating within abstract, high-dimensional spaces.
Pavlick aptly summarises this dynamic: while humans use language to connect words with real-world objects or concepts, LLMs use embeddings to model language statistically. This approach has driven LLMs to unprecedented capabilities in understanding and generating human language, despite their purely numerical underpinnings.
Source: Noah Wire Services