
How LLMs Work

Two rival mathematicians, a forgotten idea, and the machine that learned to speak

By Esoteric.Love

Updated 7th May 2026

EPISTEMOLOGY SCORE
82/100

1 = fake news · 20 = fringe · 50 = debated · 80 = suppressed · 100 = grounded

# Two Men Argued About Whether Machines Can Think. One Quit His Job. The Other Says He's Wrong.

Geoffrey Hinton and Yann LeCun spent forty years building the same thing. Now they can't agree on what they built.

The Claim

Large Language Models do not retrieve stored facts. They compress statistical patterns across hundreds of billions of words into a single mathematical object of billions of learned parameters, and generate responses by predicting what comes next. Whether that process constitutes understanding, or the simulation of understanding, is the question that has split the field's two most important figures.

01

The Men Who Taught Machines to See

In 1986, Geoffrey Hinton co-published a paper that changed the trajectory of computer science. Learning Representations by Back-propagating Errors introduced backpropagation as a practical algorithm for training neural networks. The idea was not entirely new — Paul Werbos had outlined it in his 1974 PhD thesis. But Hinton, with David Rumelhart and Ronald Williams, made it work at scale and put it in front of the right audience.

Backpropagation works like this: a neural network is a layered system of weighted connections. You feed it an input, it produces an output, you measure how wrong it was. The algorithm calculates how much each weight contributed to the error — then adjusts all of them simultaneously, flowing the error signal backwards through the layers. Run this on millions of examples and the network learns patterns no human explicitly programmed.
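
To make that loop concrete, here is a minimal sketch in NumPy: a two-layer network trained with hand-written backpropagation on the XOR problem. The layer sizes, learning rate, and toy data are illustrative choices, not anything from the 1986 paper, and for most random seeds the network settles near the right answers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, the classic pattern a single-layer network cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# A 2 -> 4 -> 1 network: weights and biases for each layer.
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(10000):
    # Forward pass: input -> hidden -> output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Measure how wrong the output was.
    err = out - y

    # Backward pass: flow the error signal back through the layers,
    # computing how much each weight contributed to the error.
    d_out = err * out * (1 - out)        # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)   # gradient at the hidden layer

    # Adjust every weight simultaneously, in proportion to its blame.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2).ravel())  # should end near [0, 1, 1, 0]
```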

The mainstream called it a dead end. Logic-based AI — expert systems, symbol manipulation — was the consensus. Hinton kept working in the wilderness for fifteen years.

Yann LeCun arrived from a different direction. A French computer scientist, LeCun applied backpropagation to image recognition and in 1989 developed convolutional neural networks. By 1998 his system was reading roughly 10% of all cheques processed in the United States. The field still didn't care.

Three men spent decades building AI that everyone agreed was worthless. In 2012 their approach won a computer vision competition by a margin so large the second-place team suspected a mistake.

02

The Moment Everything Changed

In 2012, a neural network called AlexNet entered the ImageNet competition. The second-place team achieved a 26% error rate. AlexNet achieved 16%. The gap was so large observers assumed a scoring error.

AlexNet ran on consumer graphics cards designed for video games. The gaming industry had accidentally built the hardware deep learning needed. Within two years every major tech company had a deep learning team. Within five years the approach had spread to language, speech, protein folding, and drug discovery.

Hinton, LeCun, and Yoshua Bengio won the 2018 Turing Award — computing's Nobel Prize — for this work. Forty years of underfunded, ridiculed research had quietly become the foundation of everything.

03

What an LLM Actually Is

A Large Language Model is not a database. It does not look things up. It is a function — a massive learned mathematical transformation that takes a sequence of tokens and predicts what comes next.

The architecture is called a transformer. Introduced in the 2017 Google paper Attention Is All You Need, transformers replaced sequential processing with self-attention — a mechanism that lets every token in a sequence attend to every other token simultaneously, weighted by relevance.

Self-attention works like this: each token becomes three vectors — a query, a key, and a value. Queries are what a token is looking for. Keys are what a token advertises about itself. The dot product of a query against all keys produces attention scores — how relevant each other token is. Those scores weight the values, producing a new representation that has absorbed context from the whole sequence.
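
Here is a minimal NumPy sketch of that computation, for a single attention head with random, untrained projection matrices; real models learn these weights and run many heads in parallel, but the arithmetic is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 5, 16                   # 5 tokens, 16-dimensional embeddings
x = rng.normal(size=(seq_len, d_model))    # token embeddings (random stand-ins)

# Projections that turn each token into a query, a key, and a value vector.
# In a trained model these matrices are learned; here they are random.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Attention scores: each token's query dotted against every token's key,
# scaled by sqrt(d_model) to keep the softmax well-behaved.
scores = Q @ K.T / np.sqrt(d_model)        # shape (seq_len, seq_len)

# Softmax turns scores into weights that sum to 1 across the sequence.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Each token's new representation is a relevance-weighted mix of all values:
# it has absorbed context from the whole sequence.
out = weights @ V                          # shape (seq_len, d_model)
print(out.shape)                           # (5, 16)
```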

Stack dozens of these layers. Add feed-forward networks between them. Train on essentially the entire internet. What emerges has compressed the statistical structure of human language, its logic, metaphors, factual content, syntax, and contradictions, into floating-point numbers: weights, each nudged over millions of gradient-descent updates.
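
Once trained, generation is just that prediction run in a loop: the model outputs a probability distribution over the next token, one token is sampled, appended to the context, and fed back in. The sketch below uses an invented six-word vocabulary and a made-up stand-in for the model's forward pass, purely to show the shape of the loop.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "."]   # toy vocabulary, invented

def fake_next_token_probs(context_ids):
    # Stand-in for a trained transformer's forward pass: any function that maps
    # the context so far to a probability distribution over the vocabulary.
    logits = rng.normal(size=len(vocab))
    logits[(context_ids[-1] + 1) % len(vocab)] += 2.0  # crude built-in "pattern"
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

context = [vocab.index("the")]
for _ in range(5):
    probs = fake_next_token_probs(context)          # predict what comes next
    next_id = int(rng.choice(len(vocab), p=probs))  # sample from that distribution
    context.append(next_id)                         # feed it back as new context

print(" ".join(vocab[i] for i in context))          # e.g. "the cat sat on mat ."
```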

GPT-4 has roughly 1.8 trillion parameters. The human brain has roughly 100 trillion synapses. The comparison is less informative than it sounds — we don't know what either is actually doing.

04

The Schism

In May 2023, Geoffrey Hinton resigned from Google. He wanted to speak freely about AI risk without a corporate conflict of interest.

His position: AI systems may already have some form of subjective experience. They may become more intelligent than humans. Development has outpaced safety research by a decade. He has said that part of him regrets his life's work.

Yann LeCun — Chief AI Scientist at Meta — has been categorical in his disagreement. Current AI systems are not intelligent. They are sophisticated autocomplete. They cannot reason, plan, or model the world. The existential risk narrative is not just wrong — it actively harms the field.

The debate has been conducted in public, on social media, in lectures, in papers. It is not polite. LeCun has called Hinton's warnings "preposterous." Hinton has said LeCun is in denial. Yoshua Bengio — the third Turing laureate, and the quietest — has aligned with Hinton, advocating safety research.

What makes this unusual is that both men understand the technical details better than almost anyone alive. This is not a scientist debating a journalist. It is two architects disagreeing about the building they constructed.

05

What Language Has to Do With It

In 1950, Alan Turing proposed his test: if you can't tell the difference between a human and a machine in conversation, the machine is intelligent. Practical, not philosophical. It didn't ask what intelligence is — only whether we could detect its absence.

John Searle's 1980 Chinese Room attacked this directly. A person locked in a room who speaks no Chinese receives symbols, consults a rulebook, returns correct symbols. To observers outside, the room speaks Chinese. To the person inside: no understanding, only rule-following.

LLMs are the Chinese Room at scale. They process symbols. Whether symbol-processing can constitute understanding is a question Wittgenstein spent his career on; his conclusion was that meaning is not contained in symbols but in their use within forms of life. An LLM has read every form of life ever written about. Whether that counts remains unresolved.

Noam Chomsky's position is harder: LLMs don't learn language. They learn surface statistics. His 1957 argument that language has a deep generative structure would predict LLMs should fail at genuine linguistic competence. They often do. They also often don't. The argument may be correct and irrelevant simultaneously.

The Chinese Room processes symbols without understanding. You are reading symbols right now. What is the difference?

06

The Corporate Alignment Nobody Discusses

The Hinton-LeCun split is framed as scientific. It may also be structural.

Hinton left Google to speak freely. He now advocates for regulation and slowing development. LeCun works for Meta, which has an open-source AI strategy dependent on rapid deployment and minimal regulation. Bengio is at a Canadian academic institution with no commercial interest in either outcome — and sides with Hinton.

The three men who built the foundation of modern AI have aligned, almost perfectly, with the interests of their institutions. Hinton (no employer): slow down. LeCun (Meta): full speed. Bengio (academic): safety first.

Whether this is coincidence is worth sitting with.

There is a second layer. The 2017 transformer paper had eight authors. Most left Google to found or join AI companies. The architecture they designed underlies every major AI product. The mathematical insight enabling the current AI wave was produced inside a corporation, published as open research, and monetised by that corporation and its competitors. The researchers who built the foundations over forty years of underfunded work now advise corporations worth trillions.

The technology was built in universities. The value was captured by shareholders.

07

What It Means That It Works

The most unsettling fact about LLMs is not that they might be conscious. It is that nobody fully understands why they work.

The scaling hypothesis — more parameters and more data predictably produce better capabilities — was established empirically before it was theoretically explained. Capabilities appeared that nobody predicted: chain-of-thought reasoning, few-shot learning, in-context arithmetic. Researchers call these emergent abilities. They emerged from the mathematics without being designed.

Hinton's concern is not science fiction. It is the observation that we have built something that improves faster than our understanding of it improves. The gap between "it works" and "we know why it works" is wider than in any previous technology.

LeCun's counter: current systems are fundamentally limited. They model language, not the world. Genuine intelligence requires architectures that form internal models of physical reality, plan, and reason causally. The path to dangerous superintelligence does not run through GPT.

Both positions might be right. Both might be wrong. The honest answer is that nobody knows, the people who should know disagree, and the systems are deployed to billions of users while the argument continues.

The Questions That Remain

- If an LLM passes every test of linguistic competence, does the absence of subjective understanding matter?

- Hinton, LeCun, and Bengio's positions align with their institutions' interests. Does that invalidate their arguments?

- Backpropagation adjusts billions of weights based on error signals. Is this meaningfully different from how biological brains learn?

- Emergent capabilities appeared without being designed. Can we be confident we know what else might emerge?

- The 2017 transformer paper was published openly. Should foundational research that creates trillion-dollar industries be owned by anyone?
