Neural Networks Explained: From the Human Brain to the Architecture Behind Modern AI

In 1943, a neurophysiologist and a teenage logic prodigy sat down to ask a strange question. Could a neuron be written as an equation?

Not described. Not sketched. Written down, in the language of mathematics, precisely enough that a machine could compute it.

The two men were Warren McCulloch, a neuroscientist at the University of Illinois, and Walter Pitts, who had taught himself formal logic from library books before he turned seventeen. Their answer, published that year, was deceptively simple. And it is the reason that every neural network in use today, from a spam filter to the large language models powering ChatGPT, Claude, and Gemini, is called a neural network at all.

This article tells that story properly. Not just the history, but the actual mechanics: what a neural network is, how it learns, why it took forty years and a detour through statistical physics to become useful, and how little it still has in common with the biological brain that inspired its name.

What a Biological Neuron Actually Does

Before the mathematics, the biology.

A neuron in your brain is a cell with three basic parts: dendrites that receive signals from other neurons, a cell body that integrates those signals, and an axon that transmits an output signal onward. The connection points between neurons are called synapses.

At rest, a neuron holds a slight electrical charge across its membrane — roughly −70 millivolts — maintained by tiny molecular pumps that constantly shuttle sodium and potassium ions in and out of the cell. When enough excitatory signals arrive through the dendrites, voltage-gated channels in the membrane snap open, sodium floods in, and the charge spikes upward in a fraction of a millisecond. This electrical pulse, called an action potential, races down the axon, sometimes accelerated by a fatty insulating sheath called myelin that lets the signal jump between gaps rather than crawl along the whole length of the fibre.

When the action potential reaches the end of the axon, it triggers the release of chemical messengers called neurotransmitters — glutamate, GABA, dopamine, and dozens of others — into the gap between neurons. These chemicals cross the synapse and bind to receptors on the next neuron, either encouraging or discouraging it from firing in turn.

This is fundamentally a threshold process. Below the threshold, nothing happens. Above it, the neuron fires completely, every time, with the same intensity — a property neuroscientists call the all-or-none principle.

The human brain contains approximately 86 billion neurons, each connected to thousands of others, forming roughly 100 trillion synaptic connections. The DNA inside every one of those cells contains the instructions that build the proteins controlling how neurons grow, connect, and release their specific neurotransmitters. To understand how DNA encodes the biological machinery that builds a neuron in the first place is to understand where this entire story actually begins.

What McCulloch and Pitts did was take this all-or-none firing behaviour and strip it down to its mathematical bones, discarding the chemistry entirely and keeping only the logic.

1943: The First Artificial Neuron, and the Strange Life of Walter Pitts

McCulloch and Pitts published their paper in the Bulletin of Mathematical Biophysics, titled A Logical Calculus of the Ideas Immanent in Nervous Activity.

Their model was a single unit with several inputs, each either present or absent. If the number of active inputs crossed a threshold, the unit fired — output 1. If not, it stayed silent — output 0. No weights yet, no learning. Just a binary threshold device, directly modelled on the all-or-none firing of a real neuron.

This was enough to prove something remarkable: a network of these simple units, wired together correctly, could compute any logical function a digital computer could compute. The basic operations of all formal logic, built from something resembling brain cells.
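A minimal sketch of such a unit in Python may make the idea concrete. The function names are illustrative, not taken from the 1943 paper, and the inhibitory input (which silences the unit outright) is included as an assumption so that NOT can be expressed.

```python
def mp_unit(excitatory, threshold, inhibitory=()):
    """Binary threshold unit: output 1 if enough excitatory inputs are active.

    Assumption: any active inhibitory input vetoes firing entirely.
    """
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

def logical_and(a, b):
    return mp_unit([a, b], threshold=2)   # both inputs must be active

def logical_or(a, b):
    return mp_unit([a, b], threshold=1)   # one active input is enough

def logical_not(a):
    # No excitatory inputs and a threshold of 0: fires unless inhibited by `a`.
    return mp_unit([], threshold=0, inhibitory=[a])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", logical_and(a, b), "OR:", logical_or(a, b))
print("NOT 0:", logical_not(0), "NOT 1:", logical_not(1))
```

Nothing here is learned. The thresholds and the wiring are chosen by hand, and wiring such units into one another gives larger logical circuits.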

Walter Pitts’ own story is one of the strangest in twentieth-century science. He grew up in a working-class family in Detroit, fled an abusive home at age fifteen, and was effectively homeless for stretches of his teenage years — teaching himself logic, mathematics, and Greek from books he found in libraries. At seventeen, he travelled to Chicago specifically to hear a lecture by the philosopher Bertrand Russell, and reportedly stayed afterward to point out errors in Russell’s work.

He never earned a high school diploma, let alone a university degree. Yet by the time he met McCulloch, Pitts was producing work of genuine mathematical sophistication, and their 1943 collaboration would go on to directly influence John von Neumann’s early designs for digital computer architecture. Pitts’ later life was marked by isolation and decline; he died in 1969, largely outside the scientific recognition his early work deserved.

The 1943 model was a proof of possibility, not a working system. Nobody trained a McCulloch-Pitts network on data; the thresholds and the wiring had to be set by hand. But the idea that intelligence might be built from many simple, neuron-like units working together was now on the table.

1949: Cells That Fire Together, Wire Together

Six years later, a Canadian psychologist named Donald Hebb proposed an idea that would shape neuroscience and AI alike for the rest of the century.

In his 1949 book The Organization of Behavior, Hebb suggested that when one neuron repeatedly helps fire another, the connection between them strengthens. Memory and learning, in his view, were stored directly in the changing strength of synaptic connections themselves.

This became known as Hebbian learning, often summarised as “neurons that fire together, wire together.” It gave the field a plausible biological mechanism for how a network could learn from experience, rather than needing every connection hand-built by an engineer.
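Hebb stated his idea in words, not equations. A common textbook formalisation, used here as an assumption, increases a connection's weight in proportion to the product of the two neurons' activity. The learning rate of 0.1 is an arbitrary illustrative value.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Strengthen a connection in proportion to the joint activity of both neurons."""
    return weight + learning_rate * pre * post

weight = 0.0
# Ten moments in which the sending and receiving neurons are active together.
for _ in range(10):
    weight = hebbian_update(weight, pre=1, post=1)
print(round(weight, 2))  # 1.0

# If either neuron is silent, the connection is left unchanged.
print(hebbian_update(0.5, pre=1, post=0))  # 0.5
```

Note that this plain form can only strengthen connections between active units. Practical variants add some form of decay or normalisation to keep weights bounded.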

Decades of neuroscience research since have refined this picture considerably. Real synapses appear to follow something closer to spike-timing-dependent plasticity (STDP), where the precise millisecond-level timing of which neuron fires first determines whether a connection strengthens or weakens — a far more nuanced rule than Hebb could have specified in 1949, and one that modern AI’s standard training algorithm still does not directly implement.
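One simplified way to picture a timing-based rule is an exponential window: the closer together two spikes fall, the larger the change, and the order of firing decides its sign. The sketch below assumes that form. The amplitudes and the 20 ms time constant are illustrative placeholders, not measured values.

```python
import math

def stdp_delta(dt_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    """Weight change for one pair of spikes, where dt_ms = t_post - t_pre."""
    if dt_ms > 0:
        # The sending neuron fired first: strengthen the connection.
        return a_plus * math.exp(-dt_ms / tau_ms)
    if dt_ms < 0:
        # The receiving neuron fired first: weaken the connection.
        return -a_minus * math.exp(dt_ms / tau_ms)
    return 0.0

for dt in (-50, -5, 5, 50):
    print(dt, round(stdp_delta(dt), 5))
```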

1958: The Perceptron and the First AI Winter

In 1958, Frank Rosenblatt, a psychologist at the Cornell Aeronautical Laboratory, built the first network that could actually learn from data: the Perceptron.

Published in Psychological Review, Rosenblatt’s Perceptron took multiple weighted inputs, summed them, and fired if the sum crossed a threshold. The crucial difference from McCulloch-Pitts: the weights were no longer fixed by hand. If the Perceptron got an answer wrong, a simple rule nudged the weights toward the right answer.
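The learning rule can be sketched in a few lines of Python. This is a minimal software illustration of the standard perceptron update, not a reconstruction of Rosenblatt's hardware. The learning rate, the number of passes, and the choice of the OR function as training data are assumptions made for the example.

```python
def predict(weights, bias, x):
    """Weighted sum followed by a hard threshold."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(examples, epochs=20, learning_rate=0.1):
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in examples:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            # Nudge each weight toward the right answer; no change if correct.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(or_examples)
predictions = [predict(weights, bias, x) for x, _ in or_examples]
print(predictions)  # [0, 1, 1, 1]
```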

Feed it enough labelled examples, and it would learn the boundary between two categories on its own. The New York Times covered the unveiling with breathless excitement, reporting that the Navy expected the device to eventually walk, talk, see, write, reproduce itself, and be conscious of its own existence.

It could do none of those things. In 1969, MIT researchers Marvin Minsky and Seymour Papert published a book, Perceptrons, proving that a single-layer Perceptron could not even learn the XOR function — a basic logical operation any child can grasp intuitively.
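The limitation is easy to check numerically. The sketch below searches a coarse grid of weights and biases for a single threshold unit that reproduces XOR and finds none, then shows a two-layer arrangement of the same units, with hand-set weights, that does. The grid search is an illustration rather than a proof, and all the names are hypothetical.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step(total):
    return 1 if total > 0 else 0

def single_unit(w1, w2, b, x):
    return step(w1 * x[0] + w2 * x[1] + b)

# Try every combination of weights and bias from -4.0 to 4.0 in steps of 0.5.
grid = [i / 2 for i in range(-8, 9)]
solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(single_unit(w1, w2, b, x) == y for x, y in XOR.items())
]
print(len(solutions))  # 0

def two_layer_xor(x):
    h_or = step(x[0] + x[1] - 0.5)    # fires if at least one input is on
    h_and = step(x[0] + x[1] - 1.5)   # fires only if both inputs are on
    return step(h_or - h_and - 0.5)   # "or, but not and"

print([two_layer_xor(x) for x in XOR])  # [0, 1, 1, 0]
```

The reason no single unit works is geometric: one threshold unit draws a single straight line through the input space, and no straight line separates the two XOR classes.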

The proof was narrow — a network with more than one layer could solve XOR easily. But the book’s pessimistic tone arrived alongside a broader collapse in confidence. In Britain, the 1973 Lighthill Report, commissioned by the UK Science Research Council, concluded that AI research had failed to deliver on its promises and recommended cutting funding sharply.

The result became known as the first AI winter. Neural network research would spend most of the next two decades starved of funding and attention.

1982: A Physicist Brings Energy Landscapes to Neuroscience

The thaw began, unexpectedly, with a physicist rather than a computer scientist.

John Hopfield, working at Caltech, had spent his career studying condensed matter physics — including spin glasses, disordered magnetic materials whose constituent atoms settle into complex, stable configurations that minimise overall energy. In 1982, Hopfield realised that the same mathematics could describe how a network of neurons might store and retrieve memories.

His paper, Neural networks and physical systems with emergent collective computational abilities, published in the Proceedings of the National Academy of Sciences, introduced the Hopfield network. Each unit is connected to every other unit, and the system is defined by a single “energy” value that decreases as the network settles toward a stored pattern — exactly the way a marble settles toward the bottom of a bowl.
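A toy version shows the settling behaviour. The sketch assumes units that take the values +1 or −1, a Hebbian-style storage rule, and one-unit-at-a-time updates. The eight-unit pattern is arbitrary and the function names are illustrative.

```python
def train_hopfield(patterns):
    """Store patterns by strengthening links between units that agree."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights

def energy(weights, state):
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def recall(weights, state, sweeps=5):
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):  # update one unit at a time
            field = sum(weights[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if field >= 0 else -1
    return state

stored = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])
noisy = [1, -1, 1, -1, 1, -1, -1, 1]  # last two units flipped
restored = recall(weights, noisy)

print(restored == stored)          # True
print(energy(weights, noisy))      # -0.5
print(energy(weights, restored))   # -3.5
```

The corrupted pattern sits higher on the energy landscape than the stored one, and each update moves the state downhill until it reaches the stored memory.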

This was the first serious mathematical bridge between statistical physics and neural computation, reframing learning as a problem of energy minimisation — a framing that still echoes through modern deep learning, where training a network is described as minimising a loss function across an energy-like landscape.

In October 2024, the Royal Swedish Academy of Sciences awarded John Hopfield and Geoffrey Hinton the Nobel Prize in Physics jointly, “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”

1986: Backpropagation Ends the Winter

Hopfield’s energy-based networks were powerful for memory, but the field still lacked an efficient way to train deep, multi-layer networks for arbitrary tasks.

If a single-layer network couldn’t solve XOR but a multi-layer network could, the obvious answer was to build deeper networks. The obstacle was training them — nobody had a reliable way to figure out which weights, buried deep inside multiple layers, deserved credit or blame for a wrong answer, a challenge called the credit assignment problem.

In 1986, Geoffrey Hinton (then at Carnegie Mellon University), David Rumelhart (UC San Diego), and Ronald Williams published a paper in Nature, volume 323, pages 533–536, that solved it. The algorithm was called backpropagation.

Here is the core idea, stripped to its essentials. Run an input through the network and see what comes out. Compare that output to the correct answer and measure the error. Then work backward through every layer, applying the calculus chain rule to calculate exactly how much each connection contributed to that error — and nudge every weight in the direction that reduces it.

Do this across millions of examples, and the network gradually reshapes itself into something that produces the right answers.
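The forward pass, the backward pass, and the weight nudge can be sketched for a very small network: two inputs, two hidden units, one output, all with sigmoid activations. This is an illustrative toy, not the code from the 1986 paper. The squared-error loss, the learning rate, the random seed, and the XOR training data are all assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Run one input through the network; return hidden activations and output."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def backward(params, x, y):
    """Gradients of 0.5 * (output - y)**2 for one example, via the chain rule."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(params, x)
    # Error signal at the output unit.
    delta_out = (output - y) * output * (1 - output)
    # Error signal passed back through the output weights to each hidden unit.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                    for j in range(len(hidden))]
    grad_w_out = [delta_out * h for h in hidden]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w_hidden, delta_hidden, grad_w_out, delta_out

def train(params, data, epochs=2000, learning_rate=0.5):
    for _ in range(epochs):
        for x, y in data:
            g_wh, g_bh, g_wo, g_bo = backward(params, x, y)
            w_hidden, b_hidden, w_out, b_out = params
            # Nudge every weight in the direction that reduces the error.
            params = (
                [[w - learning_rate * g for w, g in zip(ws, gs)]
                 for ws, gs in zip(w_hidden, g_wh)],
                [b - learning_rate * g for b, g in zip(b_hidden, g_bh)],
                [w - learning_rate * g for w, g in zip(w_out, g_wo)],
                b_out - learning_rate * g_bo,
            )
    return params

def total_loss(params, data):
    return sum(0.5 * (forward(params, x)[1] - y) ** 2 for x, y in data)

random.seed(0)
params = (
    [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
    [0.0, 0.0],
    [random.uniform(-1, 1) for _ in range(2)],
    0.0,
)
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("loss before training:", total_loss(params, xor_data))
trained = train(params, xor_data)
print("loss after training: ", total_loss(trained, xor_data))
```

How far the loss falls depends on the starting weights; a network this small can stall on XOR, which is one reason practical systems use more units and more careful initialisation.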

A Worked Example, in Plain Numbers

It helps to see the arithmetic an artificial neuron actually performs, stripped of jargon.

Inputs: x₁ = 0.8, x₂ = 0.3 (two pieces of incoming information)

Weights: w₁ = 0.6, w₂ = −0.4 (how much each input matters, learned during training)

Bias: b = 0.1 (a baseline nudge, also learned)

Weighted sum: (0.8 × 0.6) + (0.3 × −0.4) + 0.1 = 0.48 − 0.12 + 0.1 = 0.46

Activation function (sigmoid): squashes 0.46 into a value between 0 and 1, here roughly 0.61

Output: 0.61 — passed on as input to the next layer
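The same arithmetic in Python, using the numbers above. The variable names are illustrative.

```python
import math

inputs = [0.8, 0.3]
weights = [0.6, -0.4]
bias = 0.1

# Multiply each input by its weight, add them up, then add the bias.
weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias

# Sigmoid activation: squashes any number into the range 0 to 1.
output = 1.0 / (1.0 + math.exp(-weighted_sum))

print(round(weighted_sum, 2))  # 0.46
print(round(output, 2))        # 0.61
```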

That single calculation, repeated across billions of artificial neurons arranged in layers, with weights adjusted by backpropagation across enormous datasets, is the entire mechanical basis of every neural network in existence — including the transformer architecture inside modern large language models.

Hinton would go on to share the 2018 Turing Award with Yann LeCun and Yoshua Bengio for this work, before sharing the 2024 Nobel Prize in Physics with Hopfield — an unusual pairing of a computer scientist and a physicist, recognising how foundational both breakthroughs turned out to be.

The 1980s and 1990s: Vision, Memory, and the Limits of Compute

Backpropagation reopened the field. Two further breakthroughs shaped what came next.

Convolutional Neural Networks

Yann LeCun, working at Bell Labs, drew direct inspiration from neuroscience experiments by David Hubel and Torsten Wiesel, who discovered that neurons in a cat’s visual cortex respond to specific simple features — edges at particular angles — arranged in a spatial hierarchy. Hubel and Wiesel won the 1981 Nobel Prize in Physiology or Medicine for this work.

LeCun’s convolutional neural network, described in his 1989 paper, built artificial layers that mimicked this hierarchy: early layers detecting simple edges, later layers combining them into shapes, faces, and objects. By 1998, his LeNet-5 architecture was reading handwritten digits on bank cheques across the United States.
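The operation at the heart of such a layer is small enough to write out. The sketch below slides a 3×3 filter across a tiny image and sums the products at each position. The filter values are set by hand to respond to vertical edges, purely for illustration; in a trained network they would be learned. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# A 4x6 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-set vertical-edge detector (an assumption for this example).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # [0, 3, 3, 0] on each of the two rows
```

The filter responds only where dark meets bright, and stays silent over uniform regions. The same small set of weights is reused at every position in the image.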

Memory, Sequences, and the Vanishing Gradient

Standard deep networks struggled with anything sequential — language, speech, time series — because they had no way to remember what came before. Worse, when researchers stacked many layers to give networks longer memory, training broke down: error signals shrank as they propagated backward through each layer, a phenomenon called the vanishing gradient problem.
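A little arithmetic shows why the signal shrinks. The slope of the sigmoid function is never larger than 0.25, and the backward pass multiplies by that slope once per layer. The sketch below assumes the best case, with every unit sitting at the steepest point of the curve and every weight equal to 1; the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0

def gradient_scale(depth, z=0.0, weight=1.0):
    """Factor applied to an error signal after passing back through `depth` sigmoid layers."""
    return (weight * sigmoid_derivative(z)) ** depth

for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

After twenty layers the signal has been scaled by less than one part in a trillion, so the earliest layers receive almost no information about how to change.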

Sepp Hochreiter and Jürgen Schmidhuber addressed this directly in 1997 with Long Short-Term Memory (LSTM) networks, published in Neural Computation. LSTMs introduced memory cells with internal gates controlling what information to keep, forget, or pass forward. For nearly twenty years, LSTMs were the standard approach to language modelling.
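The gating idea can be sketched for a single memory cell with scalar inputs. This follows the widely used formulation that includes a forget gate, which was added to the design after the 1997 paper. The parameter names and values are illustrative assumptions, not taken from the original.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell with scalar input and state."""
    keep = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # how much old memory to keep
    write = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])   # how much new content to store
    candidate = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])
    read = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # how much memory to pass forward
    c = keep * c_prev + write * candidate
    h = read * math.tanh(c)
    return h, c

# All weights zero; biases set so the cell keeps its memory and writes almost nothing new.
names = ["wf", "uf", "bf", "wi", "ui", "bi", "wc", "uc", "bc", "wo", "uo", "bo"]
params = dict.fromkeys(names, 0.0)
params.update({"bf": 5.0, "bi": -5.0, "bo": 5.0})

h, c = 0.0, 1.0  # start with a value of 1.0 stored in the memory cell
for _ in range(50):
    h, c = lstm_step(0.0, h, c, params)
print(round(c, 3))  # still about 0.71 after 50 steps
```

Because the memory is carried forward by multiplication with a gate that can sit close to 1, stored information and the error signals that flow back along it can survive many time steps.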

What held the field back through most of this period was not ideas. It was compute. Training a deep network on 1995 hardware could take weeks for a task a modern GPU completes in minutes.

From Neurons to Transformers

The breakthrough that finally removed the compute bottleneck arrived from an unexpected direction: video games.

GPUs, originally built to render 3D graphics, turned out to be extraordinarily good at the parallel matrix multiplication neural network training requires. By the early 2010s, researchers were training networks on GPU clusters that would have taken years on the CPUs of a decade earlier.

This set the stage for the architecture that eventually displaced LSTMs in language modelling: the transformer, introduced in 2017, built from layers of artificial neurons connected through a mechanism called attention. Attention lets every position in a sequence draw directly on every other position, which sidesteps much of the long-range memory problem that recurrent networks struggled with.
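For readers who want to see the mechanism itself, here is a minimal sketch of scaled dot-product attention, the form described in the 2017 paper listed in the sources. It uses plain Python lists rather than a tensor library, leaves out the learned projections and multiple heads of a real transformer, and the toy vectors are arbitrary.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Each query scores every key, then returns a weighted average of the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return outputs

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[5.0, 0.0]]  # points strongly toward the first key

result = attention(queries, keys, values)
print([round(v, 2) for v in result[0]])
```

The query lines up with the first key, so the output is dominated by the first value. Every position is compared with every other in a single step, however far apart they sit in the sequence.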

The complete story of how the transformer works, and how it powers every modern large language model, is covered in full detail in our companion article. What matters here is the lineage: a transformer is still, fundamentally, a neural network — every parameter a weighted connection of exactly the kind McCulloch and Pitts imagined in 1943, trained by a descendant of 1986 backpropagation, built on principles — energy minimisation, gated memory, hierarchical detection — that trace back through Hopfield, Hochreiter and Schmidhuber, and LeCun.

86 billion: neurons in the human brain
~20 watts: power consumption of the brain
1 trillion+: parameters in frontier AI models
Megawatt-hours: energy to train one frontier model

How Artificial and Biological Neurons Actually Differ

The metaphor is genuinely useful. It is also, in important ways, deeply misleading.

An artificial neuron computes a weighted sum and applies a simple mathematical function, exactly as shown in the worked example above. A biological neuron is an entire living cell, capable of physically changing its own structure, releasing dozens of different neurotransmitters, and operating within a brain that is roughly 10,000 times more energy-efficient than the silicon trying to imitate it.

The brain runs on about 20 watts — barely more than a dim light bulb. Training a single frontier AI model can consume megawatt-hours of electricity. The gap is not a matter of degree. It is a difference in kind.
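The two figures are in different units: watts measure power, while megawatt-hours measure energy. Converting the brain's 20 watts into energy over a year makes the comparison concrete. The training figure in the sketch below is a hypothetical round number chosen only to show the conversion, not a measurement of any real model.

```python
BRAIN_POWER_WATTS = 20          # figure quoted in the article
HOURS_PER_YEAR = 24 * 365

brain_kwh_per_year = BRAIN_POWER_WATTS * HOURS_PER_YEAR / 1000
print(brain_kwh_per_year)       # 175.2

# Hypothetical training run of 1,000 MWh, used purely to illustrate the units.
training_mwh = 1000
brain_years = training_mwh * 1000 / brain_kwh_per_year
print(round(brain_years))       # 5708
```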

There is also a deeper algorithmic difference. Backpropagation requires a global, precisely synchronised backward pass through the entire network — something with no clear biological equivalent. Real synapses appear to learn through local, timing-based rules like spike-timing-dependent plasticity, with no equivalent of a network-wide error signal travelling backward.

There is no equivalent in standard neural networks to glial cells, which make up roughly half the cells in the brain. There is no equivalent to neuroplasticity, the brain’s ability to physically grow and prune connections throughout life. And there is no consensus on whether anything resembling subjective experience occurs inside an artificial network at all — a question that connects directly to the unresolved hard problem of consciousness.

Neuromorphic Computing: Building Chips That Work Like Brains

A parallel research effort has tried to close the biology gap at the hardware level since the late 1980s.

The term neuromorphic engineering was coined by Carver Mead at Caltech, who proposed building physical circuits that mimicked real neurons directly, rather than simulating them in software.

That research line has produced real silicon. Intel’s Loihi chip, IBM’s earlier TrueNorth project, and the University of Manchester’s SpiNNaker system, led by Steve Furber, all aim at the brain’s extraordinary energy efficiency. A growing wave of commercial neuromorphic hardware continues this pursuit through 2026.

These chips use spiking neural networks, firing discrete pulses rather than continuous values — far closer to a real action potential. Early results show power efficiency gains of 10 to 1000 times for specific tasks, though spiking networks remain harder to train and have not yet matched transformer performance on most benchmarks.
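A leaky integrate-and-fire neuron, one of the simplest spiking models, gives a feel for the difference. The sketch below is a software simulation, not code for any of the chips named above. The resting voltage matches the −70 millivolts mentioned earlier; the threshold, reset level, time constant, and input values are illustrative assumptions.

```python
def simulate_lif(input_current, steps=100, dt=1.0, tau=10.0,
                 v_rest=-70.0, v_threshold=-55.0, v_reset=-75.0):
    """Leaky integrate-and-fire neuron; returns the time steps at which it spiked."""
    v = v_rest
    spikes = []
    for t in range(steps):
        # The voltage leaks back toward rest while the input pushes it upward.
        v += dt * (-(v - v_rest) + input_current) / tau
        if v >= v_threshold:
            spikes.append(t)  # emit a discrete pulse
            v = v_reset       # then drop below rest before recovering
    return spikes

weak = simulate_lif(10.0)    # settles below the threshold, so it never fires
strong = simulate_lif(20.0)  # repeatedly crosses the threshold

print(len(weak), len(strong))
```

The output is a list of spike times rather than a continuous number, which is why information in a spiking network is carried by when and how often units fire.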

What Scientists Say

“A neuron that uses or excites any other neuron should have its influence on that neuron increased.”
— Donald Hebb, paraphrased from The Organization of Behavior (1949)

“I was doing physics, not biology. But the mathematics of how a disordered magnetic system settles into a stable state turned out to describe memory far better than anyone expected.”
— John Hopfield, Nobel Laureate in Physics 2024

“We thought backpropagation would let networks discover their own internal representations, and that turned out to be exactly right — and far more powerful than any of us expected.”
— Geoffrey Hinton, Nobel Laureate in Physics 2024

“The brain is not a computer that happens to be made of neurons. It is something far stranger, and our artificial networks have only borrowed its most superficial properties.”
— Yann LeCun, Meta AI, co-recipient of the 2018 Turing Award

Frequently Asked Questions

What is a neural network in simple terms?

A neural network is a system of interconnected mathematical units, loosely modelled on biological neurons, that learns to recognise patterns by adjusting the strength of its connections based on examples. It is the foundational architecture behind nearly all modern AI, from image recognition to large language models.

Do artificial neural networks actually work like the human brain?

Only loosely. Artificial neurons share the basic idea of weighted inputs and a firing threshold with biological neurons, but they lack neurotransmitters, action potentials, glial cells, and the brain’s extraordinary energy efficiency. The brain runs on roughly 20 watts; training a large AI model can consume megawatt-hours.

Who invented the neural network?

Warren McCulloch and Walter Pitts proposed the first mathematical model of a neuron in 1943. Frank Rosenblatt built the first trainable network, the Perceptron, in 1958. John Hopfield introduced energy-based associative memory networks in 1982. Geoffrey Hinton, David Rumelhart, and Ronald Williams demonstrated backpropagation in 1986.

What is backpropagation and why was it important?

Backpropagation is the algorithm that uses the calculus chain rule to calculate how much each connection in a neural network contributed to an error, then adjusts every connection to reduce that error. Published by Hinton, Rumelhart, and Williams in 1986, it solved the central problem that had stalled neural network research for over a decade.

What is a Hopfield network?

A Hopfield network, introduced by physicist John Hopfield in 1982, is a type of neural network that stores memories as stable low-energy states, borrowing mathematics directly from statistical physics and the study of spin glasses. Hopfield shared the 2024 Nobel Prize in Physics with Geoffrey Hinton for this contribution.

What is the vanishing gradient problem?

The vanishing gradient problem occurs when error signals become extremely small as they are propagated backward through many layers of a deep network, making early layers learn extremely slowly or not at all. Architectural innovations like LSTMs and, later, transformers addressed it directly.

What is the difference between a neural network and a large language model?

A large language model is a type of neural network — specifically, a transformer, built from layers of artificial neurons connected through an attention mechanism. Every LLM is a neural network, but most neural networks are not language models.

Why is the field called deep learning?

Depth refers to the number of layers stacked between the input and the output. Early networks had one or two layers. Modern networks have dozens or hundreds, which is why the field became known as deep learning once backpropagation made deep networks trainable.

Sources

McCulloch, W.S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133. doi.org/10.1007/BF02478259
Hebb, D.O. (1949). The Organization of Behavior: A Neuropsychological Theory. Wiley.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408. doi.org/10.1037/h0042519
Minsky, M., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.
Lighthill, J. (1973). Artificial Intelligence: A General Survey. UK Science Research Council.
Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558. doi.org/10.1073/pnas.79.8.2554
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536. doi.org/10.1038/323533a0
Hubel, D.H., & Wiesel, T.N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160(1), 106–154. doi.org/10.1113/jphysiol.1962.sp006837
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. doi.org/10.1109/5.726791
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. doi.org/10.1162/neco.1997.9.8.1735
Mead, C. (1990). Neuromorphic electronic systems. Proceedings of the IEEE, 78(10), 1629–1636.
Davies, M. et al. (2018). Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1), 82–99.
The Royal Swedish Academy of Sciences (2024). The Nobel Prize in Physics 2024 — Press Release. nobelprize.org
Vaswani, A. et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30. arxiv.org/abs/1706.03762
