For most of the twentieth century, geneticists focused almost exclusively on the two percent of the human genome that codes for proteins — the sequences that carry instructions for building the molecular machines of life. The remaining 98 percent was labelled “junk DNA” and largely set aside. It seemed inert, non-functional, an evolutionary residue accumulating in the genome like clutter in an attic.
That picture has been substantially revised. We now know that much of the non-coding genome, increasingly called “dark DNA” in recognition of how little we understand it, is far from junk. It contains a vast regulatory landscape that controls when, where, and how much each gene is expressed: the switches and dials that determine whether a stem cell becomes a neuron or a liver cell, whether an immune cell mounts a response or stands down, whether a cancer-suppressing gene is active or silenced. Most of the genetic variants associated with common diseases such as cancer, diabetes, heart disease, and schizophrenia lie not in protein-coding genes but in this non-coding regulatory territory.
The problem is that this regulatory landscape is extraordinarily complex, poorly understood, and until recently almost impossible to decode systematically. A single regulatory variant — a single letter change in a stretch of non-coding DNA — can alter the expression of a gene in one specific cell type at one specific developmental stage while leaving everything else unaffected. Predicting these effects from sequence alone has been one of the hardest problems in genomics.
In 2025, Google DeepMind released AlphaGenome, an artificial intelligence model designed specifically to tackle this problem. It comes from the same lab that produced AlphaFold, the protein structure predictor that transformed structural biology and earned its developers a share of the 2024 Nobel Prize in Chemistry. AlphaGenome represents a major step forward in our ability to decode the regulatory genome. Its implications reach across medicine, drug discovery, evolutionary biology, and our fundamental understanding of how life is controlled at the molecular level.
What AlphaGenome Does
AlphaGenome is a deep learning model that takes a DNA sequence as input and predicts its regulatory output — specifically, how that sequence influences gene expression across different cell types and tissues. It can analyse sequences up to one million DNA letters long, capturing regulatory interactions that operate over vast genomic distances and that previous models could not detect.
The model was trained on enormous datasets of genomic measurements — chromatin accessibility profiles, histone modification patterns, transcription factor binding data, and gene expression measurements across hundreds of human cell types and tissues. By learning the statistical relationships between DNA sequences and their regulatory outputs across this wealth of experimental data, AlphaGenome developed the ability to predict regulatory effects of sequences it has never seen before.
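To make "takes a DNA sequence as input" concrete: sequence models cannot read letters directly, so the sequence is first turned into numbers. A common convention in this field is one-hot encoding, where each base becomes a four-element vector. The sketch below assumes that convention for illustration; the article does not describe AlphaGenome's actual input pipeline, and the function name `one_hot` is hypothetical.

```python
# One-hot encode a DNA string: each base becomes a 4-element row
# in the fixed order A, C, G, T. This is an illustrative convention,
# not a description of any specific model's preprocessing.
BASES = "ACGT"


def one_hot(sequence):
    """Return one 4-element row per base; unknown bases (e.g. N) map to all zeros."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in BASES:
            row[BASES.index(base)] = 1
        rows.append(row)
    return rows


encoded = one_hot("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

A one-million-letter input encoded this way is simply a table with one million rows and four columns, which is the kind of numeric object a neural network can learn from.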
What makes this practically powerful is its ability to perform what geneticists call variant effect prediction. Given a DNA sequence with a single letter change — a single nucleotide variant — AlphaGenome can predict how that change alters gene regulation in specific cell types. This is the core problem in interpreting human genetic variation. Genome-wide association studies have identified thousands of variants associated with diseases, but for the vast majority of these variants — lying in non-coding regions — we do not know which gene they affect, in which cell type, or through what mechanism. AlphaGenome provides a computational route to answering these questions at scale.
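The logic of variant effect prediction can be sketched in a few lines: score the reference sequence, score the same sequence with one letter changed, and compare the two per cell type. The example below is a toy under loud assumptions. `toy_predict` is a hypothetical stand-in that merely counts made-up motifs; it is not AlphaGenome and does not use its API. Only the reference-versus-alternate comparison is the point.

```python
def toy_predict(sequence):
    """Hypothetical stand-in for a trained model: counts a toy motif per cell type."""
    motifs = {"liver": "TGACTCA", "neuron": "CACGTG"}  # illustrative motifs only
    return {cell: float(sequence.count(motif)) for cell, motif in motifs.items()}


def apply_snv(sequence, position, alt):
    """Return a copy of sequence with the base at 0-based position replaced by alt."""
    if not 0 <= position < len(sequence):
        raise ValueError("position outside sequence")
    return sequence[:position] + alt + sequence[position + 1:]


def variant_effect(sequence, position, alt, predict=toy_predict):
    """Per-cell-type difference: prediction for the variant minus prediction for the reference."""
    ref_scores = predict(sequence)
    alt_scores = predict(apply_snv(sequence, position, alt))
    return {cell: alt_scores[cell] - ref_scores[cell] for cell in ref_scores}


reference = "AAATGACTCAGGGCACGTGTTT"
effects = variant_effect(reference, 5, "T")  # A -> T at position 5 breaks the liver motif
print(effects)  # {'liver': -1.0, 'neuron': 0.0}
```

The single letter change lowers the toy "liver" score and leaves "neuron" untouched, mirroring how a real regulatory variant can act in one cell type and not another.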
Building on AlphaFold: The DeepMind Genomics Programme
To understand AlphaGenome’s significance, it helps to understand what AlphaFold achieved and why DeepMind moved from protein structure to gene regulation as its next challenge.
AlphaFold 2, unveiled in 2020 and followed by a greatly expanded structure database in 2022, delivered a breakthrough on the protein structure prediction problem: predicting the three-dimensional structure of a protein from its amino acid sequence with accuracy that, for many proteins, rivals experimental methods. The impact on structural biology was immediate and transformative. Researchers who had spent years trying to crystallise proteins and determine their structures found they could now get useful predictions in minutes. The AlphaFold Protein Structure Database now contains predicted structures for nearly every protein in the human proteome and more than 200 million proteins from other organisms, freely accessible to researchers worldwide.
But protein structure is only part of biology. Knowing the structure of a protein tells you what it looks like and how it might function. It does not tell you when it is made, in which cells, in what quantities, or how its production is controlled. That is the domain of gene regulation — and AlphaGenome is designed to do for regulatory genomics what AlphaFold did for structural biology.
The two systems are complementary. AlphaFold tells you what a protein looks like and how it might function. AlphaGenome tells you when and where it is made. Together, they provide a more complete picture of molecular biology than either could alone, and the combination is already generating new hypotheses about disease mechanisms that would have been invisible with either system individually.
Non-Coding DNA and Disease
The medical importance of non-coding DNA regulation cannot be overstated. Genome-wide association studies — large-scale analyses comparing the genomes of people with and without specific diseases — have identified more than 100,000 genetic variants associated with human traits and diseases. Approximately 90 percent of these variants lie in non-coding regions of the genome.
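For readers curious what "statistically associated" means in practice, a case-control comparison at a single variant boils down to a 2x2 table of allele counts. The sketch below computes an odds ratio and a one-degree-of-freedom chi-square test from such a table. The counts are invented for illustration, and real association studies apply additional corrections (for example, for ancestry and for testing many variants at once) that are omitted here.

```python
import math


def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds of carrying the alternate allele in cases relative to controls."""
    return (case_alt * control_ref) / (case_ref * control_alt)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Invented allele counts: alternate vs reference allele, in cases vs controls.
case_alt, case_ref = 300, 700
control_alt, control_ref = 200, 800

ratio = odds_ratio(case_alt, case_ref, control_alt, control_ref)
chi2 = chi_square_2x2(case_alt, case_ref, control_alt, control_ref)
print(f"odds ratio = {ratio:.2f}, chi-square = {chi2:.2f}, p = {p_value_1df(chi2):.2e}")
```

An odds ratio above 1 says the allele is more common in people with the disease. It says nothing about which gene or cell type is involved, which is exactly the gap described next.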
This has created a profound interpretive gap. We know that these variants are statistically associated with disease. We do not know what most of them do at a molecular level. Without understanding the mechanism — which gene is regulated, in which cell type, through what pathway — it is impossible to develop targeted therapies based on these findings. The variants are clues without explanations.
AlphaGenome addresses this gap directly. By predicting the regulatory effects of specific variants in specific cell types, it provides a mechanistic framework for interpreting the findings of genome-wide association studies — turning statistical associations into biological hypotheses that can be tested experimentally. For cancer variants, it can predict which cell types are affected. For psychiatric disease variants, it can identify which brain cell types show altered gene regulation. For cardiovascular disease variants, it can pinpoint the relevant vascular cell populations.
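Turning predictions into testable hypotheses is, at its simplest, a ranking exercise: for each variant, find the cell types where the predicted change is largest. The sketch below assumes per-cell-type effect scores are already in hand. The variant names, cell types, numbers, and the 0.5 cut-off are all made up for illustration, and `rank_cell_types` is a hypothetical helper rather than part of any released tool.

```python
def rank_cell_types(effects, threshold=0.5):
    """Return (cell_type, effect) pairs with |effect| >= threshold, largest magnitude first."""
    hits = [(cell, effect) for cell, effect in effects.items() if abs(effect) >= threshold]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Invented predicted effects (variant minus reference) per cell type.
predicted = {
    "variant_1": {"cardiomyocyte": -1.2, "hepatocyte": 0.1, "neuron": 0.7},
    "variant_2": {"cardiomyocyte": 0.0, "hepatocyte": 0.2, "neuron": -0.1},
}

for variant, effects in predicted.items():
    ranked = rank_cell_types(effects)
    print(variant, ranked if ranked else "no cell type passes the threshold")
```

A variant with no cell type above the threshold is not necessarily harmless; it may act in a cell type or condition the predictions do not cover, which is one reason experimental follow-up remains essential.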
This does not mean AlphaGenome’s predictions are always correct. Like any predictive model, it makes errors, and its predictions need experimental validation before they can be translated into clinical applications. But it dramatically accelerates the process of moving from genetic association to biological understanding — compressing work that might previously have taken years of laboratory experiments into computational analyses that take hours.
AlphaGenome and Drug Discovery
One of the most immediate applications of AlphaGenome is in drug target identification and validation. Most drugs work by modulating the activity of a protein — typically an enzyme or receptor that plays a role in disease. Identifying which proteins to target, and predicting whether modulating them will have the desired therapeutic effect without harmful side effects, is one of the central challenges of drug discovery.
AlphaGenome contributes to this process by helping to prioritise which regulatory variants are likely to drive disease, and therefore which genes and proteins are likely to be causally involved rather than merely correlated with it. This distinction matters enormously. A protein whose expression is predicted to change because of a disease-associated variant, and is then confirmed to do so experimentally, is a far better-supported drug target than one identified through less direct evidence.
Pharmaceutical companies are already integrating AlphaGenome-style regulatory predictions into their target identification pipelines. The tool is particularly valuable for complex multifactorial diseases — the major killers including cancer, cardiovascular disease, neurodegeneration, and metabolic disease — where the disease mechanism involves regulatory changes across multiple genes in multiple cell types, and where no single obvious drug target exists.
Personalised Medicine and the Regulatory Genome

Beyond population-level drug discovery, AlphaGenome opens the door to truly personalised medicine based on an individual’s regulatory genome. Two people with the same disease may have arrived there through different regulatory pathways — different variants affecting different genes in different cell types — and may therefore respond differently to the same treatment.
As whole-genome sequencing becomes cheaper and more clinically routine, AlphaGenome-style tools could be used to interpret a patient’s personal regulatory variant profile, identifying which specific disease mechanisms are active in their genome and which therapeutic interventions are most likely to be effective. This is an extension of the precision medicine vision that has already transformed oncology — where tumour genome sequencing guides treatment selection — to the much larger domain of common complex diseases.
For a broader look at how gene editing tools like CRISPR are changing medicine by directly modifying DNA sequences, see our article on gene editing in 2026. And for an explanation of how the environment shapes gene expression through mechanisms that operate above the DNA sequence itself, see our article on epigenetics. AlphaGenome, gene editing, and epigenetics together represent three converging approaches to understanding and controlling gene regulation — one computational, one molecular, one environmental.
Limitations and What Comes Next
AlphaGenome is a powerful tool, but it has significant current limitations that are important to understand clearly. Its predictions are probabilistic — it produces the most likely regulatory effect based on patterns in its training data, but it can be wrong, particularly for novel variant combinations or cell types underrepresented in its training data.
The model also works from DNA sequence alone. Although its outputs include predicted three-dimensional chromatin contacts, accurately capturing the influence of very distant regulatory elements, those more than about 100,000 DNA letters from the gene they control, remains difficult. Nor does it model the dynamic regulatory changes that occur during development, ageing, or disease progression over time.
DeepMind has made AlphaGenome available to the research community for non-commercial use, following the precedent set by AlphaFold. The tool is already being used by research groups, and the feedback from these applications will drive further development. The next generation of regulatory AI models will likely make fuller use of three-dimensional genomic structure, single-cell data at much higher resolution, and multi-species comparative genomics to improve prediction accuracy further.
The broader trajectory is clear. The dark DNA is being illuminated. The regulatory genome — for decades the least understood part of biology — is becoming increasingly legible. And as it does, the consequences for medicine, evolutionary biology, and our fundamental understanding of how life works are difficult to overstate.
Frequently Asked Questions
What is non-coding DNA?
Non-coding DNA refers to the approximately 98% of the human genome that does not carry instructions for making proteins. Once dismissed as “junk DNA,” non-coding DNA is now understood to contain the regulatory sequences that control when, where, and how much each gene is expressed. Most disease-associated genetic variants lie in non-coding regions.
What is AlphaGenome?
AlphaGenome is an artificial intelligence model developed by Google DeepMind that predicts how DNA sequences — particularly non-coding regulatory regions — influence gene expression across different cell types. It can analyse sequences up to one million DNA letters long and predict the effects of specific genetic variants on gene regulation.
How does AlphaGenome relate to AlphaFold?
AlphaFold predicts the three-dimensional structure of proteins from their amino acid sequences. AlphaGenome predicts how DNA sequences regulate gene expression. The two systems are complementary: AlphaFold tells you what a protein looks like and how it might function; AlphaGenome tells you when and where it is made and how its production is regulated.
What diseases could AlphaGenome help address?
AlphaGenome is most immediately applicable to diseases where the causal genetic variants lie in non-coding regulatory regions — which includes most common complex diseases. This encompasses cancer, cardiovascular disease, type 2 diabetes, neurological and psychiatric disorders, and autoimmune conditions.
Is AlphaGenome freely available?
Yes. Google DeepMind released AlphaGenome to the research community following the same open-access model as AlphaFold. Researchers can access the model and use it for non-commercial research purposes.
What are the limitations of AlphaGenome?
AlphaGenome’s predictions are probabilistic and can be wrong, particularly for novel variants or underrepresented cell types. It still struggles to capture the influence of very distant regulatory elements and does not model dynamic regulatory changes over time. Its predictions require experimental validation before clinical application.
Further Reading
- Google DeepMind — AlphaGenome Research
- Nature — AlphaGenome Publication
- Wikipedia — Non-Coding DNA
- Wikipedia — AlphaFold
- The Code Breaker by Walter Isaacson — an accessible account of the CRISPR revolution and modern genetics
Sources
- Google DeepMind — AlphaGenome
- Wikipedia — Non-Coding DNA
- Wikipedia — Genome-Wide Association Studies
- Wikipedia — AlphaFold
- Web News For Us — Gene Editing in 2026
- Web News For Us — Epigenetics
- Web News For Us — The Human Microbiome
About the Author
Baryon is the founder and editor of Web News For Us. Driven by a deep fascination with the biggest unanswered questions in science — from quantum physics and cosmology to the nature of consciousness and the genetic code written into every living cell — he has spent years studying modern physics, biology, and the history of scientific thought. He covers Science & AI, Space, Genetics & Research, and the timeless wisdom of history’s greatest thinkers and mystics.
If you have ever looked at the night sky and felt that pull to understand what is out there — or the wonder of an entire universe coiled inside your genes — you are in the right place.
