Proteins do most of the work in a cell. They build structures, catalyse reactions, transmit signals, and control gene expression. To do these jobs, proteins must adopt precise three-dimensional shapes.
A protein’s structure determines its function. Change the shape, and the function often changes too.
This simple idea drives a large area of research: predicting protein structure from its sequence.
In these materials we will explore how modern computational methods predict protein structures and complexes. We will also examine how researchers analyse and interpret these models.
Before we begin, we need a few key ideas.
Proteins are chains of amino acids. The order of those amino acids forms the primary structure.
Cells build this chain on the ribosome. At first it resembles a loose thread, but it does not stay that way for long: the chain soon folds.
Folding happens because amino acids interact with each other and with water: some attract, some repel, some form bonds. These interactions pull the chain into a stable shape.
We usually describe protein structure at four levels.
The primary structure is simply the amino acid sequence. Example:
Met-Ala-Leu-Gly-Lys...
This sequence contains all the information needed to fold the protein.
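Sequences are usually stored in one-letter code rather than the three-letter notation shown above. The following minimal sketch converts between the two using the standard amino acid abbreviations. The function name `to_one_letter` is made up for this illustration.

```python
# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence: str) -> str:
    """Convert 'Met-Ala-Leu' style notation to a one-letter string."""
    residues = three_letter_sequence.split("-")
    # capitalize() normalises input such as 'MET' or 'met' to 'Met'.
    return "".join(THREE_TO_ONE[r.strip().capitalize()] for r in residues)


print(to_one_letter("Met-Ala-Leu-Gly-Lys"))  # MALGK
```

The one-letter string is the form that most prediction tools expect as input, typically wrapped in a FASTA file.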
Short stretches of the chain fold into regular patterns called secondary structures.
The two most common are α-helices and β-sheets.
These elements form the scaffolding of many proteins.
The tertiary structure describes the full three-dimensional fold of a single protein chain.
At this level we see:
This is the level most often shown in molecular graphics programs such as ChimeraX.
Many proteins work as complexes of several chains. The arrangement of these chains forms the quaternary structure.
Examples include:
In these materials we will often examine these assemblies. They reveal how proteins cooperate to perform biological functions.
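Biologists can determine protein structures experimentally. The most common methods include X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM).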
These techniques produce detailed structures, but they require specialised equipment and time. As a result, we know the sequences of hundreds of millions of proteins, but we have solved structures for only a small fraction of them.
This gap creates a clear problem: We know the letters of the sequence, but we do not know the shape.
Structure prediction aims to close that gap. Accurate structure models help researchers:
In short, structure helps us turn sequence data into biological insight.
Predicting protein structure from sequence has challenged scientists for decades. The core problem seems simple:
Given an amino acid sequence, predict the structure it will fold into.
In practice, the problem is extremely hard. A protein chain can adopt an enormous number of possible shapes, each corresponding to a different arrangement of atoms. Only a few of these shapes are stable.
Predicting the correct fold means finding the lowest-energy structure among a vast number of possibilities.
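To get a feel for the size of this search space, here is a back-of-the-envelope count. It assumes, purely for illustration, that each residue can adopt 3 local conformations and that a chain could sample 10^13 conformations per second. Both numbers are assumptions for this sketch, not values taken from these materials.

```python
def count_conformations(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of chain conformations if every residue picks a state independently."""
    return states_per_residue ** n_residues


def years_to_enumerate(n_conformations: int, rate_per_second: float = 1e13) -> float:
    """Time needed to visit every conformation once at the assumed sampling rate."""
    seconds = n_conformations / rate_per_second
    return seconds / (365.25 * 24 * 3600)


n = count_conformations(100)  # a modest 100-residue protein
print(f"{n:.2e} conformations")
print(f"{years_to_enumerate(n):.2e} years to enumerate them all")
```

Even under these generous assumptions, a 100-residue chain has on the order of 10^47 conformations, so exhaustive enumeration is not an option. Prediction methods have to search far more cleverly than that.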
Early methods relied on two main ideas.
Some approaches attempted to simulate the physics of folding. They calculated forces between atoms and searched for the lowest-energy structure.
In theory this approach should work. In practice it demands enormous computing power. Even today, simulating full protein folding remains difficult.
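The idea of scoring a conformation by its energy can be sketched with a toy model. The example below uses a Lennard-Jones pair potential in reduced units as a stand-in for a real force field. This is a deliberate simplification: real force fields also include bonds, angles, electrostatics, and solvent, and the function names here are hypothetical.

```python
import itertools
import math


def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Pair energy: repulsive at short range, weakly attractive further out."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(coords) -> float:
    """Sum the pair energy over every unique pair of atoms."""
    return sum(
        lennard_jones(math.dist(a, b))
        for a, b in itertools.combinations(coords, 2)
    )


r_min = 2 ** (1 / 6)  # separation at which the pair energy is lowest

# Two toy "conformations" of three atoms: stretched out versus compact.
line = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0), (2 * r_min, 0.0, 0.0)]
triangle = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0),
            (r_min / 2, r_min * math.sqrt(3) / 2, 0.0)]

print(f"line:     {total_energy(line):.3f}")
print(f"triangle: {total_energy(triangle):.3f}")
```

The compact arrangement has the lower energy because all three pairs sit at the favourable distance. A physics-based predictor repeats this kind of evaluation over thousands of atoms and a huge number of candidate conformations, which is where the computational cost comes from.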
Another approach used homology.
Proteins that share similar sequences often share similar structures. If researchers already solved the structure of a related protein, they could model the new one by comparison.
This method works well when a close structural relative exists. But it struggles with proteins that lack known homologues.
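How close a relative is can be quantified by the percentage of identical residues in a sequence alignment. The sketch below computes that value for two sequences that are assumed to be aligned already, with `-` marking gaps. The two example sequences are invented for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over the gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical target and template, already aligned.
target = "MALGKTVE-L"
template = "MSLGKAVEQL"
print(f"{percent_identity(target, template):.1f}% identity")
```

Conventions differ on what to use as the denominator (aligned columns, the shorter sequence, or the full alignment length), so identity values from different tools are not always directly comparable.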
To measure progress, the community created a regular blind test called CASP (Critical Assessment of Structure Prediction). In CASP:
CASP provides an objective way to measure how well prediction methods perform. For many years, progress was steady but slow.
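One score widely used in CASP assessments is GDT_TS, which averages the percentage of Cα atoms that lie within 1, 2, 4, and 8 Å of their positions in the experimental structure. The sketch below is a simplified version: it assumes the model and reference are already superposed and have matching residues, whereas the real score searches over many superpositions. The coordinates are made up.

```python
import math


def gdt_ts(model_ca, reference_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT_TS (0-100) for pre-superposed C-alpha coordinates."""
    if len(model_ca) != len(reference_ca) or not model_ca:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(m, r) for m, r in zip(model_ca, reference_ca)]
    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Four reference C-alpha atoms and a model displaced by 0.5, 1.5, 3.0 and 10.0 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_ts(model, reference))  # 56.25
```

Because the score uses several cutoffs, one badly placed region lowers it without hiding the parts of the model that are accurate.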
Then the field changed.
In recent years, machine learning methods have improved many areas of science. Protein structure prediction is one example.
Deep learning models analyse large datasets of known protein structures and sequences. From these data they learn patterns that relate sequence to structure.
One system in particular drew wide attention: AlphaFold, developed by DeepMind.
AlphaFold uses deep neural networks to predict:
It combines these predictions into a final three-dimensional model.
In CASP14 (2020), AlphaFold 2, the second version of the system, produced models with accuracy comparable to that of many experimental structures. This result marked a major advance in the field.
Since then, researchers have applied AlphaFold and related tools to predict structures across entire proteomes. Large public resources now exist, including the AlphaFold Protein Structure Database, which contains predicted structures for more than 200 million proteins.
Today, structure prediction plays a central role in molecular biology. Researchers routinely use predicted models to:
Tools continue to improve. New systems can predict:
These capabilities allow us to model increasingly realistic biological systems.
However, challenges remain. Prediction methods still struggle with:
Researchers must also learn how to interpret prediction confidence and recognise when a model may be unreliable.
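For AlphaFold models, the main per-residue confidence measure is pLDDT (0 to 100), which AlphaFold writes into the B-factor column of its PDB output. The sketch below reads those values from the Cα atoms and assigns each residue to a confidence band. The thresholds follow the bands commonly used by the AlphaFold Protein Structure Database; the helper `make_ca_line` and the example values are invented so that the block runs without an input file.

```python
def make_ca_line(serial: int, res_name: str, res_seq: int, plddt: float) -> str:
    """Build a fixed-column PDB ATOM record for a C-alpha atom (demo data only)."""
    return (f"ATOM  {serial:>5}  CA  {res_name:>3} A{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


def residue_plddt(pdb_text: str) -> dict:
    """Map residue number to pLDDT, read from the B-factor field of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            # Columns 23-26 hold the residue number, 61-66 the B-factor.
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Classify a pLDDT value into the commonly used confidence bands."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Made-up four-residue model.
pdb_text = "\n".join([
    make_ca_line(1, "MET", 1, 95.2),
    make_ca_line(2, "ALA", 2, 82.0),
    make_ca_line(3, "LEU", 3, 61.5),
    make_ca_line(4, "GLY", 4, 35.0),
])

for res_seq, score in residue_plddt(pdb_text).items():
    print(res_seq, score, confidence_band(score))
```

Regions in the low and very low bands are often flexible or disordered, so their coordinates should not be read as a single well-defined shape.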
These materials focus on practical analysis of predicted protein structures.
You will learn how to: