No. 3203: SHAPE OF PROTEIN

by Krešimir Josić


Today, let's talk about the shape of proteins. The University of Houston presents this program about the machines that make our civilization run, and the people whose ingenuity created them.

As we learn in elementary school, DNA is the blueprint of life. It guides our growth, our deterioration, and everything in between. Cells translate information encoded in DNA into proteins, and proteins provide structure to our bodies, and guide the chemical processes that are life.

The information encoded in DNA is translated into amino acid chains. These chains are like self-assembling origami: once created, they fold themselves into shape. And each amino acid chain becomes a properly functioning protein only after folding itself into the right structure. We understand well how DNA determines what an amino acid chain is made of. But we know far less about how DNA determines the final shape of a protein.


Protein folding   Photo Credit: Wikipedia

Think of a long piece of pipe cleaner, say two feet long. Now imagine that at every inch between the tip and the end you are free to bend the cleaner at any angle, and in any direction you choose. The number of possible shapes you can get after all these bends is astronomical. Proteins are similar: there are so many possible ways to fold an amino acid chain that it is very difficult to predict the actual form a protein assumes just from knowing what it is made of.
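To get a feel for "astronomical," here is a minimal sketch that counts bend combinations for the pipe cleaner. It rests on an assumption that is not in the episode: each joint is limited to a small, fixed number of bend choices rather than any angle at all. The names `count_shapes`, `JOINTS`, and the choice counts are illustrative only.

```python
# Hypothetical simplification: the episode allows ANY angle at each joint,
# which gives infinitely many shapes. Here each joint gets only a handful
# of discrete choices, so the result is a conservative lower bound.
LENGTH_INCHES = 24             # two feet of pipe cleaner
JOINTS = LENGTH_INCHES - 1     # one bend point at every inch between the ends


def count_shapes(choices_per_joint: int, joints: int = JOINTS) -> int:
    """Number of bend combinations when every joint is set independently."""
    return choices_per_joint ** joints


if __name__ == "__main__":
    for choices in (2, 6, 10):
        total = count_shapes(choices)
        print(f"{choices} choices per joint: about {total:.2e} combinations")
```

Even this crude count grows exponentially with the number of joints. It ignores the fact that some combinations would make the pipe cleaner pass through itself, and that rotating the whole thing does not produce a new shape, but those corrections do not change the overall picture.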

Yet protein shape is extremely important. When folded correctly, proteins can recognize hostile bacteria, generate the power in our cells, and make fertilizer out of the nitrogen in our air. Misfolded proteins lead to diseases like Alzheimer's and cystic fibrosis.


Folding structures   Photo Credit: Wikipedia

The number of possible shapes for even small amino acid chains is huge. We cannot explore all possibilities even with the most powerful computers. Yet biophysicists and mathematicians have made great progress over the years by showing how the folding process can be broken into smaller steps that are easier for computers to tackle. But the problem remains far from solved, and scientists pit their programs against each other in various competitions.
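A rough back-of-the-envelope estimate shows why checking every shape is out of reach. The numbers below are assumptions chosen for illustration, not figures from the episode: three possible states per amino acid, and a computer that checks 10^18 shapes per second. The function name `log10_years_to_enumerate` is likewise hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_years_to_enumerate(chain_length: int,
                             states_per_unit: int = 3,        # assumed
                             checks_per_second: float = 1e18  # assumed
                             ) -> float:
    """Return log10 of the years needed to visit every shape once.

    Working with logarithms avoids overflow for long chains.
    """
    log10_shapes = chain_length * math.log10(states_per_unit)
    return log10_shapes - math.log10(checks_per_second) - math.log10(SECONDS_PER_YEAR)


if __name__ == "__main__":
    for length in (50, 100, 300):
        exponent = log10_years_to_enumerate(length)
        print(f"chain of {length}: about 10^{exponent:.0f} years")
```

Under these assumptions, a chain of 100 amino acids would take on the order of 10^22 years to search exhaustively. That is why breaking the folding process into smaller steps matters so much more than raw computing speed.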


Possible folding pathways   Photo Credit: Wikipedia

Surprisingly, the same machine learning team that created the computer program that defeated the world champion in the game of Go also won a recent protein folding competition. Machine learning works best when computers can be trained on many examples. Over the last eight decades, scientists have been able to use a variety of methods to determine the structure of many proteins. This information is now used to train machines to make predictions that exceed the accuracy of anything we have seen before.

Understanding how proteins fold gives us insights into how our bodies work, and why they fail to do so. It may also allow us to design novel proteins that have never appeared in nature before. These new structures could break down pollutants, allow for green chemical and fuel production, or help us heal. Yet it is only with the help of machines that we will finally crack this fundamental question of how life takes shape.

This is Krešo Josić at the University of Houston, where we are interested in the way inventive minds work.

This article by the DeepMind collaboration describes the original research: https://deepmind.com/blog/alphafold/.

The protein folding problem has different facets that I did not get a chance to explore here. Here is a good overview of the problem: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2443096/.

Here is a report on the competition won by DeepMind: https://venturebeat.com/2018/12/03/deepminds-alphafold-wins-casp13-protein-folding-competition/.

Protein folding and protein design have also been crowdsourced. You can find out more about this at: https://fold.it/portal/.

Thanks to Prof. Joff Silberg in the Department of BioSciences at Rice University for a number of helpful suggestions.