The Department of Energy's Oak Ridge National Laboratory (ORNL) has reached a notable milestone: its supercomputer Frontier has been used to train an artificial intelligence model to design new proteins. Such proteins could have wide-ranging applications in areas such as medicine and environmental cleanup. The research draws on the computational power of the world's fastest supercomputer to speed up the design of proteins, the molecular building blocks of life.
The goal is ambitious: to design new proteins rapidly and reliably, opening the way to future medical and environmental solutions. "Think about a ChatGPT that designs proteins," Arvind Ramanathan, a computational biologist at Argonne National Laboratory and the study's senior author, told ORNL. The work by Ramanathan and his cross-institutional team has earned a finalist nomination for the prestigious Gordon Bell Prize, whose winner will be announced at this year's International Conference for High Performance Computing.
The AI in question, a large language model (LLM), was trained on extensive data describing protein sequences, their structures, and their functions. By running the work not only on ORNL's Frontier but also on other supercomputers, including Argonne's Aurora and NVIDIA's PDX, the researchers broke the exascale barrier, reaching a peak speed of about 5.57 exaflops (roughly 5.57 × 10^18 floating-point operations per second). The high-speed workflow built for this purpose was key to teaching the LLM to better predict a protein's biochemical function from its structure.
To test the model, the researchers had it design a protein sequence with a lower activation threshold than malate dehydrogenase, a well-known metabolic enzyme. Because generative models can make mistakes, the model was also trained to avoid "hallucinations," errors in which a model fills gaps in its knowledge with fabricated information. "We're using methods similar to the natural language processing that allows ChatGPT to form or finish sentences," Ramanathan told ORNL, "but this is for protein sequences."
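To make the "finish the sentence" analogy concrete, here is a minimal sketch of autoregressive generation over the 20-letter amino-acid alphabet. It is not the team's model or code: `toy_next_token_probs` is a hypothetical placeholder for a trained network, and its bias toward repeating the last residue is arbitrary, chosen only to keep the example self-contained. The point is the loop: the model scores every possible next residue given the sequence so far, one residue is sampled, and the process repeats, just as a chatbot extends a sentence word by word.

```python
import random

# The 20 standard amino acids (one-letter codes): the "vocabulary" of a protein language model.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_next_token_probs(prefix: str) -> dict[str, float]:
    """Hypothetical stand-in for a trained model: returns P(next residue | prefix).

    A real protein LLM would compute these probabilities with a neural network.
    This toy version just gives extra weight to repeating the previous residue.
    """
    weights = {aa: 1.0 for aa in AMINO_ACIDS}
    if prefix:
        weights[prefix[-1]] += 5.0  # arbitrary illustrative bias
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def generate(prompt: str, length: int, seed: int = 0) -> str:
    """Extend `prompt` one residue at a time until it reaches `length` residues."""
    rng = random.Random(seed)  # seeded so results are reproducible
    seq = prompt
    while len(seq) < length:
        probs = toy_next_token_probs(seq)
        residues = list(probs)
        # Sample the next residue in proportion to the model's probabilities.
        seq += rng.choices(residues, weights=[probs[r] for r in residues], k=1)[0]
    return seq


if __name__ == "__main__":
    print(generate("MK", 30, seed=42))
```

Swapping the toy function for a real model's output distribution leaves the sampling loop unchanged, which is why the analogy to text generation holds.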
The implications of this research are significant. Successful lab tests showed that the AI model can identify and reproduce the features that make protein sequences efficient. That opens the possibility of designing not just any proteins but ones that could help in developing new vaccines, antibodies, and treatments for diseases such as cancer. The work sits at the intersection of computing power and biological science, and it suggests that AI and supercomputing may increasingly help drive medical breakthroughs.
The work is supported by the National Institutes of Health and the DOE Office of Science's Advanced Scientific Computing Research program. Next, the team, which includes researchers from NVIDIA, UC Berkeley, and the California Institute of Technology, plans to scale up the model to design more complex proteins. The project shows how high-performance computing and careful biological research can be combined, setting a new benchmark for interdisciplinary collaboration in science and technology.