Over 160,000 new virus species discovered by AI

10 October 2024
Largest discovery of new virus species sheds light on hidden virosphere
Artificial intelligence (AI) has been used to reveal details of a diverse and fundamental branch of life living right under our feet and in every corner of the globe.

161,979 new species of RNA virus have been discovered using a machine learning tool that researchers believe will vastly improve the mapping of life on Earth and could aid in the identification of many millions more viruses yet to be characterised.

Published in Cell and conducted by an international team of researchers, the study is the largest virus species discovery paper ever published. 

“We have been offered a window into an otherwise hidden part of life on Earth, revealing remarkable biodiversity,” said senior author Professor Edward Holmes from the School of Medical Sciences in the Faculty of Medicine and Health at the University of Sydney.

“This is the largest number of new virus species discovered in a single study, massively expanding our knowledge of the viruses that live among us,” Professor Holmes said. “To find this many new viruses in one fell swoop is mind-blowing, and it just scratches the surface, opening up a world of discovery. There are millions more to be discovered, and we can apply this same approach to identifying bacteria and parasites.”

Although RNA viruses are commonly associated with human disease, they are also found in extreme environments around the world and may even play key roles in global ecosystems. In this study they were found living in the atmosphere, hot springs and hydrothermal vents. 

“That extreme environments carry so many types of viruses is just another example of their phenomenal diversity and tenacity to live in the harshest settings, potentially giving us clues on how viruses and other elemental life-forms came to be,” Professor Holmes said. 

How the AI tool worked

The researchers built a deep learning algorithm, LucaProt, to process vast troves of genetic sequence data, including lengthy virus genomes of up to 47,250 nucleotides and genomically complex information, and used it to discover more than 160,000 viruses.

“The vast majority of these viruses had been sequenced already and were on public databases, but they were so divergent that no one knew what they were,” Professor Holmes said. “They comprised what is often referred to as sequence ‘dark matter’. Our AI method was able to organise and categorise all this disparate information, shedding light on the meaning of this dark matter for the first time.”

The AI tool was trained to analyse this dark matter and identify viruses based on the sequences and secondary structures of the protein that all RNA viruses use for replication.

It significantly fast-tracked virus discovery, which would be time-intensive using traditional methods.

Co-author Professor Mang Shi from Sun Yat-sen University, the study’s institutional lead, said: “We used to rely on tedious bioinformatics pipelines for virus discovery, which limited the diversity we could explore. Now, we have a much more effective AI-based model that offers exceptional sensitivity and specificity, and at the same time allows us to delve much deeper into viral diversity. We plan to apply this model across various applications.”

Co-author Dr Zhao-Rong Li, who researches in the Apsara Lab of Alibaba Cloud Intelligence, said: “LucaProt represents a significant integration of cutting-edge AI technology and virology, demonstrating that AI can effectively accomplish tasks in biological exploration. This integration provides valuable insights and encouragement for further decoding of biological sequences and the deconstruction of biological systems from a new perspective. We will also continue our research in the field of AI for virology.”

Professor Holmes said: “The obvious next step is to train our method to find even more of this amazing diversity, and who knows what extra surprises are in store.”

DECLARATION

The researchers declare no competing interests. The research was supported by the National Natural Science Foundation of China, the Shenzhen Science and Technology Program, the Natural Science Foundation of Guangdong Province, the Guangdong Province “Pearl River Talent Plan” Innovation and Entrepreneurship Team Project, the Hong Kong Innovation and Technology Fund (ITF) and the Health and Medical Research Fund. Professor Holmes is funded by a National Health and Medical Research Council of Australia Investigator grant and by AIR@InnoHK administered by the Innovation and Technology Commission, Hong Kong Special Administrative Region, China.

Luisa Low

Senior Media and PR Adviser
