Phylogenetic trees are like the family trees of all life on Earth. They attempt to tie living things together, showing how closely related species are by tracing how they branched from a common ancestor, evolved in different directions, and became distinct.
But close common ancestry isn’t the only way that species can come to share similar genes.
Diverse communities of living things thrive in extreme environments that don’t seem particularly hospitable. Places like volcanoes, deep-sea trenches, and polar regions challenge life with extreme temperature, radiation, pressure, salt concentration, acidity, and more.
The organisms that thrive here are called extremophiles, and these environments leave an imprint on their genomes as they adapt to common challenges.
The study that uncovered this unexpected link was led by Kathleen Hill, associate professor of biology at Western University, and Lila Kari, professor of computer science at the University of Waterloo. Their paper was published in Scientific Reports.
Bacteria, archaea, and eukaryotes are as distantly related as they come. They represent the very first branches at the base of the Tree of Life, and likely branched off billions of years ago. To give a sense of how broad domains are, all animals, plants, algae, and fungi fall under eukaryotes.
Bacteria and archaea are both types of microbial life, but being classified under separate domains means they are taxonomically further removed from each other than a giraffe and a mushroom.
The team sifted through the Genome Taxonomy Database to identify nearly 700 microbes that live in extreme temperature or acidity and have high-quality genomic data available.
They stratified the temperature dataset into four levels (two categories each for the very hot and the very cold ranges) and the pH dataset into two (highly acidic or alkaline).
Next, they used machine learning to look for patterns in genomic signatures. They trained one algorithm on domain labels (bacteria vs. archaea) and, separately, another on environment labels. Training ran over nine rounds, with a tenth packet of data held back to test whether the trained algorithm could correctly sort unlabeled genomes.
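For readers curious what this kind of held-out testing looks like in practice, here is a minimal sketch in Python. It is not the study's pipeline. It assumes a genomic signature can be approximated by short-word (k-mer) frequencies, uses a simple nearest-centroid classifier, and rotates the held-out tenth across ten folds. The sequences are synthetic, and the labels `env_A` and `env_B`, the GC contents, and every function name are hypothetical choices made for illustration.

```python
import random
from collections import Counter
from itertools import product

K = 3  # k-mer length; an assumption for illustration, not a value from the study
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def signature(seq, k=K):
    """Normalized k-mer frequencies, a simple stand-in for a genomic signature."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in KMERS) or 1
    return [counts[m] / total for m in KMERS]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cross_validate(samples, n_folds=10, seed=0):
    """samples: list of (vector, label). Returns accuracy on held-out folds."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    correct = 0
    for i, held_out in enumerate(folds):
        # Train on the other folds: one centroid per label.
        by_label = {}
        for j, fold in enumerate(folds):
            if j != i:
                for vec, label in fold:
                    by_label.setdefault(label, []).append(vec)
        centroids = {label: centroid(vs) for label, vs in by_label.items()}
        # Test on the held-out fold, whose labels the classifier never saw.
        for vec, label in held_out:
            guess = min(centroids, key=lambda c: sq_dist(vec, centroids[c]))
            correct += guess == label
    return correct / len(samples)

def toy_genome(gc, length, rng):
    """Random synthetic sequence with roughly the given GC content."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))

rng = random.Random(1)
# Hypothetical labels and GC contents; real genomes are far less tidy.
samples = [(signature(toy_genome(gc, 2000, rng)), label)
           for label, gc in (("env_A", 0.65), ("env_B", 0.35))
           for _ in range(20)]
accuracy = cross_validate(samples)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the two synthetic groups are built to differ sharply in composition, the toy classifier separates them easily. Real extremophile genomes are far noisier, which fits with the more modest accuracy reported for the environment labels.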
The domain classifications were highly accurate, and the environment classifications still reached medium to medium-high accuracy. That means that however distantly related the species were, the algorithm had a good chance of placing each one in the right environment category based on its genes alone.
The study also tried a different approach, feeding in data with no labels at all and asking an algorithm simply to cluster genomes together based on similarity. Starting from a blank slate and free from the influence of any human-applied labels, the resulting algorithm helped uncover pairs of ‘exemplars’ for each environment whose members came from different domains.
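The label-free idea can be sketched the same way. The Python example below uses k-medoids, a clustering method in which each group is represented by one of its own members (its exemplar). This is an assumed stand-in, not the algorithm the researchers used. The genomes are synthetic, the domain and environment tags are hypothetical, and the data is deliberately constructed so that composition follows the environment tag. The sketch therefore shows the mechanics of clustering without labels, not evidence for the finding.

```python
import random
from collections import Counter
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=2)]  # all 16 dinucleotides

def signature(seq):
    """Normalized dinucleotide frequencies (assumed stand-in for a signature)."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[m] for m in KMERS) or 1
    return [counts[m] / total for m in KMERS]

def dist(a, b):
    """Manhattan distance between two signature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, medoids):
    """Map each exemplar index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        clusters[min(medoids, key=lambda m: dist(p, points[m]))].append(i)
    return clusters

def k_medoids(points, k, max_iter=50):
    """Cluster without labels; returns {exemplar_index: [member_indices]}."""
    # Farthest-first start: deterministic, and spreads the first exemplars out.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(len(points)),
            key=lambda i: min(dist(points[i], points[m]) for m in medoids)))
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        # New exemplar: the member with the smallest total distance to its cluster.
        updated = [min(members or [m],
                       key=lambda c: sum(dist(points[c], points[j]) for j in members))
                   for m, members in clusters.items()]
        if set(updated) == set(medoids):
            break
        medoids = updated
    return assign(points, medoids)

def toy_genome(gc, length, rng):
    """Random synthetic sequence with roughly the given GC content."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))

rng = random.Random(7)
tags, points = [], []
# Hypothetical tags; composition depends only on the environment tag by design.
for env, gc in (("env_A", 0.7), ("env_B", 0.3)):
    for domain in ("bacteria", "archaea"):
        for _ in range(5):
            tags.append((domain, env))
            points.append(signature(toy_genome(gc, 3000, rng)))

clusters = k_medoids(points, k=2)  # the tags are never shown to the algorithm
for exemplar, members in clusters.items():
    envs = sorted({tags[i][1] for i in members})
    domains = sorted({tags[i][0] for i in members})
    print("exemplar", tags[exemplar], "->", len(members), "members", envs, domains)
```

The tags are consulted only after clustering, to see what each group turned out to contain. In this toy setup each cluster holds a single environment but both domains.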
“This discovery flies in the face of conventional thinking that pervasive, genome-wide, genomic signatures carry only information about naming, describing and classification of organisms,” said Hill in a press release.
“DNA should be mostly about inheritance, biological relatedness, and common ancestry, not about the place you live in, but we see something completely different with extremophiles,” added Kari.
This analysis helps us understand life in high-stress conditions and the adaptations organisms gain for resilience and survival. From applications like improving bioremediation of contaminated sites to looking for life in space, these insights illuminate the unique characteristics of life at the extremes.