Understanding the human genome in health and disease takes a multidisciplinary team. There are three billion pairs of chemical letters that spell out each person’s DNA, and decoding those messages requires geneticists to sequence each genome about 50 times, says geneticist Stephen Scherer. Processing those data takes a lot of computing power.
Scherer is a professor of molecular genetics at the University of Toronto and director of the Human Genome Centre at SickKids Hospital. His research group studies the genetic basis of autism.
“When we did the first 5,200 genomes in our autism project here, on this floor, at the time in about 2015 it was the largest data transfer that Google had received in its history,” says Scherer. “We’ve now doubled that and we’re doing roughly 10,000 genomes a year here.”
Of those tens of thousands of data sets, about 11,500 are whole genome sequences of people with autism and their families, says Brett Trost, a computational biologist in Scherer’s lab. Considerable computing power and storage are needed to process them.
Sequencing and imaging can help sort out the order and identity of nucleotides, the chemical letters that spell each genome, but only for short stretches of DNA. Computation is required to string those stretches together into the entire sequence of three billion pairs. That kind of work can be done at the high-performance computing facility at SickKids, but comparing thousands of genomes requires the power of the cloud.
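To give a rough sense of what stringing short stretches together means, here is a toy sketch in Python. It is an illustration only, not the method used at SickKids: it repeatedly merges the two fragments with the longest overlap. The function names, the minimum overlap of three letters and the example fragments are all made up for this sketch, and real pipelines deal with billions of error-prone reads using far more sophisticated tools.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy assembler (hypothetical helper): merge the best-overlapping pair
    until no pair overlaps by at least `min_len` letters."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three invented short "reads" that overlap end to end.
reads = ["ATGCGT", "CGTACC", "ACCTGA"]
assembled = greedy_assemble(reads)
print(assembled)
```

Each pass compares every fragment against every other one, so the work grows quickly with the number of fragments. Even in this toy version, that hints at why a genome read about 50 times over demands serious computing power.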
“Computer science is becoming more and more important for being able to make meaningful inferences from biological data,” says Trost.
“The human genome is so complex that you can’t really just look at simple patterns in the data and make sense of it. But with machine learning methods, we can actually give the computer lots and lots of information, that one human can’t necessarily make sense of himself or herself, but the computer can learn the patterns and make sense of it.
“We can use these methods to try to predict, for example, whether genes might be associated with disorders like autism.”
That kind of insight requires a wide variety of expertise, which is why computational biologists like Trost are key members of Scherer’s team.
“Our team here is founded around human geneticists, but in fact 30 percent of our staff are computational biologists,” says Scherer. “Multidisciplinary teams in this realm of big science are critical.”