Lifting Our Genomic Knowledge Into the Cloud

It takes a solid team to understand the human genome: not just geneticists and biologists, but also the pattern-spotting power of machines.


Understanding the human genome in health and disease takes a multidisciplinary team. There are three billion pairs of chemical letters that spell out each person’s DNA, and decoding those messages requires geneticists to sequence each genome about 50 times, says geneticist Stephen Scherer. Processing those data takes a lot of computing power.

Scherer is a professor of molecular genetics at the University of Toronto and Director of The Centre for Applied Genomics at SickKids. His research group studies the genetic basis of autism.

“When we did the first 5,200 genomes in our autism project here, on this floor, at the time in about 2015 it was the largest data transfer that Google had received in its history,” says Scherer. “We’ve now doubled that and we’re doing roughly 10,000 genomes a year here.”

Of those tens of thousands of data sets, about 11,500 are whole genome sequences of people with autism and their families, says Brett Trost, a computational biologist in Scherer’s lab. Considerable computing power and storage are needed to process them.

Sequencing and imaging can help sort out the order and identity of nucleotides — the chemical letters that spell each genome — but only for short stretches of DNA. Computation is required to string together the entire sequence of three billion pairs. That kind of work can be done at the high performance computer facility at SickKids, but comparing thousands of genomes requires the power of the cloud.

“Computer science is becoming more and more important for being able to make meaningful inferences from biological data,” says Trost.

“The human genome is so complex that you can’t really just look at simple patterns in the data and make sense of it. But with machine learning methods, we can actually give the computer lots and lots of information, that one human can’t necessarily make sense of himself or herself, but the computer can learn the patterns and make sense of it.

“We can use these methods to try to predict, for example, whether genes might be associated with disorders like autism.”

That kind of insight requires a wide variety of expertise, which is why computational biologists like Trost are key members of Scherer’s team.

“Our team here is founded around human geneticists, but in fact 30 percent of our staff are computational biologists,” says Scherer. “Multidisciplinary teams in this realm of big science are critical.”


Stephen Scherer holds the GlaxoSmithKline-Canadian Institutes of Health Research Endowed Chair in Genome Sciences at The Hospital for Sick Children (SickKids) and the University of Toronto (U of T). He is Director of the U of T McLaughlin Centre, as well as The Centre for Applied Genomics at SickKids.

His team contributed to the landmark discovery of global gene copy number variation (CNV) as a common form of genetic variation in human DNA. His group then showed that CNV contributes to the aetiology of autism and many other disorders. The Database of Genomic Variants, which he founded, facilitates hundreds of thousands of clinical diagnoses each year.

His research is documented in over 500 peer-reviewed publications, and he is one of the most highly cited scientists in the world. Dr. Scherer has won numerous honours, including the Steacie Prize, a Howard Hughes Medical Institute Scholarship, the Premier’s Summit Award for Medical Research, the Killam Prize, and three honorary degrees. He is a distinguished Fellow of the Canadian Institute for Advanced Research, the American Association for the Advancement of Science, and the Royal Society of Canada.

Research2Reality is a groundbreaking initiative that shines a spotlight on world-class scientists engaged in innovative and leading edge research in Canada. Our video series is continually updated to celebrate the success of researchers who are establishing the new frontiers of science and to share the impact of their discoveries with the public.