In the early 20th century, “Harvard computer” Henrietta Swan Leavitt made a discovery that revolutionized astronomy. By comparing the brightness of pulsating stars known as Cepheid variables to their pulsation periods, Leavitt found a way to determine the distances to stars and galaxies much farther away than any that had been measured before. Her famous Leavitt Law is still used by astronomers today.
How did she do it? By analyzing thousands of images of the night sky, all by eye.
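The reasoning behind the Leavitt Law can be sketched in a few lines of Python. The sketch assumes a simple linear period-luminosity relation and the standard distance modulus, and it ignores dust extinction. The coefficients and function names are illustrative assumptions, not values taken from Leavitt's work or from this article.

```python
import math

# Illustrative coefficients for a linear period-luminosity relation:
#   M = SLOPE * (log10(P) - 1) + ZERO_POINT
# These are assumptions for this sketch, not a calibration from the article.
SLOPE = -2.43
ZERO_POINT = -4.05


def absolute_magnitude(period_days):
    """Estimate a Cepheid's intrinsic brightness from its pulsation period."""
    return SLOPE * (math.log10(period_days) - 1.0) + ZERO_POINT


def distance_parsecs(apparent_magnitude, period_days):
    """Invert the distance modulus m - M = 5 * log10(d) - 5 to get d."""
    modulus = apparent_magnitude - absolute_magnitude(period_days)
    return 10 ** (modulus / 5.0 + 1.0)


# Hypothetical star: pulsates every 10 days and appears at magnitude 10.95.
d = distance_parsecs(apparent_magnitude=10.95, period_days=10.0)
print(f"{d:.0f} parsecs")
```

The key idea is that the period fixes how bright the star really is, so comparing that with how bright it appears gives the distance. With these made-up inputs the star comes out at roughly 10,000 parsecs.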
Today, sophisticated new telescopes have both the scope and the power to observe hundreds of times more astronomical sources than ever before. So many, in fact, that analyzing each one manually — as Leavitt once did — simply isn’t possible.
Instead, astronomers are increasingly turning to machine learning, using computers to rapidly evaluate the huge datasets that modern wide-field telescopes can now collect. This approach uses algorithms to train computers to “learn” about datasets without giving them explicit rules or instructions. Over time, as the computers sift through more images, they can start to pick out patterns that even a trained human eye could miss.
Photographing the night sky
The LSST, or Large Synoptic Survey Telescope, is one such wide-field telescope that will produce deeper, wider images of the Universe than astronomers have ever seen before. The project is a collaboration between scientists across the world, including assistant professor Renée Hložek from the Dunlap Institute for Astronomy & Astrophysics at the University of Toronto, who is one of the project’s Principal Investigators. Astronomers from York University, the University of Waterloo, Western University, and the University of British Columbia are also involved.
Using the world’s largest digital camera, the telescope will spend 10 years continually taking photos of the night sky in the search for astronomical transients. These are events that unfold on timescales far shorter than the millions to billions of years over which stars, galaxies, and the Universe evolve.
“We’re going to detect millions of transients,” says Hložek, “and that just blows my mind, because we’ve never had such a wide view of the transient sky.”
The Cepheid variable stars Leavitt once analyzed by eye are just one example of the many astronomical transients in the Universe.
But sorting through that much data is a daunting task. Inspired by a data-driven classification challenge she had taken part in as a graduate student, Hložek decided with her colleagues to turn their overwhelming amount of data into an open-source challenge to make sense of the Universe.
This is how the Photometric LSST Astronomical Time-Series Classification Challenge — or PLAsTiCC — was born. The challenge was hosted on Kaggle, an online community of data scientists, and its purpose was to classify simulated astronomical transients similar to those that will be detected by the LSST.
The goal is to prepare astronomers for the massive amounts of data they’ll be dealing with when the telescope begins operations in 2019. Each night, the LSST will produce 20 terabytes of data, the equivalent of several million MP3 songs. To be as efficient as possible, LSST astronomers want to have tools ready that will allow them to automatically sort through the data and pick out interesting new objects in the sky.
Leaving the sorting up to a computer means more time for astronomers to get their hands on the actual science.
Teaching a computer to classify the Universe
While machine learning has long been popular in the data science community, its application to astronomy is relatively new. That’s part of the reason why Hložek felt that Kaggle was the ideal host for the challenge.
The data scientists who participate in Kaggle challenges may not be trained in astronomy, Hložek explains, but they do have valuable expertise in machine learning algorithms. Hložek hopes that by opening the challenge up to non-astronomers, she will be exposed to innovative methods that she and her astronomer colleagues never would have considered otherwise.
To make the challenge as realistic as possible, Hložek and her colleagues called on members of the astronomical community to provide models of astronomical transients that they expect to find with the LSST. While the challenge itself is described in a paper that was released on the astronomy arXiv, the details of the models were not shared until the challenge was complete.
Since the LSST will also discover many objects that astronomers have never encountered before, the team made sure to hide some of these models from the challenge participants. The computer algorithms created for the challenge were only exposed to these hidden models at the very end, when the algorithms were being judged.
Part of the challenge was therefore figuring out ways to train computers to classify objects they’ve never seen before.
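One simple way to picture this problem is a classifier that is allowed to answer "unknown". The Python sketch below is a toy nearest-centroid classifier with a distance cutoff. It is not the method used by PLAsTiCC participants, and the class labels, features (rise and decline times in days), numbers, and function names are all made up for illustration.

```python
import math


def fit_centroids(samples):
    """Average the feature vectors of each labeled class."""
    sums, counts = {}, {}
    for label, features in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(features, centroids, max_distance):
    """Return the nearest known class, or "unknown" if none is close enough."""
    best_label, best_dist = None, math.inf
    for label, center in centroids.items():
        dist = math.dist(features, center)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else "unknown"


# Hypothetical training set: (label, [rise time in days, decline time in days])
training = [
    ("supernova", [15.0, 40.0]),
    ("supernova", [17.0, 44.0]),
    ("cepheid", [2.0, 5.0]),
    ("cepheid", [3.0, 7.0]),
]
centroids = fit_centroids(training)

print(classify([16.5, 41.0], centroids, max_distance=5.0))
print(classify([60.0, 200.0], centroids, max_distance=5.0))
```

The first object sits close to the known supernova examples and is labeled as one, while the second is far from every class the model was trained on and falls back to "unknown". Choosing the cutoff is the hard part: too tight and familiar objects get rejected, too loose and genuinely new ones are forced into an existing class.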
The importance of interdisciplinary research
To Hložek, interdisciplinary challenges like PLAsTiCC are exactly what the astronomy community needs. Keeping the field open allows for the inclusion of expertise across a wide range of disciplines, and ultimately leads to better science.
“More and more people are asking the same questions,” Hložek says. “And that connection is great: it makes you think harder about your science.”