The Canadian BioGenome Project is an ambitious undertaking that aims to map the complete genomes of Canada’s plants and animals
By Sean Tarry
When it comes to scientific pursuits, few match the ambition, breadth or significance of the Canadian BioGenome Project. It was established as part of the larger Earth BioGenome Project, an undertaking that aims to map the genomes of all plants, animals, fungi and other microbial life. The Canadian effort is, in effect, an attempt to catalogue the complete genetic diversity of Canada’s plants and animals through genomic sequencing.
The sheer magnitude of the project is not lost on Steven Jones, Co-Director of Vancouver’s Genome Sciences Centre and Co-Lead Investigator for the Canadian BioGenome Project. However, he says the endeavour’s importance to furthering our collective understanding of life on Earth is equal to the amount of work required to complete it.
“What we’ve just started in our attempt to sequence the genomes of Canada’s plants and animals presents an enormous challenge, which will require a great deal of work,” he says. “It will also require a tremendous amount of collaboration between individuals within the science community in order for us to achieve our objective. It’s a massive project that poses huge potential benefits in the way of enhancing our understanding of the evolution of life and uncovering fundamental genetic principles of health and disease that will help populations everywhere.”
The project began as a consortium of sequencing laboratories across the country, including the Genome Sciences Centre in Vancouver, Toronto’s Hospital for Sick Children and McGill University in Montréal. It has since grown to rely on a host of experts whose insights guide and direct the research. Just over 80,000 species of plants and animals are estimated to inhabit Canada from coast to coast to coast. Jones and the project team are starting by sequencing between 400 and 800 of them. Experts, Jones says, are helping to narrow the project’s focus.
“We’ve got a group of a few hundred or so scientists located right across the country who are all experts concerning very specific types of animals and plants, like conifers or crustaceans,” he says.
“They’re helping us to identify the species that are high priority for sequencing. We can’t possibly sequence every species of plant and animal across the entire country right away. So, we’ve got to choose wisely to start. It really comes down to understanding and being aware of the parts of the tree of life that are underexplored, identifying the plants and animals for which no genetic understanding exists and prioritizing them as such.”
Jones goes on to explain that filling out the tree of life is not the only factor considered when prioritizing species for sequencing. Others include scientifically interesting plants or animals that may interact with humans and possess unique properties, as well as species that have become endangered as a result of climate change.
In addition, he suggests that it may be just as important to sequence and study the prey of endangered species to further understand why the numbers of some predators are diminishing. In fact, a whole web of criteria informs the decisions made by scientists feeding into the Canadian BioGenome Project. Supporting much of the human effort, Jones explains, is cutting-edge technology that is elevating and improving the work being conducted.
“The major type of technology that we use is obviously related to DNA sequencing,” he says. “It’s recently become a much more critical and useful tool. Its potency and power have increased, and the prices have come down significantly since the first human genome was sequenced at the turn of the century. It is rapidly improved technology that is now much more accessible for this type of work, allowing us to do more, quicker, faster, cheaper and more efficiently.”
The power of artificial intelligence
It is estimated that sequencing the first human genome took roughly 13 years and cost somewhere in the range of US$2.7 billion. Today, Jones and his team of scientists can routinely sequence a genome in a matter of days for a little more than CDN$1,000. Beyond the vastly improved sequencing technology, advances in artificial intelligence are also driving efficiencies and breakthroughs in genomics research and the sequencing of species.
In particular, Jones points to the recent introduction of AlphaFold, an artificial intelligence program developed by Google-owned DeepMind that uses deep learning to predict the structure of proteins.
“One of the greatest advancements over the course of the past 24 months or so is the AI that’s coming out of Google’s AlphaFold project,” he asserts.
“If we can sequence the genes, Google’s AI will then predict the three-dimensional structure of the protein that the gene creates. It represents a huge step forward, allowing us to, for instance, sequence the genome of a bee, and with a couple weeks of computation will be able to produce the three-dimensional structure of every protein in that bee. It helps us better understand things like how the bee is interfacing with its environment and perhaps design a new insecticide that doesn’t impact the bee population. It’s quite incredible that we are now, through sequencing, able to generate a parts list of each plant and animal, and through Google’s technology can understand the precise shape of each part. It’s going to continue to be a key technology that we use going forward.”
Open data ecosystem
Google’s technology will also be key in helping Jones and his team of scientists fulfill one of the mandates of the Earth BioGenome Project, which is to freely share with the public all of the data and information generated through sequencing. To do this, data is submitted to collaborators in Europe, who annotate it in a way that’s consistent with all of the other genomes being created around the world as part of the project. The approach allows everyone involved to operate within an open data ecosystem, making the data fully accessible and usable by anyone on the planet and opening the door to a multitude of potential future benefits. It is one of the core concepts of the project, says Jones, and part of what makes it so exciting to work on.
“It really is quite an incredible project to be a part of. And only a few months in, our work is only now beginning in earnest. The initial phase of the project is scheduled to be a duration of four years. We’re hoping that in that time we’re able to, through our work, show significant progress in our sequencing efforts. In addition, over this time, we’re anticipating further technological advances that will make the use of these tools even more accessible and financially viable, allowing us to continue deepening our understanding of the vast Canadian ecosystem.”