
1000 Plant Genomes Project

Former international research effort

Field	Value
name	1000 Plant Genomes Project
start	2008
end	2019

The 1000 Plant Genomes Project, also known as the 1000 Plant Transcriptomes Initiative (1KP), was an international research effort to establish the most detailed catalogue of genetic variation in plants. It was announced in 2008 and headed by Gane Ka-Shu Wong and Michael Deyholos of the University of Alberta. The project sequenced the transcriptomes (expressed genes) of 1,000 different plant species by 2014; its final capstone products were published in 2019.

1KP was one of several large-scale sequencing projects (each involving many organisms) designed to take advantage of the wider availability of high-throughput ("next-generation") DNA sequencing technologies. The similar 1000 Genomes Project, for example, obtained genome sequences of 1,000 individual people between 2008 and 2015 to better understand human genetic variation. The initiative provided a template for further planetary-scale genome projects, including the 10KP Project, which aims to sequence the whole genomes of 10,000 plants, and the Earth BioGenome Project, which aims to sequence, catalogue, and characterize the genomes of all of Earth's eukaryotic biodiversity.

Goals

The number of classified green plant species has been estimated at around 370,000, and there are probably many thousands more still unclassified. Despite this number, very few of these species have detailed DNA sequence information: 125,426 species are represented in GenBank, but most (95%) have DNA sequence for only one or two genes. It has been observed that "...almost none of the roughly half million plant species known to humanity has been touched by genomics at any level". The 1000 Plant Genomes Project aimed to produce a roughly 100-fold increase in the number of plant species with broad genome sequence available.

Evolutionary relationships

There have been efforts to determine the evolutionary relationships between the known plant species, but phylogenies (phylogenetic trees) built solely from morphological data, cellular structures, single enzymes, or only a few sequences (such as rRNA) can be prone to error. Morphological features are especially unreliable in two situations: when two species look physically similar although they are not closely related (as a result of convergent evolution, for example), and when two closely related species look very different because, for example, they change readily in response to their environment. Both situations are very common in the plant kingdom. An alternative is to infer evolutionary relationships from changes in the DNA sequence of many genes between species, which is often more robust to the problem of similar-appearing species. With the amount of sequence produced by this project, many predicted evolutionary relationships could be tested more rigorously by sequence alignment. The final analysis used 383,679 nuclear gene family phylogenies and 2,306 gene age distributions with Ks plots, which were shared in GigaDB alongside the capstone paper.
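The idea of comparing species by differences in their aligned DNA sequences can be illustrated with a minimal sketch. It computes the proportion of differing sites (the "p-distance") between every pair of sequences in a toy alignment. The species names and sequences are hypothetical, and this is not the method used by the project, which relied on far more elaborate phylogenomic analyses.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return mismatches / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances, keyed by alphabetically ordered name pairs."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment of one gene from three species (illustration only).
toy_alignment = {
    "species_A": "ATGGCTAAGGTC",
    "species_B": "ATGGCTAAGGTT",
    "species_C": "ATGACCAAAGTT",
}
toy_distances = distance_matrix(toy_alignment)
for pair, dist in toy_distances.items():
    print(pair, round(dist, 3))
```

In this toy data, species_A and species_B differ at the fewest sites, so a distance-based tree would group them first. Repeating the comparison over many genes is what makes sequence-based phylogenies less sensitive to a single misleading character.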

Biotechnology applications

The list of plants sequenced in the project was not random. Plants that produce valuable chemicals or other products (secondary metabolites in many cases) were prioritized, in the hope that characterizing the genes involved would allow the underlying biosynthetic processes to be used or modified. For example, if a plant's mechanisms could be used, or modified, to produce large quantities of an industrially useful oil, they would be of great value; knowing the sequence of the genes in the metabolic pathway that produces the oil is a large first step toward such use. An example of engineering a natural biochemical pathway is Golden rice, whose biosynthetic pathway was genetically modified so that a precursor to vitamin A is produced in large quantities, making the golden-colored rice a potential solution for vitamin A deficiency. This concept of engineering plants to do "work" is popular, and its potential would increase dramatically with gene information on these 1,000 plant species. Biosynthetic pathways could also be used to mass-produce medicinal compounds in plants rather than through the manual organic chemical reactions by which most are currently made.

One of the most unexpected results of the project was the discovery of multiple novel light-sensitive ion channels, now used extensively for optogenetic control of neurons. They were found through sequencing and physiological characterization of opsins from over 100 species of algae. The characterization of these novel channelrhodopsin sequences provided resources for protein engineers who would normally have no interest in, or ability to generate, sequence data from so many plant species. A number of biotech companies are developing these channelrhodopsin proteins for medical purposes, and many of these optogenetic therapy candidates are in clinical trials to restore vision in retinal blindness. The first published results of such a treatment, for retinitis pigmentosa, appeared in July 2021.

Project approach

Sequencing was initially done on the Illumina Genome Analyzer GAII next-generation DNA sequencing platform at the Beijing Genomics Institute (BGI Shenzhen, China), but later samples were run on the faster Illumina HiSeq 2000 platform. The institute started with 28 Illumina Genome Analyzer machines, which were eventually upgraded to 100 HiSeq 2000 sequencers. The initial capacity of 3 Gb per run (3 billion base pairs per experiment) for each of these machines enabled fast and accurate sequencing of the plant samples.

Species selection

The list of plant species to be sequenced was compiled through an international collaboration among the funding agencies and research groups, which expressed their interest in particular plants. Methodological and data access details have been published.

Transcriptome vs. genome sequencing

Rather than sequencing the entire genome (all DNA sequence) of each plant species, the project sequenced only those regions of the genome that produce a protein product (coding genes): the transcriptome. Although this approach is conceptually similar to expressed sequence tags (ESTs), it is fundamentally different in that the entire sequence of each gene is acquired with high coverage, rather than just a small portion of the gene sequence as with an EST. To distinguish the two, the non-EST method is known as "shotgun transcriptome sequencing".

Transcriptome shotgun sequencing

mRNA (messenger RNA) is collected from a sample, converted to cDNA by a reverse transcriptase enzyme, and then fragmented so that it can be sequenced. The resulting reads were assembled de novo with SOAPdenovo-Trans, part of the SOAP suite of genome assembly tools from the BGI.
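The fragment-then-reassemble idea behind shotgun sequencing can be sketched with a toy example. The code below cuts a made-up cDNA sequence into overlapping, error-free reads and rebuilds it by walking a small de Bruijn graph of k-mers. It is an illustration under strong assumptions (no sequencing errors, no repeats, a single transcript); the function names and the sequence are hypothetical, and it is not the SOAPdenovo-Trans algorithm.

```python
from collections import defaultdict


def shred(transcript: str, read_len: int, step: int) -> list[str]:
    """Cut a sequence into overlapping, error-free reads."""
    reads = [
        transcript[i:i + read_len]
        for i in range(0, len(transcript) - read_len + 1, step)
    ]
    tail = transcript[-read_len:]
    if tail not in reads:
        reads.append(tail)  # make sure the 3' end is covered
    return reads


def build_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """De Bruijn graph: each k-mer links its (k-1)-mer prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble(reads: list[str], k: int) -> str:
    """Walk the graph from its single start node while the path is unambiguous."""
    graph = build_graph(reads, k)
    has_incoming = {succ for succs in graph.values() for succ in succs}
    starts = [node for node in graph if node not in has_incoming]
    if len(starts) != 1:
        raise ValueError("toy assembler expects exactly one start node")
    node = starts[0]
    contig = node
    visited = {node}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:
            break  # stop rather than loop around a cycle
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical 24-nucleotide cDNA used only for illustration.
toy_cdna = "ATGGCGTACCTGAAGTTCGATCAA"
toy_reads = shred(toy_cdna, read_len=10, step=4)
toy_contig = assemble(toy_reads, k=5)
print(toy_contig == toy_cdna)
```

The walk stops as soon as a node has more than one possible successor, which is the simplest way to avoid guessing. Real assemblers must additionally handle sequencing errors, uneven coverage, and alternative transcripts.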

Plant tissue sampling

The samples came from around the world, with a number of particularly rare species supplied by botanical gardens such as the Fairy Lake Botanical Garden (Shenzhen, China). The type of tissue collected was determined by the expected location of biosynthetic activity; for example, if an interesting process or chemical was known to exist primarily in the leaves, a leaf sample was used. A number of RNA-sequencing protocols were adapted and tested for different tissue types, and these were openly shared via the protocols.io platform.

Potential limitations

Since only the transcriptome was sequenced, the project did not reveal information about gene regulatory sequences, non-coding RNAs, repetitive DNA elements, or other genomic features that are not part of the coding sequence. Based on the few whole plant genomes collected so far, these non-coding regions make up the majority of the genome, and non-coding DNA may actually be the primary driver of the trait differences seen between species.

Since mRNA was the starting material, the amount of sequence representation for a given gene is based on the expression level (how many mRNA molecules it produces). This means that highly expressed genes get better coverage because there is more sequence to work from. The result, then, is that some important genes may not have been reliably detected by the project if they are expressed at a low level yet still have important biochemical functions.
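The dependence of coverage on expression level can be made concrete with a short calculation. The sketch below assumes that reads are sampled in proportion to the amount of mRNA present (copies multiplied by transcript length), which is a simplification. The gene names, lengths, copy numbers, and read counts are hypothetical.

```python
def expected_coverage(
    total_reads: int,
    read_length: int,
    transcripts: dict[str, tuple[int, float]],
) -> dict[str, float]:
    """Expected per-base coverage of each transcript.

    transcripts maps a name to (length in nucleotides, relative mRNA copies).
    Assumes reads are drawn in proportion to copies * length.
    """
    total_mass = sum(length * copies for length, copies in transcripts.values())
    coverage = {}
    for name, (length, copies) in transcripts.items():
        reads_for_gene = total_reads * (length * copies) / total_mass
        # Coverage = sequenced bases assigned to the gene / gene length.
        coverage[name] = reads_for_gene * read_length / length
    return coverage


# Hypothetical sample: three genes of equal length, very different expression.
toy_transcripts = {
    "high_expression_gene": (1500, 1000),
    "medium_expression_gene": (1500, 50),
    "low_expression_gene": (1500, 1),
}
toy_coverage = expected_coverage(
    total_reads=1_000_000, read_length=100, transcripts=toy_transcripts
)
for name, cov in toy_coverage.items():
    print(f"{name}: about {cov:.0f}x")
```

Under this model, coverage scales directly with the number of mRNA copies, so a gene expressed 1,000 times less than another receives 1,000 times less coverage from the same sequencing run.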

Many plant species (especially agriculturally manipulated ones) are known to have undergone large genome-wide changes through duplication of the whole genome. The rice and wheat genomes, for example, have undergone such duplications, and wheat can carry 4-6 copies of its whole genome, whereas animals typically have only 2 (diploidy). These duplicated genes may pose a problem for the de novo assembly of sequence fragments, because repeated sequences confuse assembly programs when they try to put the fragments together, and the duplicates can be difficult to track through evolution.
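One way to see why duplicated genes complicate de novo assembly is to look at the k-mer graph that many assemblers build. In the sketch below, two nearly identical gene copies share most of their k-mers, so the graph contains a node with two possible continuations and an assembler can no longer follow a single path. The sequences and the function name are hypothetical, and the model ignores sequencing errors.

```python
from collections import defaultdict


def branching_nodes(sequences: list[str], k: int) -> dict[str, list[str]]:
    """Return (k-1)-mer nodes that have more than one distinct successor."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {
        node: sorted(succs) for node, succs in graph.items() if len(succs) > 1
    }


# Hypothetical duplicated gene: copy_2 differs from copy_1 by one substitution.
copy_1 = "ATGGCGTACCTGAAGTTCGATCAA"
copy_2 = "ATGGCGTACCTGCAGTTCGATCAA"

single_copy_branches = branching_nodes([copy_1], k=5)
duplicated_branches = branching_nodes([copy_1, copy_2], k=5)
print(single_copy_branches)
print(duplicated_branches)
```

With a single copy the graph is a simple chain, while adding the second copy introduces a fork just before the differing base. An assembler must then decide whether the fork reflects a duplicated gene, an allele, or an error.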

Comparison with the 1000 Genomes Project

Similarities

Just as the Beijing Genomics Institute in Shenzhen, China was one of the major genomics centers involved in the 1000 Genomes Project, the institute was the site of sequencing for the 1000 Plant Genomes Project. Both projects were large-scale efforts to obtain detailed DNA sequence information to improve our understanding of the organisms, and both used next-generation sequencing to facilitate timely completion.

Differences

The goals of the two projects were significantly different. While the 1000 Genomes Project focused on genetic variation in a single species, the 1000 Plant Genomes Project looked at the evolutionary relationships and genes of 1,000 different plant species.

While the 1000 Genomes Project was estimated to cost up to US$50 million, the 1000 Plant Genomes Project was less expensive, the difference in cost arising from the target sequence in the genomes. Because the 1000 Plant Genomes Project sequenced only the transcriptome, whereas the human project sequenced as much of the genome as was deemed feasible, this more specific approach required much less sequencing effort. Although this meant less overall sequence output relative to the 1000 Genomes Project, the non-coding portions of the genomes excluded from the 1000 Plant Genomes Project were not as important to its goals as they are to those of the human project. The more focused approach therefore minimized cost while still achieving the project's goals.

Funding

The project was funded by Alberta Innovates - Technology Futures (formed through a merger that included iCORE; archived page: https://web.archive.org/web/20111108005614/http://albertatechfutures.ca/CapacityBuildingPrograms.aspx), Genome Alberta, the University of Alberta, the Beijing Genomics Institute (BGI), and Musea Ventures (a USA-based private investment firm). The project received $1.5 million CAD from the Alberta Government and another $0.5 million from Musea Ventures. In January 2010, BGI announced that it would contribute $100 million to large-scale sequencing projects of plants and animals (including the 1000 Plant Genomes Project, followed by the 10,000 Plant Genome Project).


Wikipedia Source

This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.
