Comparative genomics studies aim to predict the function of a species’ genes from information available on another species, by identifying related genes, or orthologs, between the two. CIRAD analysed orthologs between rice and Arabidopsis thaliana and used the findings to develop a database and tools for automated prediction of functional equivalence between sequences from "orphan" species (species for which few genomic tools have been developed) and those from model species. These tools are already widely used by the scientific community.
One aim of comparative genomics is to predict the function of genes in one species from information available on one or several other species. Once related genes between these species have been identified, the annotation, i.e., the biological information attached to a sequence, can be transferred from a well-characterized species to a less-characterized one. The related genes used for this prediction are orthologs: genes derived from a common ancestor that were separated by a speciation event and are therefore likely to have the same function. Phylogenetic methods, which reconstruct the evolutionary relationships between genes and species, are used to pinpoint these orthologs. A CIRAD research team first identified all orthologs between Oryza sativa (model monocot species) and Arabidopsis thaliana (model dicot species), which then enabled it to predict the function of sequences from "orphan" species, i.e., species for which few genomic tools have been developed but which are important for developing countries.
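The phylogenetic reasoning above can be illustrated with a minimal Python sketch of the species-overlap rule, one well-known way of labelling the internal nodes of a gene tree as speciation or duplication events. It is an illustrative stand-in, not necessarily the method used in CIRAD's pipeline. It assumes a rooted gene tree written as nested tuples, where each leaf is a hypothetical `(gene_id, species)` pair; all gene IDs and annotations are invented. A node whose child lineages share no species is treated as a speciation, so genes on opposite sides of it are orthologs; known annotations are then copied across ortholog pairs.

```python
from itertools import combinations, product

# Hypothetical tree format: a leaf is a (gene_id, species) pair of strings;
# an internal node is a tuple of child nodes (leaves or further tuples).

def is_leaf(node):
    return isinstance(node[0], str)

def leaves(node):
    """Return all (gene_id, species) leaves below a node."""
    if is_leaf(node):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def find_orthologs(node, pairs=None):
    """Collect ortholog pairs using the species-overlap rule."""
    if pairs is None:
        pairs = set()
    if is_leaf(node):
        return pairs
    clades = [leaves(child) for child in node]
    species = [{sp for _, sp in clade} for clade in clades]
    # No species shared between child lineages -> speciation node.
    is_speciation = all(not (a & b) for a, b in combinations(species, 2))
    if is_speciation:
        for left, right in combinations(clades, 2):
            for (g1, _), (g2, _) in product(left, right):
                pairs.add(tuple(sorted((g1, g2))))
    # Otherwise the node is a duplication: genes split by it are paralogs.
    for child in node:
        find_orthologs(child, pairs)
    return pairs

def transfer_annotations(pairs, annotations):
    """Copy known annotations onto unannotated orthologs."""
    predicted = {}
    for a, b in sorted(pairs):
        for src, dst in ((a, b), (b, a)):
            if src in annotations and dst not in annotations:
                predicted.setdefault(dst, set()).update(annotations[src])
    return predicted

# Hypothetical family: an ancient duplication, then the rice/Arabidopsis split.
tree = (
    (("OsA1", "rice"), ("AtA1", "arabidopsis")),
    (("OsA2", "rice"), ("AtA2", "arabidopsis")),
)
pairs = find_orthologs(tree)
print(sorted(pairs))  # [('AtA1', 'OsA1'), ('AtA2', 'OsA2')]
print(transfer_annotations(pairs, {"AtA1": {"MYB-type regulator"}}))
# {'OsA1': {'MYB-type regulator'}}
```

OsA1 and AtA2 are not paired: they diverged at the duplication node at the root, so they are paralogs rather than orthologs, a distinction that sequence similarity alone does not necessarily make.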
The first phase of this study involved selecting a pipeline, i.e., a series of linked software programmes that automatically reproduce a series of analyses that a biologist would otherwise conduct manually. Such pipelines save time and improve consistency, but their results still have to be verified. A streamlined phylogenetic analysis of gene families was carried out with the selected set of programmes to predict groups of orthologous sequences between O. sativa and A. thaliana. A dataset including all 69 families of transcription factors, i.e., proteins that regulate gene expression, was analysed, and the automated results were compared with published experimental data. This comparison showed that the chosen pipeline predicts orthologs significantly better than other widely used prediction tools.
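The comparison with published experimental data can be thought of as scoring predicted ortholog pairs against a reference set. The article does not state which metrics were used; the sketch below assumes the common precision, recall and F1 scores, and the gene pairs are invented for illustration.

```python
def evaluate(predicted, reference):
    """Score predicted ortholog pairs against a reference set (pair order ignored)."""
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical data: one true pair is missed and one spurious pair is predicted.
predicted = [("OsA1", "AtA1"), ("OsA2", "AtA2"), ("OsB1", "AtB2")]
experimental = [("AtA1", "OsA1"), ("AtA2", "OsA2"), ("AtB1", "OsB1")]
scores = evaluate(predicted, experimental)
print({k: round(v, 3) for k, v in scores.items()})
# {'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```

Running the same scoring on the output of each tool yields numbers that can be compared directly across tools.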
The GreenphylDB database was developed to manage gene family data and provide ready access to functional prediction data. It is currently the largest plant protein family database. GreenphylDB is linked with other databases to facilitate comparative functional analyses between O. sativa and A. thaliana. These public data are used by many research teams worldwide.
GOST (Greenphyl orthologous search tool) and i-GOST can be used for automated prediction of functionally equivalent genes between one (GOST) or several (i-GOST) sequences of orphan species and O. sativa or A. thaliana. Starting from a set of sequences of unknown function, these easy-to-use tools can transfer information from a model species to a species of agricultural interest. To enhance predictions, all genomic sequences from 10 new plant species were inserted into GreenphylDB gene families, with an 85–95% success rate. These were the genomes of grapevine, Populus trichocarpa, the moss Physcomitrella patens, sorghum, Selaginella moellendorffii, soybean, Medicago truncatula and three algae species (Chlamydomonas reinhardtii, Ostreococcus tauri, Cyanidioschyzon merolae). A global analysis is under way.
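Placing a new sequence into an existing gene family, as was done for these 10 genomes, can be sketched as a nearest-family search. The sketch below is a deliberately simplified stand-in, not GreenphylDB's procedure: it scores similarity by shared amino-acid k-mers (Jaccard index), whereas production pipelines typically rely on alignment tools such as BLAST or on profile HMMs. The family names, sequences, k and the 0.2 threshold are all hypothetical. Sequences that reach no family above the threshold stay unassigned, and "success rate" is read here, as an assumption, as the fraction of sequences that were assigned.

```python
def kmers(seq, k=3):
    """Set of overlapping k-letter substrings of a protein sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign_family(query, families, k=3, min_score=0.2):
    """Return the family whose closest member shares the most k-mers, or None."""
    q = kmers(query, k)
    best_family, best_score = None, 0.0
    for name, members in families.items():
        score = max((jaccard(q, kmers(m, k)) for m in members), default=0.0)
        if score > best_score:
            best_family, best_score = name, score
    return best_family if best_score >= min_score else None

def success_rate(queries, families, **options):
    """Fraction of query sequences that could be placed in some family."""
    hits = [assign_family(q, families, **options) for q in queries]
    return sum(h is not None for h in hits) / len(queries)

# Hypothetical families and query sequences.
families = {
    "FAM_A": ["MKTAYIAKQR", "MKTAYLAKQR"],
    "FAM_B": ["GSHMLEDPVA"],
}
print(assign_family("MKTAYIAKQK", families))  # FAM_A
print(assign_family("WWWWWWWWWW", families))  # None
print(success_rate(["MKTAYIAKQK", "WWWWWWWWWW"], families))  # 0.5
```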
With new sequencing techniques and decreasing costs, the genomes of other plants of agricultural interest, especially tropical plants, will soon be completely or partially sequenced. Current research focuses on a tool, based on i-GOST, for the automated annotation of new plant genomes using data available on model species. From a fundamental standpoint, these studies will enable researchers to determine when key genes appeared during the evolution of land plants and to test their function at different taxonomic levels, ranging from lower plants such as mosses to gymnosperms and angiosperms. This research could eventually help solve Darwin’s "abominable mystery" of the origin of flowering plants.
Christophe Périn
Plant Development and Genetic Improvement (UMR DAP)