The rice genome has now been completely sequenced. It comprises 40 000 to 60 000 genes. The next step is to determine the biological function of these genes through functional genomics studies. To facilitate these studies, CIRAD has recently set up a database containing all the essential information on the rice genome, in particular the flanking sequence tags (FSTs) surrounding the integration sites of the mutagenic elements present in insertion mutant collections. In a reverse genetics approach, these tags help pinpoint gene functions by linking a gene directly to a phenotype.
In addition to its agricultural importance, rice has a small genome and shares sequence and genetic-organization similarities with other cereals, which makes it a model plant for studying monocots. Sequencing of its genome was completed in December 2004. It revealed an unexpected abundance of genes: around 40 000 to 60 000, compared with just 27 000 for Arabidopsis thaliana, the model species for dicots. The next step is to determine the functions of all these genes through functional genomics studies.
To determine these functions, several projects based on insertional mutagenesis have been undertaken. This method randomly inserts an identifiable DNA fragment (T-DNA, i.e. transfer DNA, or a transposable element) into the genome. When the fragment integrates into a gene, it may disrupt the gene's function and alter the corresponding trait. The mutated gene can then be located through the inserted element, and its function inferred from the affected trait. Identifying insertion sites systematically requires large-scale sequencing of the genomic regions flanking the integrated mutagenic elements.
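As a minimal sketch of the reverse genetics lookup this enables, the Python snippet below finds the genes disrupted by an insertion and, conversely, lists the insertion lines available for a given gene. It assumes each FST has already been reduced to a single insertion position on a pseudomolecule and that gene models are plain start/end intervals; the `Gene` and `FST` record types, the gene and line IDs and the coordinates are all hypothetical, and this is not the OryGenesDB implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # first base of the gene model (inclusive)
    end: int    # last base of the gene model (inclusive)


@dataclass(frozen=True)
class FST:
    line_id: str   # insertion line the tag was sequenced from
    chrom: str
    position: int  # insertion site inferred from the flanking sequence (assumed)


def genes_hit(fst, genes):
    """Return the IDs of genes whose interval contains the insertion site."""
    return [
        g.gene_id
        for g in genes
        if g.chrom == fst.chrom and g.start <= fst.position <= g.end
    ]


def mutant_lines_for(gene_id, genes, fsts):
    """Reverse genetics query: sorted insertion lines with a tag inside gene_id."""
    return sorted({f.line_id for f in fsts if gene_id in genes_hit(f, genes)})


# Hypothetical gene models and tags, for illustration only
GENES = [
    Gene("GENE_A", "chr1", 1000, 2500),
    Gene("GENE_B", "chr1", 5000, 7000),
    Gene("GENE_C", "chr2", 1000, 3000),
]
FSTS = [
    FST("LINE_001", "chr1", 1200),
    FST("LINE_002", "chr1", 3000),  # intergenic: hits no gene
    FST("LINE_003", "chr1", 6500),
    FST("LINE_004", "chr2", 2000),
    FST("LINE_005", "chr1", 2500),
]

if __name__ == "__main__":
    for gene in GENES:
        print(gene.gene_id, mutant_lines_for(gene.gene_id, GENES, FSTS))
```

With this toy data, GENE_A is hit by LINE_001 and LINE_005, while LINE_002 falls between genes. A real system would index genes by chromosome (for example with an interval tree) instead of scanning every gene for every tag; the linear scan only keeps the sketch short.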
A collection of 30 000 T-DNA lines and 40 000 FSTs has been compiled at CIRAD as part of the Génoplante project, and various international laboratories have also built up mutant collections. OryGenesDB, an integrative database that brings together all these resources along with the main genomic data on rice, has been developed to exploit this information.
The aim of the OryGenesDB information system is to enable molecular geneticists to quickly find insertion mutants for genes of interest and to retrieve as many annotations linked to those genes as possible, in a reverse genetics approach that goes from sequence to phenotype.
The heart of the system is the generic Genome Browser software, a web application for visualizing genomic annotations. Its user-friendly graphical interface lets users browse the genome and display all the available annotations. The software's reference coordinate system is the set of rice pseudomolecules (chromosomes) from The Institute for Genomic Research (TIGR) website. In addition to FSTs, other information, such as full-length cDNAs, expressed sequence clusters from several cereals (wheat, maize, barley, sorghum and sugarcane), molecular markers and expression data, has been integrated into the system as annotation layers. Complementary tools have also been developed to facilitate searching and visualization: searches by accession number, keyword, conserved protein domain or sequence homology. Search results can then be exported as Excel files.
To simplify and intensify functional analyses of the rice genome, two other databases, developed in parallel, are to be coupled with OryGenesDB. Oryza Tag Line (OTL) is the phenotypic counterpart of OryGenesDB: it lists all the morphological and physiological data gathered on T-DNA insertion lines. Coupling the two databases will make it possible to identify rapidly the effect of a mutation in a given gene by examining the morphological and physiological characteristics of the corresponding plants. Lastly, to exploit the wealth of information gathered on Arabidopsis thaliana, a new database, Greenphyl, has been compiled. It classifies all rice and Arabidopsis sequences into gene families and includes an automatic tool that identifies the most likely functional equivalents between the two species.
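Conceptually, coupling OryGenesDB with OTL amounts to joining genotype and phenotype records on the insertion line identifier. The Python sketch below assumes a deliberately simplified layout, (line, gene) pairs on the genotype side and (line, trait, observation) triples on the phenotype side; these layouts, the `phenotypes_for_gene` function and every identifier are hypothetical and do not reflect the actual OryGenesDB or OTL schemas.

```python
from collections import defaultdict

# Hypothetical genotype side: insertion lines whose FST falls inside a gene
INSERTIONS = [
    ("LINE_001", "GENE_A"),
    ("LINE_005", "GENE_A"),
    ("LINE_003", "GENE_B"),
]

# Hypothetical phenotype side: observations recorded on insertion lines
PHENOTYPES = [
    ("LINE_001", "plant height", "dwarf"),
    ("LINE_003", "leaf color", "pale green"),
    ("LINE_005", "plant height", "dwarf"),
    ("LINE_009", "tillering", "reduced"),  # no tagged gene for this line in the toy set
]


def phenotypes_for_gene(gene_id, insertions, phenotypes):
    """Join on line ID: map each line mutated in gene_id to its observed traits."""
    lines = {line for line, gene in insertions if gene == gene_id}
    by_line = defaultdict(list)
    for line, trait, observation in phenotypes:
        if line in lines:
            by_line[line].append((trait, observation))
    # Lines mutated in the gene but with no phenotype record are simply absent
    return dict(by_line)


if __name__ == "__main__":
    for line, traits in phenotypes_for_gene("GENE_A", INSERTIONS, PHENOTYPES).items():
        print(line, traits)
```

Here the two independent lines tagged in GENE_A show the same trait, which is the kind of pattern that strengthens a gene-function hypothesis when genotype and phenotype data can be queried together.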
OryGenesDB is now the central database for rice gene functional analyses at CIRAD. It is widely used by the international community. Its power and simplicity should also make it an indispensable tool for exploring the function of genes of agricultural interest in other cereals.
Christophe Périn; Gaëtan Droc; Pierre Larmande; Emmanuel Guiderdoni; Manuel Ruiz; Brigitte Courtois
UMR: Polymorphisms of Interest in Agriculture (PIA)
The development of OryGenesDB was funded by the European Commission (Cereal Gene Tag Project CT-2001-01453) and the “Generation” Challenge Programme (Rice Stress Mutants Project).