Annotation detail

Full-length CDS prediction

Full-length CDS prediction in TriFLDB is carried out by two methods: identifying the longest open reading frame (ORF) and using DECODER (Fukunishi and Hayashizaki 2001). The prediction results for each full-length cDNA are provided, and an overview of the predictions is shown on the statistics page.
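The longest-ORF approach can be illustrated with a minimal Python sketch. This is not the TriFLDB implementation: it assumes the cDNA is given in sense orientation (so only the three forward frames are scanned), that an ORF runs from an ATG to the first in-frame stop codon, and the function name longest_orf is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Return (start, end, frame) of the longest ATG..stop ORF, or None.

    Assumptions of this sketch: forward strand only, DNA alphabet,
    ORF must begin with ATG and end with an in-frame stop codon.
    start is 0-based; end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Close the ORF and keep it if it is the longest so far.
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3, frame)
                start = None
    return best

if __name__ == "__main__":
    cdna = "ATGTAAATGCCCGGGTAG"
    hit = longest_orf(cdna)
    if hit is not None:
        start, end, frame = hit
        print(frame, cdna[start:end])
```

An ORF that is still open at the end of the sequence is ignored here; a production tool would have to decide how to treat such truncated reading frames.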

Similarity search results

Similarity searches against several databases were carried out to predict the gene function of each TriFLCDS entry. The following databases were searched: the nonredundant (nr) protein database of NCBI and UniProt/TrEMBL of EBI as representative protein databases; the Rice Annotation Project Database (RAP-DB) and The Institute for Genomic Research (TIGR) rice database as annotated rice protein datasets; protein data of predicted genes in the Sorghum genome from the Joint Genome Institute (JGI); The Arabidopsis Information Resource (TAIR) as the annotated Arabidopsis protein dataset; and cDNA sequences of barley and wheat from UniGene, TIGR GI, Plant-Genome database (GDB), and HarvEST as clustered or representative cDNA sequences. To support the predicted reading frame of each cDNA, similarity searches using BLASTX and BLASTP were performed against the protein databases, with the nucleotide sequences and the corresponding translated protein sequences as queries, respectively.

Hierarchical protein cluster

For phylogenetic insights, the predicted protein sequences of TriFLCDS entries were hierarchically clustered with the proteome datasets of other plants, i.e., Arabidopsis (TAIR7), rice (RAP-DB), and Sorghum (JGI), on the basis of amino acid identity using the Cd-Hit package. The clustering was carried out hierarchically, lowering the identity threshold by 10% at each step: global amino acid identity from 100% down to 40%, followed by local identity at 30%. The "Hierarchically Clustered Protein Viewer" allows users to find homologous counterparts at hierarchical identity thresholds of 90%, 60%, and 30%. Furthermore, within each hierarchical cluster, the clustered proteins and the protein sequences of the TriFLCDS entries were aligned using ClustalW.
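The threshold schedule described above can be written out explicitly. The following Python sketch only enumerates the clustering steps as read from the text (global identity from 100% to 40% in 10% decrements, then local identity at 30%); it does not run Cd-Hit, and the function name and parameters are hypothetical.

```python
def identity_schedule(start=100, stop=40, step=10, local=30):
    """List the (mode, percent identity) steps of the hierarchical clustering.

    Sketch only: the defaults mirror the thresholds stated in the text,
    with global identity for every step except the final local one.
    """
    schedule = [("global", pct) for pct in range(start, stop - 1, -step)]
    schedule.append(("local", local))
    return schedule

if __name__ == "__main__":
    for mode, pct in identity_schedule():
        print(f"{mode:6s} identity >= {pct}%")
```

The three levels exposed in the viewer (90%, 60%, and 30%) are a subset of this schedule.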

Protein domain organization and Gene Ontology assignment

Protein domain searches for TriFLCDS entries were carried out using the iprscan program. GO terms were also assigned to each entry on the basis of the iprscan search results.


TriFLDB: A Database of Clustered Full-Length Coding Sequences from Triticeae with Applications to Comparative Grass Genomics.
Keiichi Mochida, Takuhiro Yoshida, Tetsuya Sakurai, Yasunari Ogihara, and Kazuo Shinozaki
Plant Physiol. 2009 May 15

Homology search

NCBI WWW BLAST has been implemented on the TriFLDB server as TriFL-BLAST. TriFL-BLAST provides the nucleotide and protein sequences deduced from TriFLCDS entries, as well as the plant proteome data of Arabidopsis (TAIR7), rice (RAP-DB), and Sorghum (JGI, ver. 1.4), as BLAST search databases.

Homology mapping onto the Brachypodium and rice genomes

To predict the gene structure of each TriFLCDS entry, SIM4 is used to map the coding sequences onto the rice genome sequence, showing the predicted exon-intron structures alongside the rice gene annotations in RAP-DB and those of Sorghum in JGI. Exon-intron structures and the associated rice genome annotation data are displayed using the Generic Genome Browser (GBrowse).

EST assembly with TriFLCDS

Wheat and barley ESTs currently available in dbEST were associated with the TriFLCDS entries on the basis of nucleotide identity using BLAST searches. Each EST alignment with a TriFLCDS entry is shown in a contig alignment browser.
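As an illustration of how ESTs might be grouped under TriFLCDS entries by nucleotide identity, the sketch below parses BLAST tabular output (query ID, subject ID, and percent identity in the first three tab-separated columns). It assumes that ESTs were the queries and TriFLCDS entries the subjects; the 95% cutoff, the function name, and the example identifiers are hypothetical, since the threshold actually used by TriFLDB is not stated here.

```python
from collections import defaultdict

def group_ests_by_cds(tabular_lines, min_identity=95.0):
    """Map each TriFLCDS ID to the ESTs that hit it at >= min_identity.

    Sketch only: expects tab-separated BLAST tabular lines whose first
    three fields are query ID (EST), subject ID (TriFLCDS), and percent
    identity. min_identity is an assumed cutoff, not the TriFLDB value.
    """
    groups = defaultdict(list)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        est_id, cds_id, pident = fields[0], fields[1], float(fields[2])
        if pident >= min_identity:
            groups[cds_id].append((est_id, pident))
    return dict(groups)

if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    hits = [
        "EST_A\tCDS_1\t99.20\t500",
        "EST_B\tCDS_1\t88.00\t420",
        "EST_C\tCDS_2\t97.50\t610",
    ]
    for cds_id, ests in group_ests_by_cds(hits).items():
        print(cds_id, ests)
```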
