Gramene Release 35 - June 2012
Gramene Database:
Genomes Release Notes
- Added new genomes for Brassica rapa, Oryza brachyantha, Setaria italica,
Solanum lycopersicum (tomato), Oryza meridionalis (chr3s), and Oryza glumaepatula (chr3s).
- Added variation data for Oryza glaberrima and Zea mays.
- Added oligo probes from the GeneChip Maize Genome Array (e.g., Zm.155.1.A1_a_at).
- Added wheat gene sequence alignments to Brachypodium distachyon as BAM tracks.
- Updated gene split models including predictions for Brassica rapa, Oryza brachyantha, Setaria italica, and Solanum lycopersicum.
- Updated peptide and DNA comparative genomics databases.
- Updated BioMarts.
- Updated to Ensembl schema and API version 67.
Genomes Core
Ensembl Variation
New:
- Zea mays HapMap v2
- Oryza glaberrima
Unchanged:
- Oryza sativa indica
- Oryza sativa japonica
- Arabidopsis thaliana
- Vitis vinifera
Functional Genomics
No updates in this release. Last update done in release 34 (October 2011).
Compara
Gene Tree
The EnsemblCompara GeneTree database was rebuilt using the core genome databases listed above, excluding the OGE chromosome 3 short arm genomes, with the addition of Ciona savignyi, Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae to provide taxonomic depth. From a total of 736,920 input proteins, 38,001 GeneTree families were constructed, comprising 646,588 individual genes.
Potential gene annotation artifacts
As a service to the community, we are releasing the results of screening for "putative gene annotation artifacts" on 18 plant reference genomes hosted at Gramene.
Split gene models commonly reflect an annotation artifact in which a single gene is annotated as two or more genes because of incomplete evidence, but they can also result from legitimate evolutionary processes. The Compara Gene Tree method predicts a special class of within-species paralogs called "contiguous_gene_split". A contiguous_gene_split is called when the two apparently paralogous genes lie on the same strand and in close proximity (<1 Mb) but have no (or little) overlapping sequence. For more information on the method, please read the Ensembl documentation on Gene Orthology/Paralogy prediction. The split gene models have been updated, and we now also provide them for the new genomes (Brassica rapa, Oryza brachyantha, Setaria italica, and Solanum lycopersicum).
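The three criteria above (same strand, proximity under 1 Mb, little or no overlapping sequence) can be sketched as a simple predicate. This is an illustration only, not the Ensembl Compara implementation: the `Paralog` record, its field names, the choice to measure overlap on alignment columns, and the 10% overlap tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass

MAX_DISTANCE_BP = 1_000_000   # "<1 Mb" proximity cutoff quoted in the text
MAX_OVERLAP_FRACTION = 0.1    # hypothetical stand-in for "no (or little)" overlap


@dataclass(frozen=True)
class Paralog:
    """Illustrative record for one gene of a within-species paralog pair."""
    seq_region: str   # chromosome or scaffold name
    start: int        # genomic start (1-based, inclusive)
    end: int          # genomic end (inclusive)
    strand: int       # +1 or -1
    aln_start: int    # first alignment column covered by the protein (assumed)
    aln_end: int      # last alignment column covered by the protein (assumed)


def overlap_fraction(a: Paralog, b: Paralog) -> float:
    """Shared alignment columns as a fraction of the shorter aligned span."""
    shared = min(a.aln_end, b.aln_end) - max(a.aln_start, b.aln_start) + 1
    shorter = min(a.aln_end - a.aln_start, b.aln_end - b.aln_start) + 1
    return max(shared, 0) / shorter


def is_split_candidate(a: Paralog, b: Paralog) -> bool:
    """Apply the three criteria: same strand, close proximity, little overlap."""
    if a.seq_region != b.seq_region or a.strand != b.strand:
        return False
    first, second = sorted((a, b), key=lambda g: g.start)
    distance = max(second.start - first.end, 0)
    return (distance < MAX_DISTANCE_BP
            and overlap_fraction(a, b) <= MAX_OVERLAP_FRACTION)
```

Two neighbouring same-strand genes whose proteins cover adjacent, non-overlapping parts of the alignment would pass this check; a pair on opposite strands, more than 1 Mb apart, or with substantial overlap would not.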
Synteny
No updates in this release. Last update done in release 34 (October 2011).
WGA
- Upgraded the analysis pipeline to Lastz-chain-net and replaced the old Blastz-net alignments with Lastz-net alignments for the following pairs:
  - Oryza sativa Japonica vs. Oryza sativa Indica
  - Oryza sativa Japonica vs. Oryza barthii (chr3s)
  - Oryza sativa Japonica vs. Oryza glumaepatula (chr3s)
  - Oryza sativa Japonica vs. Oryza meridionalis (chr3s)
  - Oryza sativa Japonica vs. Oryza minutabb (chr3s)
  - Oryza sativa Japonica vs. Oryza minutacc (chr3s)
  - Oryza sativa Japonica vs. Oryza nivara (chr3s)
  - Oryza sativa Japonica vs. Oryza officinalis (chr3s)
  - Oryza sativa Japonica vs. Oryza punctata (chr3s)
  - Oryza sativa Japonica vs. Oryza rufipogon (chr3s)
- Added the following new pairs, run with the Lastz-chain-net pipeline:
  - Oryza sativa Japonica vs. Brassica rapa
  - Oryza sativa Japonica vs. Setaria italica (foxtail millet)
  - Oryza sativa Japonica vs. Oryza brachyantha
  - Oryza sativa Japonica vs. Solanum lycopersicum (tomato)
  - Oryza sativa Japonica vs. Cyanidioschyzon merolae (red algae)
  - Arabidopsis thaliana vs. Brassica rapa
  - Arabidopsis thaliana vs. Setaria italica (foxtail millet)
  - Arabidopsis thaliana vs. Oryza brachyantha
  - Arabidopsis thaliana vs. Solanum lycopersicum (tomato)
  - Arabidopsis thaliana vs. Cyanidioschyzon merolae (red algae)
Protein Annotation, GO, Xref
No updates in this release. Last update done in release 34b (February 2012).
Mart
Maps and Markers Release Notes
No updates in this release. Last update done in release 33 (April 2011).
Gramene's comparative maps database hosts 215 map sets from
genetic, physical, bin, sequence, cytogenetic, and QTL studies.
See also the detailed Map Module statistics report.
Marker breakdown by type
Marker Type | Count |
AFLP | 8,149 |
Breakpoint Interval | 303 |
Centromere | 57 |
Chromosomal Segment | 10 |
Clone | 3,549,765 |
Cytological Structure | 8 |
DArT | 1,158 |
Deletion | 333 |
EST | 18,974,596 |
EST Cluster | 4,633,006 |
FISH Probe | 37 |
FPC | 17,479 |
Gene | 8,916 |
Gene Candidate | 17 |
Gene Family | 3 |
Gene Prediction | 354,564 |
Gene Primer | 19 |
Genomic DNA | 1,827,315 |
GSS | 9,459,811 |
Insertion | 308 |
ISBP | 691 |
Lapsed Locus | 4 |
Maize Bin | 114 |
Microarray Probe | 260,656 |
mRNA | 953,688 |
Oligo | 2,396,466 |
OVERGO | 23,921 |
Point | 332 |
Primer | 76,036 |
Probed Site | 5,220 |
Pseudogene | 1 |
QTL | 11,625 |
RAPD | 175 |
Restriction Fragment | 5 |
RFLP | 18,754 |
SNP | 2,942 |
SSR | 24,389 |
STS | 3,290 |
Telomere | 20 |
Transposable Element | 4 |
Undefined | 1,391 |
Marker breakdown by species
Species | Count |
Hordeum | 713,300 |
Zea | 7,604,538 |
Avena | 71,594 |
Oryza | 6,739,198 |
Secale | 15,058 |
Sorghum | 1,513,436 |
Saccharum | 442,434 |
Triticum | 1,432,375 |
See also the detailed Marker Module statistics report.
Proteins Release Notes
No updates in this release. Last update done in release 31 (May 2010).
The Gramene protein database provides a heterogeneous set of
functional annotations from sources such as SWISS-PROT and TrEMBL,
including Pfam, Prosite, and InterPro assignments. Various
ontologies such as Gene Ontology (GO), Plant Ontology (PO), Trait
Ontology (TO) and Environment Ontology (EO) are used to attribute
functional characteristics.
See also the detailed Protein Module statistics report.
Ontologies Release Notes
Various ontologies and their associations to genes, gene models,
proteins, QTL, markers and maps were updated. Below is a summary of the
ontologies data.
Ontology | Total terms | Total terms w/ associations | Total associations |
Gene Ontology (GO) | 35,728 | 2,396 | 422,611 |
Plant Ontology (PO) | 0 | 0 | 0 |
Growth stage Ontology (GRO) | 236 | 80 | 151,040 |
Trait Ontology (TO) | 1,146 | 536 | 14,236 |
Taxonomy Ontology (GR_tax) | 58,550 | 45,927 | 374,161 |
Environment Ontology (EO) | 501 | 82 | 62,724 |
Gazetteer ontology (GAZ) | 504,297 | 0 | 0 |
See also the detailed Ontology Module statistics report.
Genes and Alleles Release Notes
No updates in this release. Last update done in release 31 (May 2010).
A more detailed genes database statistics report can be found here.
QTL Release Notes
No updates in this release. Last update done in release 27 (April 2008).
A more detailed QTL database statistics report can be found here.
Pathways Release Notes (version 3.4)
Pathway databases are available dynamically via Gramene's pathway server and for bulk download* via Gramene's FTP site.
MaizeCyc has been updated to version 2.0.1
- Deleted pathways:
Glycine degradation I (strictly for anaerobic bacteria)
- Deleted reactions (orphan reactions that have no gene associations and may not exist in plants):
RXN-12104, RXN-9544, RXN-9546, RXN-9545, 2.7.7.1-RXN, 2.7.7.33-RXN, COBINPGUANYLYLTRANS-RXN, and RXN-9990
- Deleted proteins:
Renin
- Deleted compounds:
Aggrecan, Heparin, Thrombin, SoxZY-S-Thiocysteine, Factor-V, Factor-I, and Ecdysteroids
RiceCyc has been updated to version 3.3
- Manually curated pathways:
Trehalose; Stachyose; Delphinidin 3-O-glucoside; Pelargonidin 3-O-glucoside & Cyanidin 3-O-glucoside; Gentiodelphin; Flavonol; Pinobanksin; 6,7,4'-trihydroxyisoflavone; Maackiain; Medicarpin; Leucodelphinidin; Leucopelargonidin & Leucocyanidin; and Proanthocyanidin biosynthesis from flavonols
* Only if created and maintained by Gramene.
Diversity Release Notes
Data sets
New:
Oryza glaberrima. This variation set comprises ~1 million SNPs identified within 13 accessions of African rice, O. glaberrima, and eight accessions of its wild progenitor, O. barthii, which were collected from geographically distributed regions of Africa. Most lines were sequenced to 7-15X coverage using Illumina GAIIx technology. Six of the O. glaberrima lines were sequenced to >100X coverage using Illumina HiSeq2000 technology. Sequence reads were aligned to the O. glaberrima reference genome using BWA. SNPs were called using the multiple-sample method in the GATK package, with a quality score threshold and minimum and maximum alignment coverage as filter criteria.
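The filtering step described above (a quality score threshold plus minimum and maximum alignment coverage) can be pictured with a minimal sketch. The release notes do not state the actual cutoffs, so the `SnpCall` record and every threshold value below are hypothetical placeholders, and this is not the GATK implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """Illustrative SNP call; field names are assumptions for this example."""
    chrom: str
    pos: int
    qual: float   # variant quality score
    depth: int    # total alignment coverage at the site


def passes_filters(call: SnpCall,
                   min_qual: float = 30.0,   # hypothetical threshold
                   min_depth: int = 10,      # hypothetical threshold
                   max_depth: int = 500      # hypothetical threshold
                   ) -> bool:
    """Keep a call only if quality and coverage fall inside the allowed window."""
    return call.qual >= min_qual and min_depth <= call.depth <= max_depth


def filter_calls(calls, **thresholds):
    """Return the calls that pass, preserving input order."""
    return [c for c in calls if passes_filters(c, **thresholds)]
```

The maximum-coverage bound is what removes sites with unusually deep pileups, which often indicate repeats or collapsed duplications rather than true variation.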
Updated:
- Oryza sativa japonica. Updated to MSU7 coordinates for Rice Diversity and 1536 chip experiments.
- Zea mays HapMap v2. This variation set comprises the maize HapMap2 data: 55 million SNPs and indels identified in a collection of 103 pre-domesticated and domesticated Zea mays varieties, including a representative from the sister genus, Tripsacum dactyloides (Eastern gamagrass). Each line was sequenced to an average of 4.5X coverage using the Illumina GAIIx platform. The reads can be accessed from the SRA under accession ID SRA051245. Reads were mapped to the B73 reference genome using a combination of Bowtie, Novoalign and SOAP. The variations were scored by taking into account identity-by-descent blocks that are shared among the lines (Chia et al., 2012).
Unchanged:
- Arabidopsis, Oryza sativa spp, and Vitis vinifera
Software
Tassel
Introduction of Tassel 4
- Additional Web-launch TASSEL files created to allow diversity data sets to be loaded automatically
- Consistent Implementation of all data types (SNP, Haplotypes, etc.)
- Bitwise Data Structures for Speed (40 - 200 fold increase) and Memory Efficiency
- Many improvements to software architecture and design
- New QQ and Manhattan Plots
- 70% speed improvement to Cladogram Function
- Improved LD results display
- Many improvements to Progress Monitoring
- Much improved Taxa and Site Name Filtering
- Tassel on iPlant Discovery Environment and Atmosphere
- New Genotype Summary Function
- More User Friendly Alignment Viewer
- Improved Error Messages
- GLM and MLM:
- GLM interface simplified
- Compression and faster P3D implemented for MLM resulting in reduced runtime
- Matrix Algebra library wrapper written to make switching to newer, faster libraries easier
- EJML Matrix Algebra library interface implemented
- Pipeline (Command Line Interface)
- Automates complex loading/analysis pipelines
- Doesn't need Java coding to create
- Has simultaneously executing pipeline segments
- Works from web site launch, command line, and GUI
- Self-Describing Plugins
- Integrated GBS Pipeline with existing Tassel Pipeline
- Added more parameters to command line interface (e.g., LD, Site Filtering, Numerical Transform)
- Can Define Pipelines in XML
SNP query
No updates in this release. Last update done in release 34 (October 2011).
Association Viewer
Explore Atwell GWAS data in our Association Viewer (beta).
Click here for database summary.
Germplasm
No updates in this release. Last update done in release 34 (October 2011).
Literature
- Wei-X, Xu-J, Guo-H, Jiang-L, Chen-S, Yu-C, Zhou-Z, Hu-P, Zhai-H, Wan-J, "DTH8 suppresses flowering in rice, influencing plant height and yield potential simultaneously", Plant Physiol, 2010, vol. 153, pp. 1747-1758
- Ebana-K, Shibaya-T, Wu-J, Matsubara-K, Kanamori-H, Yamane-H, Yamanouchi-U, Mizubayashi-T, Kono-I, Shomura-A, Ito-S, Ando-T, Hori-K, Matsumoto-T, Yano-M, "Uncovering of major genetic factors generating naturally occurring variation in heading date among Asian rice cultivars", Theor Appl Genet, 2011, vol. 122
- Mao-H, Sun-S, Yao-J, Wang-C, Yu-S, Xu-C, Li-X, Zhang-Q, "Linking differential domain functions of the GS3 protein to natural variation of grain size in rice", Proc Natl Acad Sci U S A, 2010, vol. 107, pp. 19579-19584
- Zhao-K, Tung-C, Eizenga-G, Wright-M, Ali-L, Price-A, Norton-G, Islam-M, Reynolds-A, Mezey-J, McClung-A, Bustamante-C, McCouch-S, "Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa", Nat Comm, 2011, vol. 2:467
Website
Distributed Annotation System
Gramene provides a DAS server for our own markers and sequences
database as well as our Ensembl databases. These can be used by the
Ensembl genome browser as well as by outside applications that need to
display sequence annotations on our various genome assemblies.
Substantial improvements to the architecture should improve
performance and usability considerably. Our DAS resources use FastBit
as the retrieval engine. We offer DAS tracks for sixteen species. See
our DAS sources.
Diversity via GDPC/TASSEL
The TASSEL program can connect, via the Genomic Diversity and
Phenotype Connection (GDPC), to Gramene's genetic diversity databases
for rice, maize, and Arabidopsis for detailed analysis by the user.
Web Services
Gramene's web
services page documents many ways to directly connect to and analyze
our databases.
Public MySQL Server
Gramene provides direct MySQL access to our core Ensembl databases for
each of our sequenced genomes as well as our databases for markers,
sequences, genes, QTL and ontologies. To connect, use the following:
mysql -hgramenedb.gramene.org -pgramene
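For scripted, non-interactive use, the same command can be assembled programmatically. The sketch below is an illustration only: it reuses the host and password shown above, adds the standard `-e` option of the `mysql` client to run a single statement, and assumes that the client is installed and the server is reachable. The helper name and the optional database argument are hypothetical.

```python
import shlex

HOST = "gramenedb.gramene.org"  # host from the command above
PASSWORD = "gramene"            # password from the command above


def mysql_query_args(sql, database=None):
    """Build the argument list for running one SQL statement with the mysql client."""
    args = ["mysql", f"-h{HOST}", f"-p{PASSWORD}", "-e", sql]
    if database is not None:
        # Optional database name (hypothetical; list real ones with SHOW DATABASES)
        args.append(database)
    return args


# Print the equivalent shell command, safely quoted.
print(shlex.join(mysql_query_args("SHOW DATABASES")))
```

The returned list can be passed directly to `subprocess.run(...)`; building it as a list rather than a single string avoids shell-quoting problems in the SQL text.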