Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods currently used to process metagenomic sequences, simulated datasets of varying complexity were constructed by combining sequencing reads randomly selected from 113 isolate genomes. These datasets were designed to model real metagenomes in terms of complexity and phylogenetic composition. Assembly, gene prediction, and binning were then performed using methods commonly applied to metagenomic datasets at the DOE JGI. This site provides access to the simulated datasets and aims to facilitate standardized benchmarking of tools for metagenomic analysis.
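As a rough illustration of how a simulated dataset can be assembled from isolate reads, the sketch below draws reads at random from per-genome read pools. It is not the procedure used to build the datasets on this site: the function name `simulate_metagenome`, the `abundances` weights, and the toy reads are all assumptions made for the example. Skewed weights are used here as one possible way to model a community dominated by a few organisms; uniform weights would give a more even one.

```python
import random


def simulate_metagenome(read_pools, abundances, n_reads, seed=0):
    """Draw n_reads reads at random from per-genome read pools.

    read_pools: dict mapping genome name -> list of reads (strings)
    abundances: dict mapping genome name -> relative weight (hypothetical
                parameter; the real datasets may have been built differently)
    Returns a list of (genome, read) pairs so the origin of each read is known.
    """
    rng = random.Random(seed)  # fixed seed keeps the simulated dataset reproducible
    genomes = sorted(read_pools)
    weights = [abundances[g] for g in genomes]
    sampled = []
    for _ in range(n_reads):
        # Pick a source genome in proportion to its weight, then one of its reads.
        genome = rng.choices(genomes, weights=weights, k=1)[0]
        sampled.append((genome, rng.choice(read_pools[genome])))
    return sampled


# Toy input: three made-up genomes with a handful of made-up reads each.
pools = {
    "genomeA": ["ACGT", "CGTA"],
    "genomeB": ["TTGA", "GATT"],
    "genomeC": ["CCAG"],
}
skewed = {"genomeA": 8.0, "genomeB": 1.0, "genomeC": 1.0}
dataset = simulate_metagenome(pools, skewed, n_reads=10)
```

Keeping the source genome alongside each read is what makes benchmarking possible: the output of an assembler or binning tool can be checked against the known origin of every read.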
We invite members of the scientific community to use these datasets to evaluate new methods and to submit their results, in order to create a comprehensive resource for the comparison of methods.
If you use the data or results found on this site, please cite the following paper:
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
Konstantinos Mavromatis, Natalia Ivanova, Kerrie Barry, Harris Shapiro, Eugene Goltsman, Alice C McHardy, Isidore Rigoutsos, Asaf Salamov, Frank Korzeniewski, Miriam Land, Alla Lapidus, Igor Grigoriev, Paul Richardson, Philip Hugenholtz, Nikos C Kyrpides
Nature Methods 2007 Jun;4(6):495-500.
Datasets and Methods
Assembly Method Comparison
Binning Method Comparison
Gene Function Prediction Method Comparison
FAMeS now hosts data from a comprehensive study of methodologies used to create OTUs in 16S rRNA-targeted studies of microbial communities. Studies of phylogenetic markers at the molecular level have revealed a vast biodiversity of microorganisms living in the sea, on land, and even within the human body. Microbial diversity studies of uncharacterized environments typically seek to estimate the richness and diversity of endemic microflora using a 16S rRNA gene sequencing approach. When most of the species in an environment are unknown and cannot be classified through a database search, researchers cluster 16S sequences into operational taxonomic units (OTUs) or phylotypes, thereby providing an estimate of population structure. Using real 16S sequence data, we have performed a critical analysis of OTU clustering methodologies to assess the potential variability in OTU quality. Here we provide the sequence data, taxonomic information, multiple sequence alignments, and distance matrices used in our study, as well as our compiled results for 700+ unique OTU methods. The study has been published, and the data are accessible from the download page.
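To make the idea of clustering 16S sequences into OTUs from a distance matrix more concrete, here is a minimal sketch of one possible approach: single-linkage clustering at a fixed distance cutoff. It is only one of many ways to define OTUs and is not presented as any of the methods evaluated in the study. The function name `cluster_otus`, the 0.03 cutoff, and the toy distances are assumptions chosen for the example.

```python
def cluster_otus(names, dist, threshold=0.03):
    """Group sequences into OTUs by single-linkage clustering.

    names:     list of sequence identifiers
    dist:      square matrix (list of lists) of pairwise distances
    threshold: hypothetical cutoff; pairs at or below it are linked
    Returns a sorted list of OTUs, each a list of sequence names.
    """
    parent = list(range(len(names)))  # union-find forest, one node per sequence

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    otus = {}
    for i, name in enumerate(names):
        otus.setdefault(find(i), []).append(name)
    return sorted(otus.values())


# Toy input: four made-up sequences with made-up pairwise distances.
names = ["seqA", "seqB", "seqC", "seqD"]
dist = [
    [0.00, 0.01, 0.10, 0.12],
    [0.01, 0.00, 0.09, 0.11],
    [0.10, 0.09, 0.00, 0.02],
    [0.12, 0.11, 0.02, 0.00],
]
otus = cluster_otus(names, dist)
```

With these toy distances the default cutoff yields two OTUs, a much stricter cutoff leaves every sequence in its own OTU, and a looser one merges everything. This sensitivity to method and cutoff is the kind of variability in OTU assignment that the study set out to measure.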