FAMeS: Fidelity of Analysis of Metagenomic Samples

Download Data

Simulated Metagenomes

Sequence files contain the sequences of the paired reads in FASTA format. Quality files contain the quality information needed for assembly.

Dataset   Sequence files   Quality files
SimLC     Download         Download
SimMC     Download         Download
SimHC     Download         Download
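
The sequence files are plain FASTA, so they can be read without special tooling. The following is a minimal sketch, assuming each record starts with a `>` header line whose first whitespace-delimited token is the read identifier; the function name `read_fasta` is hypothetical and not part of the FAMeS distribution.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Assumption: the identifier is the first token after '>'.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequence may span several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

For a downloaded file this could be called as `read_fasta(open(path))`, where `path` is whatever local name the file was saved under.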

Note: filenames correspond to taxon_oids found in IMG

The sizes of the libraries used for this project can be found here. Reads that do not correspond to any of the libraries in this file can be considered either single reads (i.e., without a valid mate) or as belonging to 3 kb libraries.

The list of the reads and their corresponding origin can be found here.
The genes that are included in the "reference" genomes are here. NOTE: this file contains all the protein coding genes that are included in the three metagenomes.
The overlap of the genes with the sequencing reads is here.

Assemblers

A file that contains the coordinates of the reads on the assembled sequences can be found here.

Dataset   phrap               Arachne             JAZZ
SimLC     Contigs, Singlets   Contigs, Singlets   Scaffolds, Singlets
SimMC     Contigs, Singlets   Contigs, Singlets   Scaffolds, Singlets
SimHC     Contigs, Singlets   Contigs, Singlets   Scaffolds, Singlets

JAZZ generates scaffold sequences (i.e., multiple contigs connected by stretches of Ns).
The coordinates of contigs on scaffolds can be found here.
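
For illustration only, contig boundaries can also be approximated directly from a scaffold sequence by splitting on runs of N. This is a sketch under stated assumptions, not the procedure used to produce the coordinates file: the function name `split_scaffold` and the `min_gap` parameter are hypothetical, and coordinates are reported 0-based with an exclusive end.

```python
import re
from typing import List, Tuple


def split_scaffold(scaffold: str, min_gap: int = 1) -> List[Tuple[int, int, str]]:
    """Split a scaffold into contigs at runs of at least `min_gap` Ns.

    Returns (start, end, sequence) tuples, 0-based with exclusive end.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    contigs: List[Tuple[int, int, str]] = []
    pos = 0
    for match in gap.finditer(scaffold):
        # Emit the sequence between the previous gap and this one.
        if match.start() > pos:
            contigs.append((pos, match.start(), scaffold[pos:match.start()]))
        pos = match.end()
    # Emit any trailing sequence after the last gap.
    if pos < len(scaffold):
        contigs.append((pos, len(scaffold), scaffold[pos:]))
    return contigs
```

Raising `min_gap` treats short runs of N as ambiguous bases inside a contig rather than as gaps between contigs. The coordinates file linked above remains the authoritative source.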


The taxonomic assignment of each contig can be found here.


Gene Prediction Methods

Dataset   phrap      Arachne    JAZZ       Method
SimLC     Download   Download   Download   Fgenes
          Download   Download   Download   CRITICA/Glimmer
SimMC     Download   Download   Download   Fgenes
          Download   Download   Download   CRITICA/Glimmer
SimHC     Download   Download   Download   Fgenes
          Download   Download   Download   CRITICA/Glimmer

Binning Methods


Dataset   phrap      Arachne    JAZZ       Method
SimLC     Download   Download   Download   kmer
          Download   Download   Download   PhyloPythia
          Download   Download   Download   BLAST distribution
SimMC     Download   Download   Download   kmer
          Download   Download   Download   PhyloPythia
          Download   Download   Download   BLAST distribution
SimHC     Download   Download   Download   kmer
          Download   Download   Download   PhyloPythia
          Download   Download   Download   BLAST distribution

OTU analysis

FASTA-formatted file of all 1677 sequences from this study.
Taxonomic identity of each sequence down to the species level.
Spreadsheet that lists all methodologies examined in this study. The VI column indicates the VI distance from the true species clustering. The ACE and CHAO1 columns are nonparametric estimators of the total number of species in the sampled environment. The SHANNON column is the computed Shannon diversity index.
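
As a reference for reading the SHANNON and CHAO1 columns, the two quantities can be sketched from a list of per-species (or per-OTU) sequence counts. This is an illustrative sketch, not the code used in the study: it assumes the natural logarithm for the Shannon index and the bias-corrected form of Chao1, and the spreadsheet may have been produced with different conventions. The function names are hypothetical.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln(p_i)) over non-empty groups."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of groups seen exactly once and twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

The bias-corrected form is used here because it stays defined when no group is observed exactly twice.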

                  Multiple sequence alignment
Distance matrix   NAST       CLUSTALW   MUSCLE
Kimura            Download   Download   Download
Jukes             Download   Download   Download
Felsenstein       Download   Download   Download   Pairwise
Olsen             Download   Download   Download