Main Dataset

SNV-MWAS Catalog

Complete SNV-MWAS association results including SNV information, phenotype associations, and statistical significance data.

Download TSV
Format: TSV
Size: ~863 MB
Records: 1,389,120 entries
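
At roughly 863 MB, the catalog is easier to stream row by row than to load into memory at once. The following is a minimal sketch using only the Python standard library. The file name and column name in the usage comment are hypothetical, since the actual header of the catalog is not listed on this page; inspect the first line of the downloaded file to find the real column names.

```python
import csv
from collections import Counter


def count_by_column(tsv_path, column):
    """Stream a TSV file with a header row and count rows per value of one column."""
    counts = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Fail early with a helpful message if the column name is wrong.
        if column not in (reader.fieldnames or []):
            raise KeyError(
                f"column {column!r} not found; available: {reader.fieldnames}"
            )
        for row in reader:
            counts[row[column]] += 1
    return counts


# Hypothetical usage (file and column names are assumptions):
# counts = count_by_column("snv_mwas_catalog.tsv", "phenotype")
# print(counts.most_common(10))
```

Only one row is held in memory at a time, so the same approach should scale to the larger annotation tables as well.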

Reference Data

Species Information

Taxonomy and metadata for all microorganisms in the SNV-MWAS catalog.

Download
Format: TSV
Species: 909 entries
Taxonomy: Complete

Reference Genomes

Complete reference genome sequences, taken from the UHGG database, for all 909 species in the catalog.

Download
Format: FASTA (compressed)
Size: ~755 MB
Species: 909

Annotation Files

Gene Annotations (GFF3)

Gene coordinates, functional annotations, and pathway information for SNVs in the catalog.

Download
Format: GFF3 (compressed)
Size: ~132 MB
Annotations: 516,851 entries
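
GFF3 is a nine-column, tab-separated format whose last column holds `key=value` attribute pairs separated by semicolons. As a rough sketch of how the compressed annotation file could be read without third-party libraries, the reader below follows the general GFF3 layout. It makes no assumption about which attribute keys this particular file contains, and the file name in the usage comment is hypothetical.

```python
import gzip
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end", "score", "strand", "phase")


def parse_gff3_line(line):
    """Return a dict for one GFF3 feature line, or None for comments and blank lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])  # 1-based, inclusive
    record["end"] = int(record["end"])
    attributes = {}
    for pair in fields[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attributes.
            attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


def iter_gff3(path):
    """Yield feature records from a plain or gzip-compressed GFF3 file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded sequence section follows; stop parsing features
            record = parse_gff3_line(line)
            if record is not None:
                yield record


# Hypothetical usage (file name is an assumption):
# cds = [r for r in iter_gff3("gene_annotations.gff3.gz") if r["type"] == "CDS"]
```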

Detailed Gene Annotation (TSV)

Functional annotations and protein information mapped to UniProt database entries.

Download
Format: TSV (compressed)
Size: ~1.2 GB
Proteins: 4,679,748 entries

SNV Impact Annotation (TSV)

SNV impact predictions and functional consequences for coding and non-coding regions.

Download
Format: TSV (compressed)
Size: ~343 MB
SNVs: 1,398,210 entries

Sequences

CDS Sequences

Coding DNA sequences (CDS) for all protein-coding genes in the reference genomes.

Download
Format: FASTA (compressed)
Size: ~730 MB
CDS: 2,370,420 sequences
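
The compressed FASTA files (reference genomes and CDS sequences) can be read directly without decompressing them to disk first. The sketch below is a small standard-library reader, offered as one possible approach; the file name in the usage comment is hypothetical, and the exact layout of the sequence headers in these files is not documented on this page.

```python
import gzip


def iter_fasta(path):
    """Yield (identifier, sequence) pairs from a plain or gzip-compressed FASTA file.

    The identifier is the header text up to the first whitespace.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                parts = line[1:].split(None, 1)
                header, chunks = (parts[0] if parts else ""), []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical usage (file name is an assumption):
# lengths = {name: len(seq) for name, seq in iter_fasta("cds_sequences.fasta.gz")}
```

Because records are yielded one at a time, memory use stays proportional to the longest single sequence rather than to the whole file.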

File Formats

Supported Formats
  • TSV: Tab-separated values (recommended for large datasets)
  • CSV: Comma-separated values (compatible with Excel)
  • FASTA: Sequence data format
  • GFF3: Gene annotation format
  • GZ: Compressed files (use gunzip to decompress)
Download Tips
  • Large files (>100 MB) are compressed to reduce download time
  • Use a download manager for large files
  • Check file integrity after download (for example, test compressed files with gunzip -t)
  • Contact us if you encounter any issues
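
One way to act on the integrity tip is sketched below, using only the Python standard library. This page does not list checksums for the files, so the SHA-256 digest is useful mainly for comparing two copies of the same download or for recording in your own notes; the gzip check reads the whole stream so that truncated or corrupted archives are detected. File names in the usage comment are hypothetical.

```python
import gzip
import hashlib
import zlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip file decompresses fully without errors."""
    try:
        with gzip.open(path, "rb") as handle:
            # Reading to the end forces the CRC and length trailer to be checked.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True


# Hypothetical usage (file name is an assumption):
# print(sha256_of("reference_genomes.fasta.gz"))
# print(gzip_is_intact("reference_genomes.fasta.gz"))
```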

Usage Guidelines

Search Page
  • Open the Quick Search page and enter a phenotype, species, or SNV identifier in the search box.
  • Use the suggested example tags to prefill the search box and explore typical queries.
  • Submit the form to view a summarized result table with key study-wide significant associations.
  • Each result links to detailed records that you can export or investigate further in the browser.
Browser Page
  • Navigate to the Browser to work with the full catalog using interactive filters.
  • Adjust cohort, phenotype, species, SNV type, and significance filters to refine the dataset.
  • Sort any column, toggle column visibility, and click the gene download icon to retrieve reference sequence bundles.
  • If a gene link is unavailable, the variant may lie in a non-coding gene or regulatory region; use the downloads for gene annotations and reference genomes to retrieve sequences manually.
  • The table defaults to meta-analysis associations that meet the genome-wide significance threshold; adjust filters to explore alternative cut-offs or cohorts.
  • Use the built-in export tools (CSV download) to capture the currently filtered dataset.
Download Page
  • Select pre-packaged TSV, FASTA, GFF3, or gzipped resources for offline analysis.
  • Review file sizes and formats to plan download time and local storage requirements.
  • Use a download manager for large archives (>100 MB) and verify file integrity after transfer.
  • See the Citation, Data Usage, and Contact sections below for citation, licensing, and support information.
Best Practices
  • Combine Search for quick discovery and Browser for deep dives before initiating bulk downloads.
  • Document the filter combinations and export timestamps to support reproducible analyses.
  • Reach out via j.fu@umcg.nl if you need tailored subsets or encounter performance issues.
  • Consider indexing large TSV files locally (e.g., DuckDB, SQLite) to speed downstream exploration.
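
As an illustration of the local indexing suggestion, the sketch below loads a TSV file into SQLite with Python's built-in sqlite3 module and creates indexes on selected columns. It is a simplified example: every column is stored as TEXT, rows whose field count differs from the header are skipped, and the file, table, and column names in the usage comment are hypothetical because the catalog's actual header is not listed on this page.

```python
import csv
import sqlite3


def _quote(identifier):
    """Quote an SQL identifier so arbitrary header names are safe to use."""
    return '"' + identifier.replace('"', '""') + '"'


def load_tsv_into_sqlite(tsv_path, db_path, table, index_columns=()):
    """Load a TSV with a header row into a new SQLite table and index some columns."""
    connection = sqlite3.connect(db_path)
    try:
        with open(tsv_path, newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle, delimiter="\t")
            header = next(reader)
            columns = ", ".join(f"{_quote(name)} TEXT" for name in header)
            connection.execute(f"CREATE TABLE {_quote(table)} ({columns})")
            placeholders = ", ".join("?" for _ in header)
            # Stream rows into the table; malformed rows are skipped.
            connection.executemany(
                f"INSERT INTO {_quote(table)} VALUES ({placeholders})",
                (row for row in reader if len(row) == len(header)),
            )
        for name in index_columns:
            index_name = _quote(f"idx_{table}_{name}")
            connection.execute(
                f"CREATE INDEX {index_name} ON {_quote(table)} ({_quote(name)})"
            )
        connection.commit()
    finally:
        connection.close()


# Hypothetical usage (file, table, and column names are assumptions):
# load_tsv_into_sqlite("snv_mwas_catalog.tsv", "snv_mwas.db", "catalog",
#                      index_columns=("species", "phenotype"))
```

After loading, lookups on the indexed columns avoid rescanning the full file. Casting numeric columns to REAL or INTEGER at query time, or loading with DuckDB instead, may be preferable for heavier analyses.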
Citation

If you use SNV-MWAS data in your research, please cite:

Daoming Wang, et al. Microbiome-wide PheWAS links gut microbial SNVs to human health and exposures, 25 September 2024, PREPRINT (Version 1) available at Research Square [https://doi.org/10.21203/rs.3.rs-5063726/v1]
Data Usage
  • Data is provided for research purposes only
  • Please respect the terms of use for individual source studies
  • Contact us for commercial use permissions
Contact

For questions about data access or usage:

Email: j.fu@umcg.nl
Website: fulab.web.rug.nl