- Strains: 753
- WGS strains: 613
- Isotypes: 330
- Genome: WS263
Alignment Data
Wild isolate genomes are aligned and stored using the BAM format. BAMs are available in the table below.
Downloading All Alignment Data
You can download all alignment data using the script below. The script requires wget, so download and install it first. We recommend installing it with Homebrew on Unix/Mac OS, or with Cygwin on Windows. See the FAQ for details on installing wget.
Methods
Information regarding alignment, variant calling, and annotation is available here.
Variant Data
We used samtools to identify single-nucleotide variant (SNV) sites as compared to the N2 reference genome. Variant data are provided as VCF or tab-delimited files.
VCF
VCFs generated from the variant calling pipeline are provided below.
- Soft-filter - Includes all variants and annotations. The QC status of variants is included.
- Hard-filter - Variants and genotypes that fail QC are removed.
- Imputed - An imputed dataset generated from the hard-filter VCF.
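The difference between the soft-filter and hard-filter files can be approximated at the site level. The sketch below assumes a site passes when its FILTER column is PASS or "."; the function name keep_pass_sites is hypothetical, and it ignores genotype-level filters, so its output will not necessarily match the released hard-filter VCF. bcftools can perform the same site-level selection directly with bcftools view -f .,PASS.

```bash
# Sketch: approximate site-level hard filtering of an uncompressed VCF stream.
# Assumption: a site passes when its FILTER column (7th field) is PASS or ".".
# Genotype-level filters (e.g. a per-sample FT FORMAT field) are NOT handled.
keep_pass_sites() {
  awk -F '\t' '
    /^#/ { print; next }                  # keep all header lines
    $7 == "PASS" || $7 == "." { print }   # keep only unfiltered records
  '
}

# Usage (URL as in the examples further down this page):
#   bcftools view <soft-filter VCF URL> II:1-10000 | keep_pass_sites
```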
You can programmatically access specific regions of VCF files (rather than the entire file) from the command line:
Download Strain Data
Included Variants
Currently, we have called single-nucleotide variants (SNVs) across all wild isolates. We are working to add other variant classes, including insertions/deletions, structural variants, and transposons.
Transposon Data
We have recently characterized transposon variation in C. elegans. This dataset will be integrated further with the site's resources over time; for now, the raw data are available below.
Download Transposon Data

# | Reference Strain | Isotype | Strains | Isotype BAM | Tab-delimited variants |
---|---|---|---|---|---|
The following statistics were generated with bcftools stats. The soft-filtered VCF for this release has had records and genotypes annotated with filters, but no data has been removed. The hard-filtered VCF removes records and genotypes that have been annotated with filters.
Variable | Soft-Filtered | Hard-Filtered |
---|---|---|
Samples | 330 | 330 |
SNVs | 3,396,485 | 2,493,687 |
Transitions (Ts) | 1,900,352 | 1,392,602 |
Transversions (Tv) | 1,623,877 | 1,101,085 |
Ts/Tv ratio | 1.17 | 1.26 |
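The Ts/Tv ratio is the number of transitions (A<->G, C<->T) divided by the number of transversions; for the soft-filtered VCF, 1,900,352 / 1,623,877 ≈ 1.17. As a minimal sketch, assuming REF/ALT pairs are produced by bcftools query -f '%REF\t%ALT\n', the hypothetical tstv function below counts each single-base ALT allele separately. Those counting rules are assumptions, so it may not reproduce bcftools stats exactly.

```bash
# Sketch: count transitions (Ts) and transversions (Tv) from REF<TAB>ALT lines,
# e.g. the output of: bcftools query -f '%REF\t%ALT\n' file.vcf.gz
# Assumptions: each comma-separated ALT allele is counted separately and only
# single-base A/C/G/T REF/ALT pairs (SNVs) are considered.
tstv() {
  awk -F '\t' '
    # Transitions: purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
    function is_ts(r, a) {
      return (r == "A" && a == "G") || (r == "G" && a == "A") ||
             (r == "C" && a == "T") || (r == "T" && a == "C")
    }
    {
      ref = toupper($1)
      if (ref !~ /^[ACGT]$/) next          # skip indels and multi-base REF
      n = split($2, alts, ",")              # one entry per ALT allele
      for (i = 1; i <= n; i++) {
        alt = toupper(alts[i])
        if (alt !~ /^[ACGT]$/ || alt == ref) continue
        if (is_ts(ref, alt)) { ts++ } else { tv++ }
      }
    }
    END {
      ratio = (tv > 0) ? ts / tv : 0
      printf "ts=%d tv=%d ts/tv=%.2f\n", ts, tv, ratio
    }
  '
}
```

For example, the pairs A>G, C>T, A>C, and a G site with ALT "T,A" give three transitions and two transversions, printed as ts=3 tv=2 ts/tv=1.50.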
Methods
Note: These methods operated on sequence data at the isotype level.
Software
Alignment
Sequences were aligned to WS245 using BWA (version 0.7.8-r455). Optical/PCR duplicates were marked with Picard (version 1.111).
Variant Calling
SNV calling was performed using bcftools (version 1.3).
Filtering
Sites with more than 10% missing calls or more than 90% heterozygous calls across all isotypes were removed. Individual calls meeting any of the following criteria were removed:
- Depth of coverage (DP) <= 10
- Quality (QUAL) < 30
- Mapping Quality (MQ) < 40. Applied only to ALT calls.
- Number of high-quality non-reference bases (DV) / Depth of Coverage (DP) < 0.5. Applied only to ALT calls.
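The site-level and call-level rules above can be sketched as two awk filters over hypothetical tab-delimited tables. The column layouts, the function names filter_sites and filter_calls, and the choice to compute the missing and heterozygous fractions over all isotypes are assumptions for illustration, not the pipeline's actual implementation. An ALT call is taken to be any genotype containing a non-reference allele.

```bash
# Site-level filter (sketch). Hypothetical input: CHROM, POS, then one GT per isotype.
# Drops sites with >10% missing calls or >90% heterozygous calls,
# both fractions computed over all isotypes (an assumption).
filter_sites() {
  awk -F '\t' '
    {
      n = NF - 2; miss = 0; het = 0
      for (i = 3; i <= NF; i++) {
        if ($i ~ /\./) { miss++; continue }   # ./. or . counts as missing
        m = split($i, al, "[/|]")             # unphased (/) or phased (|)
        if (m == 2 && al[1] != al[2]) het++   # e.g. 0/1 is heterozygous
      }
      if (n > 0 && miss / n <= 0.10 && het / n <= 0.90) print
    }
  '
}

# Call-level filter (sketch). Hypothetical input columns:
# CHROM, POS, SAMPLE, GT, QUAL, DP, MQ, DV. Prints calls that survive all rules.
filter_calls() {
  awk -F '\t' '
    {
      gt = $4; qual = $5 + 0; dp = $6 + 0; mq = $7 + 0; dv = $8 + 0
      if (dp <= 10 || qual < 30) next                        # applied to every call
      if (gt ~ /[1-9]/ && (mq < 40 || dv / dp < 0.5)) next   # ALT calls only
      print
    }
  '
}
```

Because calls with DP <= 10 are rejected first, the DV/DP ratio is only evaluated when DP is positive, so no division by zero can occur.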
Annotation
Variants were annotated using SnpEff (version 4.1g) with the WS241 database.
Pipelines
The C. elegans Natural Diversity Resource has three Git repositories that contain the software used to run the site.
AndersenLab/cegwas
A set of functions to process phenotype data, perform GWAS, and perform post-mapping data processing for C. elegans.
AndersenLab/cegwas-worker
A Python daemon that handles mapping jobs submitted from base. cegwas-worker runs on Google Compute Engine.
AndersenLab/CeNDR
The software responsible for this website, which is run using Google App Engine.
Fetching variant data from the command line
Variant data can be fetched remotely using bcftools version 1.2+. If you don't have bcftools installed, you can learn how to install it here.
bcftools will download the index file and use it to fetch the specified region. Below are some examples.
Query the first 10 kb on chromosome II:

```bash
bcftools view http://storage.googleapis.com/elegansvariation.org/releases/20180527/variation/WI.20180527.soft-filter.vcf.gz II:1-10000
```

Output a tab-delimited file of genotypes:

```bash
bcftools view http://storage.googleapis.com/elegansvariation.org/releases/20180527/variation/WI.20180527.soft-filter.vcf.gz II:1-10000 | \
  bcftools query --print-header -f '%CHROM\t%POS\t[%TGT\t]\n' -
```

Look for deleterious variants within a region in CB4856:

```bash
bcftools view --samples CB4856 http://storage.googleapis.com/elegansvariation.org/releases/20180527/variation/WI.20180527.soft-filter.vcf.gz II:790000-792000 | \
  egrep '^#|HIGH' - | \
  egrep '^#|1\/1' - | \
  bcftools query -f '%CHROM\t%POS\t[%TGT\t%GT\t]%ANN\n' -
```
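The egrep step in the last example matches HIGH anywhere in a VCF line. A stricter alternative is to parse the SnpEff ANN field, whose pipe-separated subfields begin with Allele, Annotation, Annotation_Impact, and Gene_Name, and whose separate annotations are comma-separated. The hypothetical high_impact function below is a sketch that assumes ANN is the last tab-delimited column, as in the bcftools query format used above, and prints CHROM, POS, effect, and gene for each HIGH-impact annotation.

```bash
# Sketch: keep only HIGH-impact SnpEff annotations from bcftools query output.
# Assumption: the ANN field is the last tab-delimited column of each line.
high_impact() {
  awk -F '\t' '
    /^#/ { next }                            # skip header lines, if any
    {
      n = split($NF, anns, ",")              # comma-separated annotations
      for (i = 1; i <= n; i++) {
        split(anns[i], f, "|")               # Allele|Annotation|Annotation_Impact|Gene_Name|...
        if (f[3] == "HIGH") print $1 "\t" $2 "\t" f[2] "\t" f[4]
      }
    }
  '
}

# Usage (URL as in the examples above):
#   bcftools view --samples CB4856 <soft-filter VCF URL> II:790000-792000 | \
#     bcftools query -f '%CHROM\t%POS\t[%TGT\t%GT\t]%ANN\n' - | high_impact
```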