Release Notes
The 20250625 release includes genotypes from whole-genome sequences and reduced representation (RAD) sequencing. Genotypes are compared for concordance, and strains that are 99.97% identical to each other are grouped into isotypes. One strain within each isotype is the reference strain for that isotype. To look up isotype assignment, see Isotype List. All isotype reference strains are available on CaeNDR.
- Strains: 2088
- WGS strains: 1952
- Isotypes: 687
- Genome: WS283
- BioProject: PRJNA13758
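The isotype grouping rule described above can be sketched as follows. This is a minimal illustration, not the CaeNDR pipeline: the function names, the toy genotype strings, the single-linkage merge (any pair at or above the cutoff lands in the same group), and the inclusive comparison are all assumptions. Only the 99.97% cutoff comes from the text.

```python
from itertools import combinations

THRESHOLD = 0.9997  # concordance cutoff stated in the release notes


def concordance(a, b):
    """Fraction of sites, called in both strains, where genotypes match."""
    shared = [(x, y) for x, y in zip(a, b) if x != "." and y != "."]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def group_isotypes(genotypes, threshold=THRESHOLD):
    """Group strains by pairwise concordance (single linkage, an assumption)."""
    parent = {strain: strain for strain in genotypes}

    def find(strain):
        # Follow parent links to the group representative, compressing the path.
        while parent[strain] != strain:
            parent[strain] = parent[parent[strain]]
            strain = parent[strain]
        return strain

    for s1, s2 in combinations(genotypes, 2):
        if concordance(genotypes[s1], genotypes[s2]) >= threshold:
            parent[find(s2)] = find(s1)

    groups = {}
    for strain in genotypes:
        groups.setdefault(find(strain), []).append(strain)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical strains with 10,000 toy sites each ("." would mean no call).
toy = {
    "strain_A": "A" * 10000,
    "strain_B": "A" * 9998 + "CC",      # differs from strain_A at 2 sites
    "strain_C": "A" * 9990 + "C" * 10,  # differs from strain_A at 10 sites
}
isotype_groups = group_isotypes(toy)
```

In this toy set, `strain_A` and `strain_B` share 9,998 of 10,000 sites and fall into one group, while `strain_C` stays on its own.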
Strain | Isotype | Previous Isotype | Reason |
---|---|---|---|
ECA243 | ECA243 | CB4851 | Historical name updated to match WGS strain name |
ECA245 | ECA246 | CB4853 | Historical name updated to match WGS strain name |
ECA246 | ECA246 | CB4853 | Historical name updated to match WGS strain name |
ECA248 | ECA248 | CB4855 | Historical name updated to match WGS strain name |
ECA249 | ECA250 | CB4857 | Historical name updated to match WGS strain name |
ECA250 | ECA250 | CB4857 | Historical name updated to match WGS strain name |
AB2 | ECA251 | CB4858 | Historical name updated to match WGS strain name |
AB3 | ECA251 | CB4858 | Historical name updated to match WGS strain name |
AB4 | ECA251 | CB4858 | Historical name updated to match WGS strain name |
CX11258 | ECA251 | CB4858 | Historical name updated to match WGS strain name |
CX11278 | ECA251 | CB4858 | Historical name updated to match WGS strain name |
CX11305 | ECA251 | CB4858 | Historical name updated to match WGS strain name |
CX11317 | ECA251 | CB4858 | Historical name updated to match WGS strain name |
ECA247 | ECA251 | CB4858 | Historical name updated to match WGS strain name |
ECA251 | ECA251 | CB4858 | Historical name updated to match WGS strain name |
JU1960 | ECA251 | CB4858 | Historical name updated to match WGS strain name |
ECA259 | ECA259 | PB306 | Historical name updated to match WGS strain name |
JU4133 | JU1793 | JU2600 | Higher similarity to new isotype group |
WN2109 | WN2109 | WN2086 | Higher similarity to new isotype group |
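The isotype-change table above is pipe-delimited, so it can be summarized with a few lines of code. The sketch below is illustrative only: the helper names `parse_changes` and `strains_by_reason` are hypothetical, and the embedded rows are a small sample copied from the table rather than the full list.

```python
from collections import defaultdict

# Sample rows copied from the table above:
# Strain | Isotype | Previous Isotype | Reason |
ROWS = """\
ECA245 | ECA246 | CB4853 | Historical name updated to match WGS strain name |
ECA246 | ECA246 | CB4853 | Historical name updated to match WGS strain name |
JU4133 | JU1793 | JU2600 | Higher similarity to new isotype group |
WN2109 | WN2109 | WN2086 | Higher similarity to new isotype group |
"""


def parse_changes(text):
    """Turn pipe-delimited rows into dictionaries, skipping malformed lines."""
    changes = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue
        strain, isotype, previous, reason = cells
        changes.append(
            {"strain": strain, "isotype": isotype,
             "previous": previous, "reason": reason}
        )
    return changes


def strains_by_reason(changes):
    """Map each reason to the list of strains reassigned for that reason."""
    grouped = defaultdict(list)
    for change in changes:
        grouped[change["reason"]].append(change["strain"])
    return dict(grouped)


changes = parse_changes(ROWS)
summary = strains_by_reason(changes)
```

Separating the two reasons makes it easy to tell pure renames apart from strains that moved to a different isotype group.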
Datasets
Dataset | Description | Download |
---|---|---|
Strain Data | Includes strain, isotype, location information, and more. | 20250625_c_elegans_strain_data.csv |
Strain Issues | This link contains all strain issues for this release | |
Alignment Data | Alignment data are stored as BAM files, which are binary representations of the Sequence Alignment/Map format. The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | This link contains all alignment data as BAM or BAI files. |
Variant Data | Strain-level variant information is stored in the VCF and genomic VCF (gVCF) formats. The gVCF format contains information for every base, regardless of whether a variant is present, and is suitable for compiling and joint-calling variants across a custom strain set. These files were produced by GATK. | This link contains all genomic variant data as VCF, TBI, or gVCF files. |
Soft-Filtered Variants | Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store variant calls ranging from single nucleotide variants to insertions and deletions (~50 bp). The soft-filtered VCF includes all variants and annotations called by the GATK pipeline. The QC status of each variant (INFO field=… The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | All Strains: WI.20250625.soft-filter.vcf.gz, WI.20250625.soft-filter.vcf.gz.tbi; Isotypes: WI.20250625.soft-filter.isotype.vcf.gz, WI.20250625.soft-filter.isotype.vcf.gz.tbi |
Hard-Filtered Variants | Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store variant calls ranging from single nucleotide variants to insertions and deletions (~50 bp). The hard-filtered VCF includes only high-quality variants, after all variants and genotypes with a failed QC status are removed. To obtain a VCF for a single strain or a subset of strains, use … The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | All Strains: WI.20250625.hard-filter.vcf.gz, WI.20250625.hard-filter.vcf.gz.tbi; Isotypes: WI.20250625.hard-filter.isotype.vcf.gz, WI.20250625.hard-filter.isotype.vcf.gz.tbi |
Annotated Variants | Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store variant calls ranging from single nucleotide variants to insertions and deletions (~50 bp). The annotated VCFs include all the variants from the hard-filtered Isotype VCF and have been annotated using four different tools: ANNOVAR, CSQ, SnpEff, and VEP. The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | ANNOVAR: WI.20250625.annovar.isotype.vcf.gz, WI.20250625.annovar.isotype.vcf.gz.tbi; CSQ: WI.20250625.csq.isotype.vcf.gz, WI.20250625.csq.isotype.vcf.gz.tbi; SnpEff: WI.20250625.snpeff.isotype.vcf.gz, WI.20250625.snpeff.isotype.vcf.gz.tbi; VEP: WI.20250625.vep.isotype.vcf.gz, WI.20250625.vep.isotype.vcf.gz.tbi |
Imputed Variants | Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store variant calls ranging from single nucleotide variants to insertions and deletions (~50 bp). The imputed VCF includes all the variants from the hard-filtered Isotype VCF, but all missing genotypes have been imputed using Beagle v5.1. The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | Imputed: WI.20250625.impute.isotype.vcf.gz, WI.20250625.impute.isotype.vcf.gz.tbi |
Reference Genome FASTA (WS283) | The reference genome build from WormBase used for alignment and annotation. | 20250625_c_elegans_WS283.genome.fa |
Gene models | Gene models were constructed using a combination of BRAKER (short-read) and StringTie + TransDecoder (long-read), followed by QC with AGAT using the reference genome CGC1. | canonical_geneset.gtf.gz, annotations.gff3.gz, current.geneIDs.txt.gz |
Transposon Calls | We have performed transposon calling for C. elegans isotype reference strains as part of Laricchia et al. For C. briggsae and C. tropicalis, these data will be deposited as soon as they are generated. | 20250625_c_elegans_transposon_calls.bed |
Genetic Map | A genetic map generated from a cross between N2 and CB4856 (Rockman & Kruglyak, 2009). | c_elegans_genetic_map.tsv |
Tree | Tree generated using the neighbour-joining algorithm as implemented in QuickTree, in Newick and PDF formats. | All Strains: WI.20250625.hard-filter.min4.tree, WI.20250625.hard-filter.min4.tree.pdf; Isotype: WI.20250625.hard-filter.isotype.min4.tree, WI.20250625.hard-filter.isotype.min4.tree.pdf |
Haplotypes | Haplotypes for isotypes were calculated and plotted as described in Lee et al. | 20250625_c_elegans_haplotype.png, 20250625_c_elegans_haplotype.pdf |
Sweep Haplotypes | The most frequent haplotype that covers at least 30% of the chromosome and is found on chromosome centers was determined and classified as a selective sweep. For more details of C. elegans selective sweeps, see Andersen et al. and Lee et al. The plot shows red (swept), gray (non-swept), and white (not classified) regions. | 20250625_c_elegans_sweep.pdf, 20250625_c_elegans_sweep_summary.tsv |
Hyper-Divergent Regions | The hyper-divergent regions are characterized by a higher-than-average density of small variants and large genomic spans where short sequence reads fail to align to the reference genome. For more information, see the FAQ. | 20250625_c_elegans_divergent_regions_strain.bed, 20250625_c_elegans_divergent_regions_strain.bed.gz |
Phenotype Trait Files | This gene expression trait was measured using short-read Illumina RNA-seq of mixed-stage populations of wild strains. Data are from Zhang et al. Nature Communications 2022. | 20231103_ZhangGeneExpression.tsv |
Download BAMs Script | You can batch download individual strain BAMs using this script. | 20250625_c_elegans_bam_bai_download.sh |
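The difference between the soft-filtered and hard-filtered VCFs in the table above can be illustrated with a small site-level filter. This is a rough sketch, not the CaeNDR pipeline: it relies only on the standard VCF convention that the seventh column (FILTER) is `PASS` for records that passed all filters. The function name `keep_passing_records`, the toy records, and the `low_qual` filter label are hypothetical, and the genotype-level filtering mentioned in the hard-filter description is not reproduced here.

```python
def keep_passing_records(vcf_lines):
    """Yield header lines unchanged and only records whose FILTER is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th tab-separated column in the VCF specification.
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


# Toy soft-filtered input; "low_qual" is a made-up filter label.
EXAMPLE = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "I\t100\t.\tA\tG\t50\tPASS\t.\n",
    "I\t200\t.\tC\tT\t5\tlow_qual\t.\n",
    "II\t300\t.\tG\tA\t60\tPASS\t.\n",
]

kept = list(keep_passing_records(EXAMPLE))
records = [line for line in kept if not line.startswith("#")]
```

For a real compressed file, the lines could be read with the standard library call `gzip.open(path, "rt")` and passed to the same function.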
Methods are not available at this time.
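As an illustration of the 30% coverage criterion in the Sweep Haplotypes entry, the sketch below checks how much of a chromosome a given haplotype covers in one strain. It is a simplification under stated assumptions: the most frequent haplotype is taken as an input rather than computed across the population, the requirement that the haplotype be found on chromosome centers is not modeled, and the segment layout, names, and lengths are hypothetical.

```python
MIN_FRACTION = 0.30  # coverage cutoff stated in the Sweep Haplotypes entry


def haplotype_fraction(segments, chrom_length, haplotype):
    """Fraction of the chromosome covered by one haplotype.

    segments: non-overlapping (start, end, haplotype) tuples, half-open.
    """
    covered = sum(end - start for start, end, hap in segments if hap == haplotype)
    return covered / chrom_length


def is_swept(segments, chrom_length, common_haplotype, min_fraction=MIN_FRACTION):
    """True if the given common haplotype covers at least min_fraction."""
    return haplotype_fraction(segments, chrom_length, common_haplotype) >= min_fraction


# Hypothetical chromosome of length 1000 in two hypothetical strains.
strain_1 = [(0, 200, "h1"), (200, 600, "h2"), (600, 1000, "h1")]
strain_2 = [(0, 800, "h2"), (800, 1000, "h1")]

swept_1 = is_swept(strain_1, 1000, "h1")  # h1 covers 600 of 1000 positions
swept_2 = is_swept(strain_2, 1000, "h1")  # h1 covers 200 of 1000 positions
```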