Release Notes
The 20250626 release includes genotypes from whole-genome sequences. Genotypes are compared for concordance, and strains that are 99.97% identical to each other are grouped into isotypes. One strain within each isotype is the reference strain for that isotype. To look up isotype assignment, see Isotype List. All isotype reference strains are available on CaeNDR.
- Strains: 2022
- WGS strains: 2022
- Isotypes: 719
- Genome: QX1410
- BioProject: PRJNA10731
Strain | Isotype | Old Isotype | Reason |
---|---|---|---|
JU356 | JU356 | NIC67 | Too many differences from old isotype, moved to its own isotype |
NIC1442 | NIC1442 | NIC96 | Too many differences from old isotype, moved to its own isotype |
NIC1491 | NIC1491 | NIC96 | Too many differences from old isotype, moved to its own isotype |
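The isotype grouping described above (strains compared for concordance and grouped when they are 99.97% identical) can be illustrated with a small sketch. This is not the release pipeline: the function name `group_isotypes`, the strain names, and the concordance values are hypothetical, and the sketch assumes single-linkage grouping (a strain joins a group if it meets the threshold with any member), which the release notes do not specify.

```python
def group_isotypes(concordance, threshold=0.9997):
    """Group strains whose pairwise concordance is >= threshold.

    `concordance` maps (strain_a, strain_b) pairs to a fraction in [0, 1].
    Grouping is single-linkage (an assumption), implemented with union-find.
    """
    strains = sorted({s for pair in concordance for s in pair})
    parent = {s: s for s in strains}

    def find(s):
        # Follow parent links to the group root, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), value in concordance.items():
        if value >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the alphabetically smaller root so results are stable.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


# Made-up concordance values for three hypothetical strains.
example = {
    ("A", "B"): 0.9999,  # above the threshold: same isotype
    ("A", "C"): 0.9990,  # below the threshold
    ("B", "C"): 0.9991,  # below the threshold
}
print(group_isotypes(example))  # [['A', 'B'], ['C']]
```

Under this sketch, a strain that falls below the threshold against every member of its former group ends up in a group of its own, which is the situation the "moved to its own isotype" entries in the table describe.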
Datasets
Dataset | Description | Download |
---|---|---|
Strain Data | Includes strain, isotype, location information, and more. | 20250626_c_briggsae_strain_data.csv |
Strain Issues | This link contains all strain issues for this release | |
Alignment Data | Alignment data are stored as BAM files, which are binary representations of the Sequence Alignment/Map format. The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | This link contains all alignment data as BAM or BAI files. |
Variant Data | Strain-level variant information is stored in the VCF and genomic VCF (gVCF) formats. The gVCF format contains information for every base, regardless of whether a variant is present, and is suitable for compiling and joint-calling variants across a custom strain set. These files were produced by GATK. | This link contains all genomic variant data as VCF, TBI, or gVCF files. |
Soft-Filtered Variants | Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store all variant calls, from single-nucleotide variants to insertions and deletions (~50 bp). The soft-filtered VCF includes all variants and annotations called by the GATK pipeline. The QC status of each variant (INFO field= … The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | All Strains: WI.20250626.soft-filter.vcf.gz, WI.20250626.soft-filter.vcf.gz.tbi; Isotypes: WI.20250626.soft-filter.isotype.vcf.gz, WI.20250626.soft-filter.isotype.vcf.gz.tbi |
Hard-Filtered Variants | Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store all variant calls, from single-nucleotide variants to insertions and deletions (~50 bp). The hard-filtered VCF includes only high-quality variants, after all variants and genotypes with a failed QC status are removed. To obtain a VCF for a single strain or a subset of strains, use … The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | All Strains: WI.20250626.hard-filter.vcf.gz, WI.20250626.hard-filter.vcf.gz.tbi; Isotypes: WI.20250626.hard-filter.isotype.vcf.gz, WI.20250626.hard-filter.isotype.vcf.gz.tbi |
Annotated Variants | Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store all variant calls, from single-nucleotide variants to insertions and deletions (~50 bp). The annotated VCFs include all the variants from the hard-filtered isotype VCF and have been annotated using four different tools: ANNOVAR, CSQ, SnpEff, and VEP. The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | ANNOVAR: WI.20250626.annovar.isotype.vcf.gz, WI.20250626.annovar.isotype.vcf.gz.tbi; CSQ: WI.20250626.csq.isotype.vcf.gz, WI.20250626.csq.isotype.vcf.gz.tbi; SnpEff: WI.20250626.snpeff.isotype.vcf.gz, WI.20250626.snpeff.isotype.vcf.gz.tbi; VEP: WI.20250626.vep.isotype.vcf.gz, WI.20250626.vep.isotype.vcf.gz.tbi |
Imputed Variants | Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store all variant calls, from single-nucleotide variants to insertions and deletions (~50 bp). The imputed VCF includes all the variants from the hard-filtered isotype VCF, with all missing genotypes imputed using Beagle v5.1. The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | Imputed: WI.20250626.impute.isotype.vcf.gz, WI.20250626.impute.isotype.vcf.gz.tbi |
Reference Genome FASTA (QX1410) | The reference genome build from Stevens, 2022 used for alignment and annotation. | 20250626_c_briggsae_Feb2020.genome.fa |
Gene models | Gene models were constructed using a combination of BRAKER (short-read) and StringTie + TransDecoder (long-read) followed by QC with AGAT using the reference genome QX1410. | canonical_geneset.gtf.gz, annotations.gff3.gz, current.geneIDs.txt.gz |
Genetic Map | A genetic map generated from a cross between QX1410 and VX34 (Stevens, 2022). | c_briggsae_genetic_map.tsv |
Tree | Tree generated using the neighbour-joining algorithm as implemented in QuickTree, provided in Newick and PDF formats. | All Strains: WI.20250626.hard-filter.min4.tree, WI.20250626.hard-filter.min4.tree.pdf; Isotype: WI.20250626.hard-filter.isotype.min4.tree, WI.20250626.hard-filter.isotype.min4.tree.pdf |
Haplotypes | Haplotypes for isotypes were calculated and plotted as described in Lee et al. | 20250626_c_briggsae_haplotype.png, 20250626_c_briggsae_haplotype.pdf |
Download BAMs Script | You can batch download individual strain BAMs using this script. | 20250626_c_briggsae_bam_bai_download.sh |
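
The table above describes VCF as a tab-delimited format holding variant calls and per-strain genotypes. As a rough illustration of that layout, the following sketch keeps only the columns for a chosen set of strains. It is not the tool the release notes refer to; the function names `subset_vcf_lines` and `subset_vcf_file` are hypothetical, and the sketch assumes a VCF with genotype columns (nine fixed columns including FORMAT, followed by one column per sample). It does not recompute INFO values such as allele counts, so a dedicated tool such as bcftools is usually the better choice for real analyses.

```python
import gzip


def subset_vcf_lines(lines, keep):
    """Yield VCF lines restricted to the sample columns named in `keep`.

    Meta lines (starting with "##") pass through unchanged. The "#CHROM"
    header and every data line keep the 9 fixed columns plus the requested
    samples, in the order given by `keep`.
    """
    cols = None
    for line in lines:
        if line.startswith("##"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            missing = [s for s in keep if s not in samples]
            if missing:
                raise KeyError(f"samples not in VCF header: {missing}")
            # Column indices: fixed columns first, then the chosen samples.
            cols = list(range(9)) + [9 + samples.index(s) for s in keep]
        elif cols is None:
            raise ValueError("data line found before the #CHROM header")
        yield "\t".join(fields[i] for i in cols) + "\n"


def subset_vcf_file(in_path, out_path, keep):
    """Read a gzip- or bgzip-compressed VCF and write an uncompressed subset."""
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(subset_vcf_lines(src, keep))
```

For example, `subset_vcf_file("WI.20250626.hard-filter.isotype.vcf.gz", "subset.vcf", ["JU356", "NIC1442"])` would write a VCF containing only those two columns, assuming both names appear in the file's header; the output name `subset.vcf` is arbitrary.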
Methods are not available at this time.