Release Notes
The 20250627 release includes genotypes from whole-genome sequences. Genotypes are compared for concordance, and strains that are at least 99.9915% identical to each other are grouped into isotypes. One strain within each isotype is designated the reference strain for that isotype. To look up a strain's isotype assignment, see the Isotype List. All isotype reference strains are available on CaeNDR.
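The grouping rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CaeNDR pipeline. Each strain's genotypes are assumed to be an aligned sequence of calls with `.` for missing. Concordance is assumed to be computed only over sites called in both strains. Strains are linked transitively (single linkage) whenever a pair meets the 99.9915% threshold. The names `concordance` and `group_isotypes` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence

# Threshold from the release notes: strains >= 99.9915% identical share an isotype.
CONCORDANCE_THRESHOLD = 0.999915


def concordance(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of sites called in both strains where the genotypes match.

    Assumption of this sketch: missing calls are encoded as '.'.
    """
    shared = [(x, y) for x, y in zip(a, b) if x != "." and y != "."]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def group_isotypes(
    genotypes: Dict[str, Sequence[str]],
    threshold: float = CONCORDANCE_THRESHOLD,
) -> List[List[str]]:
    """Group strains with union-find, linking every pair at or above the threshold."""
    parent = {s: s for s in genotypes}

    def find(s: str) -> str:
        # Walk to the root, compressing the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(sorted(genotypes), 2):
        if concordance(genotypes[a], genotypes[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for s in sorted(genotypes):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())
```

Because linking is transitive, this sketch can place two strains in the same group through a shared intermediate even when the two are not themselves above the threshold. These notes do not describe whether or how the production pipeline guards against that.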
- Strains: 785
- WGS strains: 785
- Isotypes: 622
- Genome: NIC58
- BioProject: PRJNA53597
Strain | Isotype | Previous Isotype | Reason |
---|---|---|---|
ECA1309 | ECA1309 | ECA1306 | Split 2-member isotype due to too much dissimilarity |
NIC1132 | NIC1132 | NIC1161 | Split 2-member isotype due to too much dissimilarity |
NIC916 | NIC916 | NIC1425 | Split 2-member isotype due to too much dissimilarity |
NIC1544 | NIC1544 | NIC1550 | Split 2-member isotype due to too much dissimilarity |
NIC1584 | NIC1584 | NIC1583 | Split 3-member isotype due to too much dissimilarity |
NIC1585 | NIC1585 | NIC1583 | Split 3-member isotype due to too much dissimilarity |
NIC461 | NIC461 | NIC454 | Split 2-member isotype due to too much dissimilarity |
NIC572 | NIC572 | NIC571 | Too many differences from old isotype, moved to its own isotype |
NIC977 | NIC977 | NIC975 | Too many differences from old isotype, moved to its own isotype |
NIC988 | NIC988 | NIC987 | Split 2-member isotype due to too much dissimilarity |
QG921 | QG921 | QG1004 | Split 2-member isotype due to too much dissimilarity |
QG3535 | QG3535 | QG3839 | Split 2-member isotype due to too much dissimilarity |
QG3812 | QG3812 | QG3873 | Split 2-member isotype due to too much dissimilarity |
Datasets
Dataset | Description | Download |
---|---|---|
Strain Data | Includes strain, isotype, location information, and more. | 20250627_c_tropicalis_strain_data.csv |
Strain Issues | This link contains all strain issues for this release | |
Alignment Data | Alignment data are stored as BAM files, the binary representation of the Sequence Alignment/Map (SAM) format. The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. | This link contains all alignment data as BAM files and their BAI indexes. |
Variant Data | Strain-level variant information is stored in VCF and genomic VCF (gVCF) format. The gVCF format contains information for every base, whether or not a variant is present, and is suitable for combining and jointly calling variants across a custom strain set. These files were produced by GATK. | This link contains all genomic variant data as VCF, TBI, or gVCF files. |
Soft-Filtered Variants |
Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store variant calls ranging from single-nucleotide variants to insertions and deletions (up to ~50 bp).
The soft-filtered VCF includes all variants and annotations called by the GATK pipeline.
The QC status of each variant (INFO field= …)
The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. |
All Strains
WI.20250627.soft-filter.vcf.gz WI.20250627.soft-filter.vcf.gz.tbi
Isotypes
WI.20250627.soft-filter.isotype.vcf.gz WI.20250627.soft-filter.isotype.vcf.gz.tbi |
Hard-Filtered Variants |
Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store variant calls ranging from single-nucleotide variants to insertions and deletions (up to ~50 bp).
The hard-filtered VCF includes only high-quality variants; all variants and genotypes with a failed QC status are removed.
To obtain a VCF for a single strain or a subset of strains, use …
The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. |
All Strains
WI.20250627.hard-filter.vcf.gz WI.20250627.hard-filter.vcf.gz.tbi
Isotypes
WI.20250627.hard-filter.isotype.vcf.gz WI.20250627.hard-filter.isotype.vcf.gz.tbi |
Imputed Variants |
Variant information is stored in the VCF format, a tab-delimited format for storing variant calls and individual genotypes. It can store variant calls ranging from single-nucleotide variants to insertions and deletions (up to ~50 bp). The imputed VCF includes all variants from the hard-filtered isotype VCF, with all missing genotypes imputed using Beagle v5.1. The specifications for these file formats continue to develop. Current specifications for BAM and VCF can be found at hts-specs. |
Imputed
WI.20250627.impute.isotype.vcf.gz WI.20250627.impute.isotype.vcf.gz.tbi |
Reference Genome FASTA (NIC58) | The reference genome build from Noble, 2021, used for alignment and annotation. | 20250627_c_tropicalis_June2021.genome.fa |
Gene models | Gene models were constructed with BRAKER (short-read data) and StringTie + TransDecoder (long-read data), followed by QC with AGAT and manual curation with Apollo, using the NIC58 reference genome. |
canonical_geneset.gtf.gz annotations.gff3.gz current.geneIDs.txt.gz |
Genetic Map | A genetic map generated from a cross between NIC58 and JU1373 (Noble, 2021). | c_tropicalis_genetic_map.tsv |
Tree | Trees generated with the neighbour-joining algorithm as implemented in QuickTree, provided in Newick and PDF formats. |
All Strains
WI.20250627.hard-filter.min4.tree WI.20250627.hard-filter.min4.tree.pdf
Isotype
WI.20250627.hard-filter.isotype.min4.tree WI.20250627.hard-filter.isotype.min4.tree.pdf |
Haplotypes | Haplotypes for isotypes were calculated and plotted as described in Lee et al. |
20250627_c_tropicalis_haplotype.png 20250627_c_tropicalis_haplotype.pdf |
Hyper-Divergent Regions | The hyper-divergent regions are characterized by higher-than-average density of small variants and large genomic spans where short sequence reads fail to align to the reference genome. For more information, see the FAQ. |
20250627_c_tropicalis_divergent_regions_strain.bed
20250627_c_tropicalis_divergent_regions_strain.bed.gz |
Download BAMs Script | You can batch download individual strain BAMs using this script. | 20250627_c_tropicalis_bam_bai_download.sh |
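The hard-filtered VCF is described as the soft-filtered calls with failed-QC variants and genotypes removed, and the hard-filtered row also covers extracting a subset of strains. A minimal, standard-library Python sketch of both operations follows. It is not the pipeline that produced the release files. It assumes standard VCF conventions: variant-level QC is read from the FILTER column (`PASS` means passed), genotype-level QC is read from the FORMAT `FT` key, and genotypes that fail are set to missing rather than deleted. The function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Optional, Sequence


def _set_missing(gt: str) -> str:
    """Replace every allele in a GT value with '.', keeping ploidy and phase separators."""
    return re.sub(r"[^/|]+", ".", gt)


def hard_filter_lines(
    lines: Iterable[str], keep_samples: Optional[Sequence[str]] = None
) -> Iterator[str]:
    """Yield VCF lines, keeping only PASS records and optionally a subset of samples.

    Hypothetical helper (assumption): a record whose FILTER is not "PASS" is dropped;
    a genotype whose FORMAT/FT is neither "PASS" nor "." has its GT set to missing.
    INFO fields such as AC/AN are passed through unchanged, not recalculated.
    """
    sample_idx: Optional[List[int]] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines are copied as-is
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            wanted = list(samples) if keep_samples is None else list(keep_samples)
            absent = [s for s in wanted if s not in samples]
            if absent:
                raise ValueError(f"samples not found in VCF header: {absent}")
            sample_idx = [samples.index(s) for s in wanted]
            yield "\t".join(fields[:9] + wanted)
            continue
        if sample_idx is None:
            raise ValueError("data line encountered before the #CHROM header")
        if fields[6] != "PASS":
            continue  # variant-level QC failure: drop the whole record
        fmt_keys = fields[8].split(":")
        genotypes = [fields[9 + i] for i in sample_idx]
        # Per the VCF spec, GT must be the first FORMAT key when present.
        if "FT" in fmt_keys and fmt_keys[0] == "GT":
            ft_pos = fmt_keys.index("FT")
            cleaned = []
            for value in genotypes:
                parts = value.split(":")
                # Trailing FORMAT fields may be omitted; treat an absent FT as "."
                ft = parts[ft_pos] if ft_pos < len(parts) else "."
                if ft not in ("PASS", "."):
                    parts[0] = _set_missing(parts[0])
                cleaned.append(":".join(parts))
            genotypes = cleaned
        yield "\t".join(fields[:9] + genotypes)
```

Because the release VCFs are bgzip-compressed, which is gzip-compatible, a file such as WI.20250627.soft-filter.vcf.gz can be streamed into this function through `gzip.open(path, "rt")`. For routine work, `bcftools view -s` is the usual tool for sample subsetting. Unlike this sketch, it recalculates INFO/AC and INFO/AN for the subset by default.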
Methods are not available at this time.