Pipeline v3
The cells are genotyped using the DRIVER binary from Sentieon® tools. The DRIVER binary uses the same mathematics as the Broad Institute’s BWA-GATK Best Practices Workflow but, measured in core-hours, is 20x faster from BAM to VCF.
Genotyping uses the Haplotyper algorithm followed by the GVCFTyper algorithm for joint calling. Unlike the previous GATK implementation, each cell is haplotyped in gVCF mode, which emits a summarized confidence estimate that a site is strictly homozygous reference. The per-base-pair (bp) resolution is used when merging the genomic VCFs (gVCFs) for all cells with Sentieon’s GVCFTyper algorithm. Loci found to be non-variant are maintained in the final output.
Genotyping parameters are optimized for high sensitivity:
- A maximum of 2 alternate alleles are reported for each site,
- The minimum base quality for variant calling is set to 10, and
- The heterozygosity value is set at 0.001.
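The two v3 steps can be sketched as command lines built in Python. This is only an illustration under stated assumptions, not the pipeline's actual invocation: the file names are hypothetical, the mapping of the heterozygosity value to `--snp_heterozygosity` and of non-variant-site output to `--emit_mode all` is assumed, and every flag should be checked against the manual for the installed Sentieon release. The functions only build argument lists; nothing is executed.

```python
import shlex

def sentieon_haplotyper_cmd(ref, bam, out_gvcf):
    """Per-cell Haplotyper call in gVCF mode (flag names are assumptions)."""
    return [
        "sentieon", "driver", "-r", ref, "-i", bam,
        "--algo", "Haplotyper",
        "--emit_mode", "gvcf",            # haplotype each cell in gVCF mode
        "--min_base_qual", "10",          # minimum base quality of 10
        "--snp_heterozygosity", "0.001",  # assumed home of the heterozygosity value
        out_gvcf,
    ]

def sentieon_gvcftyper_cmd(ref, gvcfs, out_vcf):
    """Joint calling over all per-cell gVCFs (flag names are assumptions)."""
    cmd = [
        "sentieon", "driver", "-r", ref,
        "--algo", "GVCFTyper",
        "--max_alt_alleles", "2",  # at most 2 alternate alleles per site
        "--emit_mode", "all",      # assumed way to keep non-variant loci
    ]
    for gvcf in gvcfs:
        cmd += ["-v", gvcf]        # one -v per input gVCF
    cmd.append(out_vcf)
    return cmd

if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    cells = ["cell1", "cell2"]
    for cell in cells:
        print(shlex.join(sentieon_haplotyper_cmd("ref.fa", f"{cell}.bam", f"{cell}.g.vcf.gz")))
    print(shlex.join(sentieon_gvcftyper_cmd("ref.fa", [f"{c}.g.vcf.gz" for c in cells], "joint.vcf.gz")))
```

Keeping the commands as argument lists makes them straightforward to pass to `subprocess.run` once the flags have been verified.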
Pipeline v2
The cells are genotyped using the Genome Analysis Toolkit (McKenna, Hanna et al., 2010) with a joint calling approach that follows GATK Best Practices recommendations (DePristo, Banks et al., 2011; Van der Auwera, Carneiro et al., 2013).
Each cell is haplotyped in reference confidence mode to produce per-base-pair (bp) confidence estimates that a site is strictly homozygous reference. The per-bp resolution is maintained while merging the genomic VCFs (gVCFs) for all cells using GATK’s CombineGVCFs tool. Finally, joint genotyping is performed for all cells using GATK’s GenotypeGVCFs tool. Loci found to be non-variant are maintained in the final output.
Genotyping parameters are optimized for high sensitivity:
- A maximum of 2 alternate alleles are reported for each site,
- The minimum base quality for variant calling is set to 10, and
- The heterozygosity value is set at 0.001.
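The three v2 steps (per-cell haplotyping, gVCF merging, joint genotyping) can be sketched as command lines built in Python. This is a sketch under assumptions rather than the pipeline's actual invocation: it assumes GATK 3.x syntax, a jar named `GenomeAnalysisTK.jar`, hypothetical file names, `BP_RESOLUTION` as the reference confidence mode, and a particular placement of each parameter on each tool. Flags should be verified against the installed GATK version. The functions only build argument lists; nothing is executed.

```python
import shlex

# Assumed launcher for GATK 3.x; jar name and location are hypothetical.
GATK3 = ["java", "-jar", "GenomeAnalysisTK.jar"]

def gatk_haplotype_caller_cmd(ref, bam, out_gvcf):
    """Per-cell haplotyping in reference confidence mode."""
    return GATK3 + [
        "-T", "HaplotypeCaller", "-R", ref, "-I", bam,
        "--emitRefConfidence", "BP_RESOLUTION",  # assumed per-bp mode
        "-mbq", "10",      # minimum base quality of 10
        "-hets", "0.001",  # heterozygosity value
        "-o", out_gvcf,
    ]

def gatk_combine_gvcfs_cmd(ref, gvcfs, out_gvcf):
    """Merge the per-cell gVCFs into one gVCF."""
    cmd = GATK3 + ["-T", "CombineGVCFs", "-R", ref]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V per input gVCF
    return cmd + ["-o", out_gvcf]

def gatk_genotype_gvcfs_cmd(ref, merged_gvcf, out_vcf):
    """Joint genotyping that keeps non-variant loci in the output."""
    return GATK3 + [
        "-T", "GenotypeGVCFs", "-R", ref, "-V", merged_gvcf,
        "--max_alternate_alleles", "2",  # at most 2 alternate alleles per site
        "--includeNonVariantSites",      # keep non-variant loci
        "-o", out_vcf,
    ]

if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    cells = ["cell1", "cell2"]
    gvcfs = [f"{c}.g.vcf" for c in cells]
    for cell, gvcf in zip(cells, gvcfs):
        print(shlex.join(gatk_haplotype_caller_cmd("ref.fa", f"{cell}.bam", gvcf)))
    print(shlex.join(gatk_combine_gvcfs_cmd("ref.fa", gvcfs, "merged.g.vcf")))
    print(shlex.join(gatk_genotype_gvcfs_cmd("ref.fa", "merged.g.vcf", "joint.vcf")))
```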