How does the Genome Integrity algorithm work?

The algorithm first uses the diploid spike-in cell population to build a model of the expected number of reads per copy. It then identifies CNV-defined clones in two basic steps:

  1. CNV-based clones are identified in the sample at a resolution of 10-30 Mb by combining information across multiple amplicons. Clones are first identified based on loss of heterozygosity (LoH) on neighboring amplicons. A CNV is then called for each group of amplicons in every cell, and subclones within each LoH clone are identified by applying a custom clustering process to the per-cell CNV calls. The algorithm uses the allele frequencies of the SNVs on each amplicon to add confidence to the CNV calls. The amplicons target heterozygous SNVs (one mutant allele, one wild-type (WT) allele), so the variant allele frequency (VAF) of the mutant allele is 50% in the diploid state. If one copy is gained, there are either two mutant alleles and one WT allele (VAF ≈ 67%) or one mutant allele and two WT alleles (VAF ≈ 33%). If one copy is lost, the expected VAF is 0% or 100%.
  2. CNV calls across all cells in each subclone are combined to calculate a higher-accuracy CNV call for each amplicon at the subclone level. An error correction step then yields subclone-level CNV calls at a finer resolution of 5-15 Mb. The error correction compares the copy number of every amplicon with that of its neighboring amplicons. Several factors are considered when estimating the error-corrected CNVs: the VAF of the heterozygous SNVs on an amplicon, the number of amplicons supporting a CNV event, and the position of an amplicon relative to the other amplicons on the chromosome arm.
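
The expected VAF values in step 1 follow directly from counting alleles. The short Python sketch below reproduces that arithmetic for the diploid, copy-gain, and copy-loss states. It is only an illustration of the reasoning, not the algorithm's actual implementation; the function names and the `STATES` table are hypothetical and introduced here for the example.

```python
def expected_vaf(mutant_copies: int, wt_copies: int) -> float:
    """Expected mutant-allele VAF, in percent, for given allele counts."""
    total = mutant_copies + wt_copies
    if total == 0:
        raise ValueError("no copies present; VAF is undefined")
    return 100.0 * mutant_copies / total


# (mutant, WT) allele counts for a SNV that is heterozygous when diploid.
# Hypothetical table: it only encodes the states described in the text.
STATES = {
    "diploid": [(1, 1)],
    "copy gain": [(2, 1), (1, 2)],
    "copy loss": [(0, 1), (1, 0)],
}


def vaf_table() -> dict[str, list[float]]:
    """Map each copy-number state to its expected VAFs, rounded to 0.1%."""
    return {
        state: [round(expected_vaf(m, w), 1) for m, w in pairs]
        for state, pairs in STATES.items()
    }


if __name__ == "__main__":
    for state, vafs in vaf_table().items():
        print(f"{state}: {vafs}")
```

Because the diploid, gain, and loss states give clearly separated VAF values (50%, roughly 33% or 67%, and 0% or 100%), an observed VAF near one of them supports the corresponding copy number inferred from read depth.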