The following steps are taken to represent bi-allelic variants in our data structures.
After genotype calling by GATK, check if there are any locations with 2 alternate alleles called across all cells.
# Example where C1-C4 represent different cells; C1: T/T, C2: T/C, C3: C/C, C4: T/A
# REF  ALT  C1   C2   C3   C4
# T    C,A  0/0  0/1  1/1  0/2
If a variant has 2 alternate alleles, a new line will be created for each alternate.
# REF  ALT  C1   C2   C3   C4
# T    C    0/0  0/1  1/1  ./0
# T    A    0/0  ./0  ./.  0/1
The INFO field is not split between the new lines: both created variants keep the same DP as the original record.
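The recoding shown in the example above can be sketched in Python. This is a minimal illustration inferred from the example rows only, not the pipeline's actual implementation; the function names `recode_allele`, `split_genotype`, and `split_record` are hypothetical, and the sketch assumes unphased genotypes written with `/`.

```python
def recode_allele(allele, alt_index):
    """Recode one allele call relative to a single alternate allele.

    Assumed rule (inferred from the example): the reference stays 0,
    the target alternate becomes 1, any other alternate becomes '.'.
    """
    if allele == ".":
        return "."
    if int(allele) == 0:
        return "0"
    return "1" if int(allele) == alt_index else "."


def split_genotype(gt, alt_index):
    """Recode an unphased genotype such as '0/2' for one alternate allele."""
    recoded = [recode_allele(a, alt_index) for a in gt.split("/")]
    # The example rows list '.' first, then 0, then 1 (e.g. './0', './1').
    order = {".": 0, "0": 1, "1": 2}
    recoded.sort(key=order.__getitem__)
    return "/".join(recoded)


def split_record(ref, alts, genotypes):
    """Create one (REF, ALT, genotypes) row per alternate allele."""
    return [
        (ref, alt, [split_genotype(gt, i) for gt in genotypes])
        for i, alt in enumerate(alts, start=1)
    ]


# Cells C1-C4 from the example: T/T, T/C, C/C, T/A
rows = split_record("T", ["C", "A"], ["0/0", "0/1", "1/1", "0/2"])
for ref, alt, gts in rows:
    print(ref, alt, *gts)
```

Running the sketch on the example record prints one row per alternate allele, matching the two lines shown above.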
Note that in Mosaic, ./0 will be interpreted as NGT = 0 (WT), ./1 as NGT = 2 (HET), ./. as NGT = 3 (Missing). In this example, cell 2 (genotype T/C) would show as NGT = 1 (HET) for the T>C variant and NGT = 0 (WT) for the T>A variant. Please contact support@missionbio.com with any questions about the interpretation of multi-allelic variants.
The following steps are taken to represent multi-allelic variants in our data structures.
After genotype calling by GATK, check whether any cell has a 1/2 genotype (see cell C5 in the example below).
# Example where C1-C6 represent different cells; C1: T/T, C2: T/C, C3: C/C, C4: T/A, C5: C/A, C6: A/A
# REF  ALT  C1   C2   C3   C4   C5   C6
# T    C,A  0/0  0/1  1/1  0/2  1/2  2/2
If a cell has a 1/2 genotype, a new line will be created with a new genotype.
# REF  ALT  C1   C2   C3   C4   C5   C6
# T    C    0/0  0/1  1/1  ./0  ./1  ./.
# T    A    0/0  ./0  ./.  0/1  ./1  1/1
# *    C+A  ./.  ./.  ./.  ./.  0/1  ./.
The INFO field is not split between the new lines: all created variants keep the same DP as the original record. For multi-allelic variants, the reference is listed as '*' because the reference allele is not present in these cells.
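The extra '*' line for 1/2 cells can be sketched in Python as follows. This is an illustration inferred from the example rows only, not the pipeline's actual code; the function name `combined_alt_row` is hypothetical, and the sketch assumes exactly two alternate alleles and unphased genotypes written with `/`.

```python
def combined_alt_row(alts, genotypes):
    """Build the extra '*' row for cells that carry both alternate alleles.

    Assumed rule (inferred from the example): a 1/2 cell is written as 0/1
    on the new row, and every other cell is written as './.'.
    Returns None when no cell has a 1/2 genotype, so no row is added.
    """
    compound = [sorted(gt.split("/")) == ["1", "2"] for gt in genotypes]
    if not any(compound):
        return None
    new_gts = ["0/1" if is_compound else "./." for is_compound in compound]
    # REF is '*' because the reference allele is absent in these cells.
    return ("*", "+".join(alts), new_gts)


# Cells C1-C6 from the example: T/T, T/C, C/C, T/A, C/A, A/A
row = combined_alt_row(["C", "A"], ["0/0", "0/1", "1/1", "0/2", "1/2", "2/2"])
if row is not None:
    ref, alt, gts = row
    print(ref, alt, *gts)
```

For the example record, only C5 receives a 0/1 call on the new row; a record without any 1/2 cell yields no extra row.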