Clicking the horizontal bar labeled Advanced Filtering opens a new window for filtering the data. The first three filters work on individual cells, and the next three work on all cells collectively. All six filter parameters can be adjusted. The filters are applied in the order listed.
This article contains a high-level overview of the advanced filters in Tapestri Insights. For a more in-depth discussion, read the attached file.
The Advanced Filtering page displays six graphs, one for each filter:
- Remove genotype in cell with quality < X
- Remove genotype in cell with read depth < X
- Remove genotype in cell with alternate allele freq < X
- Remove variants genotyped in < X% of cells (low-quality variants filter)
- Remove cells with < X% of genotypes present (low-quality cells filter)
- Remove variants mutated in < X% of cells (low-frequency variants filter)
The first time this page displays, each filter is set to the recommended threshold described below. Only a Recommended button displays in the horizontal bar labeled Advanced Filtering, indicating that all filters are set to the recommended levels. After any parameter is adjusted, both Recommended and Custom buttons display (see screenshot above).
Customize a filter by entering a number, or by adjusting the number with the up and down arrows, and clicking inside the graph area. The following options then display to the right of Advanced Filtering:
- Recommended: Reverts to the recommended settings. Applying this option returns all thresholds to their default values.
- Custom: Overrides the default values.
- Cancel: Cancels the change.
- Apply: Applies the change. Changes take effect only after they are applied.
The first three filters work on individual cells. Each criterion is evaluated for every genotype in every cell, and a genotype that falls below the given threshold is removed from that cell.
1. Remove genotype in cell with quality < X
This filter removes the genotype from every cell in which the genotype quality is lower than the given value.
The genotype quality score is the Phred-scaled confidence that the genotype assignment (GT) is correct. It is derived from the normalized Phred-scaled likelihoods of the possible genotypes (PL). Specifically, the quality score is the difference between the PL of the second most likely genotype and the PL of the most likely genotype. Because the PLs are normalized so that the most likely genotype always has a PL of 0, the quality score equals the second smallest PL, unless that PL is greater than 99. GATK caps the quality score at 99 because larger values are not more informative but take up more space in the file. If the second smallest PL is greater than 99, the quality score is still 99.
In short, the genotype quality score measures the gap between the likelihoods of the two most likely genotypes. A low score means low confidence in the genotype, i.e., there was not enough evidence to choose one genotype over another.
Default: We recommend removing genotypes with a quality less than 30.
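The relationship between the PL values and the quality score can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the variant caller's own code; the function name `genotype_quality` and the example PL lists are hypothetical.

```python
def genotype_quality(pl, cap=99):
    """Return the genotype quality for one call, given its PL values.

    `pl` holds one Phred-scaled likelihood per possible genotype.
    The function name and signature are illustrative assumptions.
    """
    if len(pl) < 2:
        raise ValueError("need likelihoods for at least two genotypes")
    # Normalize so that the most likely genotype has a PL of 0.
    best = min(pl)
    normalized = sorted(value - best for value in pl)
    # The quality is the second smallest normalized PL, capped at 99.
    return min(normalized[1], cap)


print(genotype_quality([0, 45, 600]))   # 45
print(genotype_quality([0, 250, 900]))  # 99 (capped)
print(genotype_quality([0, 12, 300]))   # 12
```

With the recommended threshold of 30, the first two example genotypes would be kept and the third would be removed.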
2. Remove genotype in cell with read depth < X
This filter removes the genotype from every cell in which the read depth is lower than the given value.
The read depth per variant metric is the filtered depth at the cell level. This shows the number of filtered reads that support each of the reported alleles.
Default: We recommend removing genotypes with read depths of less than 10.
3. Remove genotype in cell with alternate allele freq < X
This filter removes the genotype from every cell in which the alternate allele frequency, expressed as a percentage, is lower than the given value.
The alternate allele frequency is derived from the unfiltered allele depth, i.e., the number of reads that support each of the reported alleles. All reads at the position, including reads that did not pass the variant caller’s filters, are included in this number. Reads that were considered uninformative are not included. Reads are considered uninformative when they do not provide enough statistical evidence to support one allele over another. Only non-reference genotype calls are included.
Default: We recommend removing genotypes with alternate allele frequencies lower than 20%.
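As a rough sketch of how the three cell-level filters act on a single genotype, the example below checks quality, read depth, and alternate allele frequency in the order listed. It rests on several assumptions that are not taken from Tapestri Insights: the dictionary layout, the genotype labels `WT`, `HET`, and `HOM_ALT`, and the frequency being computed as alternate reads divided by reference plus alternate reads. A removed genotype is represented as `None`.

```python
MISSING = None  # stands in for a removed or absent genotype


def alt_allele_freq(ref_reads, alt_reads):
    """Alternate allele frequency in percent (assumed formula)."""
    total = ref_reads + alt_reads
    return 100.0 * alt_reads / total if total else 0.0


def filter_genotype(call, min_gq=30, min_dp=10, min_af=20.0):
    """Return the genotype, or MISSING if it fails any of the three filters."""
    if call["gt"] is MISSING:
        return MISSING
    # Filter 1: genotype quality.
    if call["gq"] < min_gq:
        return MISSING
    # Filter 2: read depth.
    if call["dp"] < min_dp:
        return MISSING
    # Filter 3: alternate allele frequency, non-reference calls only.
    if call["gt"] in ("HET", "HOM_ALT"):
        if alt_allele_freq(call["ref_reads"], call["alt_reads"]) < min_af:
            return MISSING
    return call["gt"]


good = {"gt": "HET", "gq": 50, "dp": 20, "ref_reads": 10, "alt_reads": 10}
weak_alt = {"gt": "HET", "gq": 50, "dp": 20, "ref_reads": 18, "alt_reads": 2}
print(filter_genotype(good))      # HET
print(filter_genotype(weak_alt))  # None (10% alternate reads is below 20%)
```

A reference (`WT`) call is never tested against the frequency threshold in this sketch, which mirrors the statement that only non-reference genotype calls are included.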
The last three filters work at a global level, removing cells and variants from the current analysis. The numbers of cells and variants discarded by these three filters are shown in the discarded data section on the Load Samples tab.
4. Remove variants genotyped in < X% of cells
This metric is the proportion of cells that have genotype information available for a given variant. For example, a threshold of 80% retains only variants for which information is available in at least 80% of all cells. Variants with information in fewer than 80% of cells are removed.
Default: We recommend removing variants genotyped in less than 50% of the cells.
5. Remove cells with < X% of genotypes present
This metric is the proportion of variants for which genotype information is available in a given cell. For example, a threshold of 80% retains only the cells that have genotype information for at least 80% of the variants.
Default: We recommend removing cells with less than 50% of the genotypes present.
6. Remove variants mutated in < X% of cells
This metric is the percentage of all cells in which the variant has a non-reference genotype call, e.g., heterozygous (HET) or homozygous alternate (HOM ALT). Any variant that is genotyped as HET or HOM ALT in a smaller percentage of cells than the given threshold is discarded from the analysis.
The appropriate threshold depends on the sample type and the scientific question being addressed. For example, rare subclone detection requires a lower threshold.
Default: We recommend removing variants mutated in less than 1% of the cells.
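The three global filters can be sketched as successive passes over a cell-by-variant genotype table, applied in the order listed. The data layout, the genotype labels, and the choice to compute each percentage over the cells and variants that survived the previous pass are assumptions made for this illustration, not a description of how Tapestri Insights is implemented.

```python
MUTATED = {"HET", "HOM_ALT"}  # assumed labels for non-reference genotypes


def percent(count, total):
    return 100.0 * count / total if total else 0.0


def apply_global_filters(matrix, min_var_genotyped=50.0,
                         min_cell_genotyped=50.0, min_var_mutated=1.0):
    """Return (kept_cells, kept_variants).

    `matrix` maps cell id -> {variant id -> genotype, or None if missing}.
    """
    cells = list(matrix)
    variants = list(next(iter(matrix.values()))) if matrix else []

    # Filter 4: drop variants genotyped in too few cells.
    variants = [v for v in variants
                if percent(sum(matrix[c][v] is not None for c in cells),
                           len(cells)) >= min_var_genotyped]
    # Filter 5: drop cells with too few genotypes present.
    cells = [c for c in cells
             if percent(sum(matrix[c][v] is not None for v in variants),
                        len(variants)) >= min_cell_genotyped]
    # Filter 6: drop variants mutated in too few of the remaining cells.
    variants = [v for v in variants
                if percent(sum(matrix[c][v] in MUTATED for c in cells),
                           len(cells)) >= min_var_mutated]
    return cells, variants


matrix = {
    "cell1": {"v1": "HET", "v2": "WT", "v3": None},
    "cell2": {"v1": "WT", "v2": "WT", "v3": None},
    "cell3": {"v1": None, "v2": "WT", "v3": None},
    "cell4": {"v1": None, "v2": None, "v3": "HET"},
}
print(apply_global_filters(matrix))  # (['cell1', 'cell2', 'cell3'], ['v1'])
```

In this toy table, v3 is dropped first because it is genotyped in only 25% of cells, cell4 is then dropped because it has no genotypes left, and v2 is dropped last because no remaining cell carries a non-reference genotype for it.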