Acute Myeloid Leukemia (AML)
AML is an inventoried catalog panel that targets the most commonly mutated genes associated with acute myeloid leukemia.
Allele
An allele is a version of a gene that occurs in the same place on a chromosome (locus) and is responsible for variations of a trait. Most human genes consist of two alleles, one from each parent. If the alleles are the same, a cell is said to be homozygous (HOM). If the alleles are different, a cell is said to be heterozygous (HET).
Allele dropout (ADO)
An allele dropout occurs when a cell is genotyped and one or both alleles are not detected. Tapestri’s ADO rate is < 10 %.
Allele frequency (AF)
Allele frequency is the relative frequency of an allele at a genetic locus across a population.
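As a minimal sketch of this definition, the Python below counts each allele at a single locus across a small population and divides by the total number of alleles observed. The function name `allele_frequencies` and the input layout (one pair of alleles per diploid individual) are assumptions made for illustration, not part of any Tapestri tool.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the relative frequency of each allele at one locus.

    `genotypes` is a list of (allele, allele) pairs, one pair per
    diploid individual. This input layout is hypothetical.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


# Four individuals, eight alleles in total: five A and three G.
population = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "A")]
freqs = allele_frequencies(population)
print(freqs)
```

Here the A allele accounts for 5 of 8 alleles and the G allele for 3 of 8.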
Alternate allele (non-REF, ALT)
An alternate allele is any base at a locus that is not the reference allele. Non-REF and ALT are interchangeable terms.
Alternate allele (ALT) reads
ALT reads are the reads in a cell that are genotyped with a non-reference base call for a variant. This is also referred to as non-REF reads.
Amplicon
An amplicon is a piece of DNA or RNA used for amplification to cover targets. Tapestri Designer currently uses DNA only. The goal is for a panel to have 100 % coverage for all the identified targets. In some cases, one amplicon is enough to cover a single target, but a target may require more. When multiple targets are in play, the number of amplicons increases.
Amplicon read completeness
Amplicon read completeness is used to call cells from barcodes. We use a metric called panel performance threshold, defined in this article, to calculate the limit for working/good amplicons. The amplicons that pass the panel performance threshold are considered complete or having sufficient read coverage to be used for cell calling.
Barcode
A barcode is a short DNA sequence used to identify the cell that a read came from. Tapestri DNA Pipeline barcodes are 1536 x 1536 9 bp barcodes that are attached to beads. These barcodes are extracted from the mapped reads. For more information about barcode correction, please read this article.
Base pair (bp)
A base pair is two bases held together by hydrogen bonds. For DNA, cytosine (C) forms a pair with guanine (G), and adenine (A) forms a pair with thymine (T). For RNA, cytosine (C) forms a pair with guanine (G), and adenine (A) forms a pair with uracil (U).
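The pairing rules above can be written as two lookup tables. The sketch below is illustrative only; the names `DNA_PAIRS`, `RNA_PAIRS`, and `complement` are hypothetical.

```python
# Pairing partners for each base, as described in the definition above.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}


def complement(sequence, rna=False):
    """Return the base-pairing partner of every base in `sequence`."""
    pairs = RNA_PAIRS if rna else DNA_PAIRS
    # A base outside the table (for example N) raises KeyError.
    return "".join(pairs[base] for base in sequence.upper())


print(complement("ACGT"))
print(complement("ACGU", rna=True))
```

The function returns the complement in the same left-to-right order as the input; it does not reverse the sequence.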
Blacklisted variants
Blacklisted variants are ones that have been determined to be false positives, i.e., systematic error variants. These variants have been identified through a series of internal validation runs across different sample types.
Chronic Lymphocytic Leukemia (CLL)
CLL is an inventoried catalog panel that covers a combination of oncogenes and tumor suppressor genes, including some of the most commonly mutated genes associated with chronic lymphocytic leukemia.
Copy number variation (CNV)
CNVs are structural variants that alter the number of copies of a region of DNA, through either duplication or deletion. They may be inherited or arise de novo. See this article for additional information.
Coverage metrics
Each panel and target lists a coverage percentage. The target coverage percentage is calculated by how much the specified target is covered by the amplicon. The panel coverage percentage is the overall coverage of all the targets.
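One way to picture these two percentages is with intervals on a single chromosome. The sketch below is not Tapestri Designer's implementation; it assumes half-open (start, end) coordinates, and it assumes that panel coverage is weighted by target length, which the definition above does not specify. All function names are hypothetical.

```python
def covered_bases(target, amplicons):
    """Count the bases of a half-open (start, end) target that fall
    inside at least one amplicon interval."""
    start, end = target
    covered = set()
    for amp_start, amp_end in amplicons:
        lo, hi = max(start, amp_start), min(end, amp_end)
        covered.update(range(lo, hi))  # empty when there is no overlap
    return len(covered)


def target_coverage(target, amplicons):
    """Percentage of one target covered by the amplicons."""
    length = target[1] - target[0]
    return 100.0 * covered_bases(target, amplicons) / length


def panel_coverage(targets, amplicons):
    """Percentage of all target bases covered (length-weighted; an assumption)."""
    total = sum(end - start for start, end in targets)
    covered = sum(covered_bases(t, amplicons) for t in targets)
    return 100.0 * covered / total


targets = [(100, 200), (300, 400)]
amplicons = [(90, 180), (170, 200), (300, 350)]
print(target_coverage(targets[0], amplicons))  # two overlapping amplicons
print(target_coverage(targets[1], amplicons))  # one amplicon, half the target
print(panel_coverage(targets, amplicons))
```

The first target needs two amplicons for full coverage, the second is only half covered, and the panel as a whole covers 150 of 200 target bases.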
CRISPR
CRISPR stands for clustered regularly interspaced short palindromic repeats: regions of DNA in which repeated nucleotide sequences are interspersed with spacer DNA. CRISPR technology is a tool for editing genomes, providing a way to modify gene function and alter DNA sequences.
Data completeness
A cell needs to have sufficient read depth and coverage to generate meaningful data downstream. Mission Bio’s cell calling method first rejects cells with fewer than 10 reads and then identifies the cells that have data for at least 80 % of the good-performing amplicons, thereby reducing noise and incomplete cells.
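The two-step filter described above can be sketched as follows. This is a simplified illustration, not the pipeline's actual code: it assumes the 10-read cutoff applies to a barcode's total reads, that "having data" for an amplicon means at least one read, and that read counts arrive as a nested dictionary. The name `call_cells` and its parameters are hypothetical.

```python
def call_cells(reads_per_barcode, good_amplicons, min_reads=10, min_percent=80):
    """Return barcodes that pass both completeness steps.

    `reads_per_barcode` maps barcode -> {amplicon: read count}.
    """
    called = []
    for barcode, counts in reads_per_barcode.items():
        # Step 1: reject barcodes with fewer than `min_reads` reads in total.
        if sum(counts.values()) < min_reads:
            continue
        # Step 2: require data for at least `min_percent` % of good amplicons.
        with_data = sum(1 for amp in good_amplicons if counts.get(amp, 0) > 0)
        if with_data * 100 >= min_percent * len(good_amplicons):
            called.append(barcode)
    return called


good = ["a1", "a2", "a3", "a4", "a5"]
reads = {
    "cellA": {"a1": 4, "a2": 4, "a3": 4, "a4": 4, "a5": 4},  # passes both steps
    "cellB": {"a1": 3, "a2": 2},                              # only 5 reads
    "cellC": {"a1": 20, "a2": 15},                            # 2 of 5 amplicons
    "cellD": {"a1": 5, "a2": 5, "a3": 5, "a4": 5},            # 4 of 5 amplicons
}
print(call_cells(reads, good))
```

The percentage comparison uses integer arithmetic so that a barcode sitting exactly on the 80 % boundary is not lost to floating-point rounding.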
De novo
A de novo mutation is one that shows up for the first time in a family because of a variant in the egg or sperm or fertilized egg.
DNA Pipeline
A DNA pipeline is a workflow that processes and analyzes DNA data from a sequencer, which includes performing quality checks, aligning DNA reads, cell identification, calling SNVs and indels, and generating data that can be used for tertiary analysis. After completing the process, the data can be visualized using Tapestri Insights.
Doublet rate
A doublet is two or more cells sequenced as one. The doublet rate correlates with the ratio of the total number of cells to the number of beads.
File formats
Datasets may contain the following files:
- .bam: Tapestri DNA Pipeline generates .bam files that include demultiplexed cell-specific sequence alignment data and can be optionally used to process the data outside of Tapestri DNA Pipeline.
- .barcode.cell.distribution.tsv: A tab-delimited file containing the barcode distribution across the amplicons generated by Tapestri DNA Pipeline.
- .bed: The .bed file must conform to a tab-delimited text file format. It defines a specific feature track, meaning each line should contain a chromosome, start location, and end location (one track) with one track per line. See this article for more formatting information.
- .loom: These Tapestri DNA Pipeline output files contain the omics dataset in the Loom file format. As of DNA Pipeline 1.10, they also contain barcodes. Import these files into Tapestri Insights for data visualization.
- .tsv: A tab-delimited file containing the antibody tag count across various cells generated by the protein pipeline.
- .vcf: Tapestri DNA Pipeline processes each tube individually in the Alignment to VCF error check and generates a Variant Call Format (.vcf) file, which follows the standard GATK format. This file is used to generate a .loom file.
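To make the .bed requirement above concrete, the sketch below reads tab-delimited text into (chromosome, start, end) records and rejects lines that are not tab-separated. It is a minimal illustration using only the three fields named in the list; the function name `parse_bed` is hypothetical, and the linked formatting article remains the authority on what Tapestri accepts.

```python
def parse_bed(text):
    """Parse BED-style text into (chromosome, start, end) tuples.

    Blank lines and lines starting with '#' are skipped (an assumption).
    """
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {line_number}: expected at least 3 tab-separated fields"
            )
        # int() raises ValueError if a coordinate is not a whole number.
        records.append((fields[0], int(fields[1]), int(fields[2])))
    return records


example = "chr1\t115256500\t115256700\nchr2\t25457100\t25457300\n"
print(parse_bed(example))
```

A line that uses spaces instead of tabs, such as "chr1 100 200", raises `ValueError` rather than being silently misread.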
Genome
A genome is the set of genetic information for an organism.
GenomeAnalysisTK (GATK)
GATK is a toolkit Tapestri DNA Pipeline uses to analyze high-throughput sequencing data. Its command-line tools are used for variant discovery, and the discovered variants are reported in Variant Call Format (VCF). DNA Pipeline v1.10 requires GATK version 3.7. It is pronounced by saying the individual letters – “gee ay tee kay” – instead of gat-kah or gat-kay.
Genotype
Noun: The genetic makeup of an organism.
Verb: The process of mapping the gene structure of an organism.
Haplotype
A haplotype is a set of alleles that are on the same chromosome and are inherited together from a single parent.
Heterozygous (HET)
A cell is heterozygous when the two alleles of a gene are different.
Homozygous (HOM)
A cell is homozygous when both alleles of a gene are the same. See HOM ALT and HOM REF for more information.
HOM ALT
If both alleles are non-REF (ALT), the cell is classified as homozygous/alternate allele (HOM ALT). GATK HaplotypeCaller is used to call the genotypes. Read more about how GATK calls REF/ALT alleles and genotypes here.
HOM REF
If both alleles are REF, the cell is classified as homozygous/reference (HOM REF). GATK HaplotypeCaller is used to call the genotypes. Read more about how GATK calls REF/ALT alleles and genotypes here.
Internal tandem duplications (ITDs)
ITDs frequently occur in the FLT3 gene locus. These duplications usually happen by duplicating nucleotides in sets of 3 and can vary in length from 3 to more than 100 nucleotides.
Mission Bio’s Tapestri DNA Pipeline runs a custom algorithm in parallel to the standard GATK genotype caller to robustly call this type of variant. For more about why ITDs matter in leukemia, read this article.
Limit of detection (LOD)
The limit of detection is the lowest quantity or concentration of a component that can be reliably detected with a given analytical method. Tapestri’s LOD is 0.1 % of the population frequency.
Locus
The locus is the location of a gene or DNA sequence on a chromosome. The plural of locus is loci.
Multi-omics
Multi-omics, also referred to as multimodal omics, is a method that simultaneously analyzes datasets with multiple omics groups, such as genomics, transcriptomics, and proteomics. The Tapestri Platform provides analysis of single cells for SNVs, SNVs + CNVs, and SNVs + CNVs + Proteins. Nature Methods named single-cell multimodal omics its Method of the Year 2019.
Myeloid
Myeloid is an inventoried catalog panel that covers a comprehensive set of myeloid disorders, including acute myeloid leukemia (AML), myelodysplastic syndrome (MDS), myeloproliferative neoplasms (MPN), chronic myeloid leukemia (CML), chronic myelomonocytic leukemia (CMML), and juvenile myelomonocytic leukemia (JMML).
No call
If a cell cannot be classified as HET, HOM ALT, or HOM REF, then it is reported as no call.
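The four genotype labels can be illustrated with the GT strings found in a VCF file, where 0 is the reference allele, 1 or higher is an alternate allele, and a dot marks a missing allele. The sketch below is an assumption-laden simplification: the real pipeline may also report no call for other reasons (for example, low quality or depth), and treating a genotype with two different alternate alleles (such as 1/2) as HET is a choice made here for illustration. The function name `classify_genotype` is hypothetical.

```python
import re


def classify_genotype(gt):
    """Map a diploid VCF GT string to HET, HOM REF, HOM ALT, or no call."""
    alleles = re.split(r"[/|]", gt)  # "/" is unphased, "|" is phased
    if len(alleles) != 2 or "." in alleles:
        return "no call"
    if alleles[0] != alleles[1]:
        return "HET"  # includes 1/2, by assumption
    return "HOM REF" if alleles[0] == "0" else "HOM ALT"


for gt in ["0/0", "0/1", "1|1", "./."]:
    print(gt, "->", classify_genotype(gt))
```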
Nucleotide
Nucleotides are the structural units of nucleic acids like DNA and RNA. Each nucleotide consists of a nitrogenous base (DNA: A, G, T, or C; RNA: A, G, U, or C), a five-carbon sugar molecule like deoxyribose or ribose, and a phosphate group, which contains one atom of phosphorus bound to four oxygen atoms.
Panel
A panel is a group of amplicons that allow you to amplify targets in a genome. A panel can consist of a single target or several targets.
Phenotype
The phenotype is the physical expression of an organism’s genotype.
Polymorphic
A gene is polymorphic if more than one allele occurs at that gene’s locus within a population. Typically, each allele must occur in at least 1 % of the population. For example, a highly polymorphic region is the one that codes for the major histocompatibility complex (MHC). There are more than 800 different alleles of human MHC class I and II genes, and it has been estimated that there are 200 variants for it.
Proteins
Proteins are polymers made up of amino acids, and they define the phenotype of an organism. They are relevant to multi-omics because studying cell surface protein expression in single cells helps reveal the interplay between genotype and phenotype.
Quality Check (QC)
To obtain the highest quality results, the Tapestri Platform conducts multiple checks, including the validation of files and input data.
Reads
Reads are the output of a sequencing reaction. A read is a single uninterrupted series of nucleotides representing the sequence of the template.
Reference genome
A reference genome is a fully sequenced and mapped genome used for the mapping of sequence reads. Mission Bio uses human genome hg19 and mouse genome mm10. Please supply hg19 (GRCh37) or mm10 (GRCm38) coordinates for the chromosomal locations. hg38 (GRCh38) coordinates can be converted to hg19 using the UCSC liftover tool.
Tapestri Designer uses the reference genome to design primers and amplicons. Tapestri DNA Pipeline uses it to map reads, and Tapestri Insights uses the genome name to fetch annotations for hg19 samples. Insights currently does not support mm10 annotations, but mm10 samples can still be analyzed. The same reference genome must be used in each of these tools for the results to be accurate and meaningful.
Reference allele
The reference allele is the allele found at a locus in the reference genome, given on the plus strand of the genome sequence.
REF/non-REF
REF refers to the reference allele. non-REF refers to the alternate allele. Non-REFs are also referred to as ALT.
Reference confidence mode
In reference confidence mode, HaplotypeCaller calculates how likely it is that a site is homozygous reference (HOM REF).
Small subclone
Small subclones contain less than X % of the total number of cells in the variant.
Single-nucleotide polymorphisms (SNPs)
SNPs are single-nucleotide variations in the population. SNPs, unlike SNVs, imply that the variation occurs in at least 1 % of the population. Pronounced “snips.”
Single-nucleotide variants (SNVs)
SNVs are single-nucleotide variations in the population, with no limit on frequency.
Target
A target is a mutation in the DNA sequence that you are interested in. Targets can be a single-nucleotide variant (SNV) or a small region of base pairs on a chromosome.
Technical replicate
To increase coverage, a sample may be sequenced multiple times. This is called a technical replicate.
Tumor Hotspot (THP)
Tumor Hotspot is an inventoried catalog panel that targets hotspots across 59 oncogenes and tumor suppressor genes relevant in a range of different solid tumors with SNV and indel mutation detection.
V1 chemistry
Chemistry for a run is defined by the barcode structure. The V1 chemistry barcode structure is described in this article. V1 chemistry uses 8 bp barcodes, a 22 bp constant region, and an 8 bp bridge. V1 chemistry is obsolete and no longer manufactured.
V2 chemistry
V2 chemistry is defined by a barcode structure with 9 bp barcodes and 14–17 bp constant regions. This article provides more details on the V2 barcode structure.
Variant
A variant is an alteration, or mutation, in the nucleotide sequence. Sequencing identifies variants by comparing a sample to a reference genome sequence.
Variant allele frequency (VAF)
VAF is the number of reads that match a variant divided by the total coverage at the locus, expressed as a percentage.
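As a small worked example of this ratio, the sketch below divides the ALT read count by the total read depth at a locus and scales the result to a percentage. The function and parameter names are hypothetical.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Return the VAF as a percentage of total coverage at a locus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return 100.0 * alt_reads / total_reads


# 12 reads support the variant out of 48 reads covering the locus.
print(variant_allele_frequency(12, 48))
```

With 12 variant reads out of 48, the VAF is 25 %.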
Wild type (WT)
The wild type is the typical form of the phenotype in the population.
Zygosity
Zygosity describes how similar an organism’s alleles are for a particular gene.