The Tapestri Pipeline Run Report contains three tabs – Basics, Advanced, and Diagnostics. The documentation below explains each section and describes the metrics reported under each header. Some items only pertain to DNA + Protein runs.
NOTE: For multisample run report updates, refer to the Multisample Run Report section, and for demultiplexed sample report updates, refer to the Demultiplexed Sample Report section.
Update: With the V3 pipeline release, the report contains some items that are available only for V2 and others that are V3-specific. They are marked accordingly in the descriptions below.
Summary tab
Advanced tab
Low-performing amplicons - Only V2
Amplicon Performance Table - v3 and v3.4
Amplicons with >2* standard deviation Gini score
Amplicon Gini score distribution plot
Diagnostics tab
R1R2 imbalance plot - Only V2
Balance R1R2 barplot - Only V2
Cellfinder UMAP plot - v3 and v3.4
Cellfinder correlation coverage plot - v3 and v3.4
DNA vs Protein reads scatter plot (protein only) - Only V2
Antibody cell distribution plot (protein only)
Antibody read distribution plot (protein only)
Summary tab
Sample
Run ID
Analyte
DNA panel name
DNA panel size
Reference genome
Chemistry version - Only V2
DNA pipeline version
Protein panel name (protein only)
Protein panel size (protein only)
Protein pipeline version (protein only)
Date analyzed
Protein QC metrics
Read quality (Q30)
This is the percentage of bases (in both R1 and R2 reads) that have a sequencing quality higher than 30. A warning is seen if the Q30% is less than 85%.
GC Content
This is the percentage of G and C bases in both R1 and R2 reads. A warning is seen if the GC% is less than 32%.
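The two percentages above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's own code: the function names are hypothetical, quality strings are assumed to use the common Phred+33 encoding, and the sketch counts scores of 30 or more (the usual Q30 convention), so treat the exact boundary as an assumption.

```python
def q30_percent(quality_strings, min_q=30, phred_offset=33):
    """Percentage of bases whose Phred score is at least min_q.

    quality_strings: FASTQ quality lines from R1 and R2 combined.
    Assumes Phred+33 encoding (hypothetical default, not confirmed by the report).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - phred_offset >= min_q:
                passing += 1
    return 100.0 * passing / total if total else 0.0


def gc_percent(sequences):
    """Percentage of G and C bases across all given read sequences."""
    total = sum(len(seq) for seq in sequences)
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in sequences)
    return 100.0 * gc / total if total else 0.0


# Thresholds quoted in the text: warn below 85% (Q30) and below 32% (GC).
q30_warning = q30_percent(["IIII", "II5?"]) < 85
gc_warning = gc_percent(["GGCC", "AATT"]) < 32
```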
MaxN per read position
This is the percentage of reads with N at a particular position. It is calculated for R1 and R2 separately, and a warning is seen if more than 5% of reads have N at the same position.
Barcode constant 1 read rate
This is the percentage of R1 reads with the constant 1 part of the barcode. This region is used to extract the barcodes from the reads; if the region is missing, the reads lack barcodes. A warning is seen if <50% of reads have this region.
Barcode constant 2 read rate
This is the percentage of R1 reads with the constant 2 part of the barcode. This region is used to extract the barcodes from the reads; if the region is missing, the reads lack barcodes. A warning is seen if <50% of reads have this region.
Sequencing (Protein)
# total read pairs
This is the number of total read pairs in the fastq file for protein.
# total barcodes
This is the number of total barcodes that were found by the protein pipeline, i.e., the number of barcodes with more than 1 read.
% read pairs after candidate barcode filtering
This is the percentage of read pairs that are present in the candidate barcodes.
Read quality (Q30) - Only V2
This is the percentage of bases (in both R1 and R2 reads) that have a sequencing quality higher than 30.
% read pairs trimmed - Only V2
This is the percentage of read pairs that were trimmed at the Cutadapt step (denominator: total read pairs).
% read pairs after cell barcode processing - Only V2
This is the percentage of read pairs that passed the cell barcode extraction and detection step, i.e., they had a valid cell barcode structure (denominator: total read pairs).
% read pairs after antibody barcode processing - Only V2
This is the percentage of read pairs that passed the antibody barcode detection step, i.e., they had a valid antibody barcode structure in them.
# candidate barcodes - Only V2
This is the number of barcodes with more than 10 reads.
Antibody counting
Mean reads/cell/antibody
This is the mean reads per DNA called cell divided by the number of antibodies. This is always the floor of the decimal obtained by division.
# antibodies detected
This is the number of antibodies that have at least 1 read in at least 1 cell.
Median antibodies/cell
This is the median of the number of antibodies that have at least 1 read in a cell.
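As a rough illustration of the three antibody counting metrics, the sketch below computes them from a small cell-by-antibody read count table. The function name and the input layout (one row per DNA-called cell, one column per antibody) are assumptions made for the example.

```python
from statistics import median


def antibody_counting_metrics(counts):
    """counts: list of rows, one per DNA-called cell, one read count per antibody."""
    n_cells = len(counts)
    n_antibodies = len(counts[0])
    total_reads = sum(sum(row) for row in counts)

    # Mean reads per cell divided by the number of antibodies, floored.
    mean_reads = total_reads // (n_cells * n_antibodies)

    # Antibodies with at least 1 read in at least 1 cell.
    detected = sum(
        1 for j in range(n_antibodies) if any(row[j] >= 1 for row in counts)
    )

    # Median, over cells, of the number of antibodies with at least 1 read.
    median_per_cell = median(sum(1 for c in row if c >= 1) for row in counts)

    return {
        "mean_reads_per_cell_per_antibody": mean_reads,
        "antibodies_detected": detected,
        "median_antibodies_per_cell": median_per_cell,
    }


example = antibody_counting_metrics([[10, 0, 5], [20, 3, 0], [0, 0, 7]])
```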
Panel uniformity plot
This is a boxplot where each box corresponds to an amplicon. The values for each box are the read percentages for that amplicon in all cells. The amplicons are sorted by their mean reads, with the highest at the top and lowest at the bottom. This plot shows how the performance of an amplicon varies throughout the called cells.
DNA QC metrics
Read quality (Q30)
This is the percentage of bases (in both R1 and R2 reads) that have a sequencing quality higher than 30. A warning is seen if the Q30% is less than 85%.
GC Content
This is the percentage of G and C bases in both R1 and R2 reads. A warning is seen if the GC% is less than 32%.
MaxN per read position
This is the percentage of reads with N at a particular position. It is calculated for R1 and R2 separately, and a warning is seen if more than 5% of reads have N at the same position.
Barcode constant 1 read rate
This is the percentage of R1 reads with the constant 1 part of the barcode. This region is used to extract the barcodes from the reads; if the region is missing, the reads lack barcodes. A warning is seen if <50% of reads have this region.
Barcode constant 2 read rate
This is the percentage of R1 reads with the constant 2 part of the barcode. This region is used to extract the barcodes from the reads; if the region is missing, the reads lack barcodes. A warning is seen if <50% of reads have this region.
Sequencing (DNA)
# total read pairs
This is the number of total read pairs in the input fastq file. For multiple lanes, this would be the sum of all lanes.
Read quality (Q30) - Only V2
This is the percentage of bases (in both R1 and R2 reads) that have a sequencing quality higher than 30.
% read pairs trimmed
This is the percentage of read pairs that were trimmed at the Cutadapt step (denominator: total read pairs).
% read pairs with valid barcodes
This is the percentage of read pairs that passed the collapse barcode step of the DNA pipeline, i.e., they had a valid barcode structure.
# total barcodes
This is the total number of barcodes identified in this sample, i.e., the number of barcodes that have 1 or more reads.
Mapping
% reads mapped to genome
This is the percentage of read pairs that successfully mapped to the genome.
% reads mapped to target
This is the percentage of read pairs that mapped to the insert coordinates of the amplicons in this panel. A read counts as mapped to an insert when the overlap between the read and the insert coordinates is greater than 1 base pair.
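The overlap rule can be illustrated with a short sketch. The coordinate convention (half-open intervals, start inclusive and end exclusive) and the function names are assumptions for the example; the pipeline's internal representation is not described here.

```python
def overlap_bp(read_start, read_end, insert_start, insert_end):
    """Overlap in base pairs between two half-open intervals [start, end)."""
    return max(0, min(read_end, insert_end) - max(read_start, insert_start))


def maps_to_target(read_start, read_end, insert_start, insert_end):
    """True when the read overlaps the insert by more than 1 base pair."""
    return overlap_bp(read_start, read_end, insert_start, insert_end) > 1
```

Under this convention, a read covering 100-242 overlaps an insert covering 240-400 by 2 bp and counts as on target, while a read covering 100-241 overlaps by only 1 bp and does not.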
Cell calling
# cells
This is the number of cells called by the cellfinder module.
Panel uniformity
This is the percentage of amplicons whose mean reads per cell are above 0.2 * the mean reads per cell per amplicon.
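A minimal sketch of this calculation follows, expressed as a percentage of the panel so that it can be compared with the 60% warning threshold listed under Warnings. The input layout (rows are called cells, columns are amplicons) and the use of the unrounded mean are assumptions of the example.

```python
def amplicon_means(reads):
    """reads: rows are called cells, columns are amplicons.

    Returns the mean reads per cell for each amplicon.
    """
    n_cells = len(reads)
    return [sum(column) / n_cells for column in zip(*reads)]


def panel_uniformity(reads, factor=0.2):
    """Percentage of amplicons whose mean is above factor * the overall mean."""
    means = amplicon_means(reads)
    overall_mean = sum(means) / len(means)  # mean reads per cell per amplicon
    passing = sum(1 for m in means if m > factor * overall_mean)
    return 100.0 * passing / len(means)


# Two cells, four amplicons: the third amplicon falls below the 0.2x cutoff.
uniformity = panel_uniformity([[100, 50, 2, 40], [100, 50, 4, 56]])
```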
Mean reads/cell
This is the mean reads per called cell. It is the floor of the decimal obtained by dividing the total reads to cells by the number of cells.
Mean reads/cell/amplicon
This is the floor of the mean reads per cell divided by the number of amplicons in the panel.
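The two floored values can be reproduced with integer division, as in this small sketch (function and argument names are illustrative only).

```python
def mean_reads_per_cell(total_reads_in_cells, n_cells):
    """Floor of total reads assigned to cells divided by the number of cells."""
    return total_reads_in_cells // n_cells


def mean_reads_per_cell_per_amplicon(total_reads_in_cells, n_cells, n_amplicons):
    """Floor of the mean reads per cell divided by the panel size."""
    return mean_reads_per_cell(total_reads_in_cells, n_cells) // n_amplicons
```

For example, 1,234,567 reads in 1,000 cells gives 1,234 mean reads per cell, and with a 50-amplicon panel the per-amplicon value is 24.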
% DNA read pairs assigned to cells
This is the percentage of read pairs that are present in the called cells.
DNA data completeness
This is defined as the percentage of amplicon per cell combinations that have more than 10 reads, i.e., they have enough sequencing depth for variant calling.
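As an illustration, completeness can be computed from a cell-by-amplicon read count table like this. The input layout and function name are assumptions; the "more than 10 reads" cutoff is taken from the definition above.

```python
def dna_data_completeness(reads, min_reads=10):
    """Percentage of (cell, amplicon) combinations with more than min_reads reads.

    reads: rows are called cells, columns are amplicons.
    """
    total = sum(len(row) for row in reads)
    complete = sum(1 for row in reads for count in row if count > min_reads)
    return 100.0 * complete / total if total else 0.0


# 3 of the 6 combinations have more than 10 reads.
completeness = dna_data_completeness([[11, 10, 0], [25, 12, 9]])
```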
% antibody read pairs assigned to cells (protein only)
This is the percentage of protein read pairs that are assigned to DNA called cells (denominator: total read pairs in the fastq file).
Algorithm - v3 and v3.4
This is the method that was used to call the cells. The V3 pipeline uses the new Correlation UMAP method to call cells; if that is unsuccessful, it falls back to the Completeness method.
Variant calling
# filtered variants
This is the number of variants that would be seen by loading the h5 file in Tapestri Insights and applying the default Insights filters.
ADO rate
This is the ADO rate for this run calculated using germline variants.
NOTE: ADO is not calculated for demultiplexed Tapestri runs, and the metric is shown as NA.
Advanced tab
Panel performance
% amplicons between 0.2*mean and 5*mean reads/cell/amplicon
This is the percentage of amplicons that lie in this range (denominator: panel size).
% amplicons between 0.5*mean and 2*mean reads/cell/amplicon
This is the percentage of amplicons that lie in this range (denominator: panel size).
% amplicons > 1x coverage - Only V2
This is the percentage of amplicons that have mean reads over 1 (denominator: panel size).
% amplicons > 5x coverage - Only V2
This is the percentage of amplicons that have mean reads over 5 (denominator: panel size).
% amplicons > 10x coverage
This is the percentage of amplicons that have mean reads over 10 (denominator: panel size).
% amplicons > 20x coverage - Only V2
This is the percentage of amplicons that have mean reads over 20 (denominator: panel size).
% amplicons > 40x coverage - Only V2
This is the percentage of amplicons that have mean reads over 40 (denominator: panel size).
% reads to amplicons above 2*mean reads/cell/amplicon
This is the percentage of reads that belong to amplicons with mean reads over 2 * the mean reads per cell per amplicon (denominator: total reads to cells).
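The range-based and read-share metrics in this section can be sketched from the per-amplicon mean reads per cell. The function names are hypothetical, and treating the range bounds as inclusive is an assumption, since the text does not say how boundary values are handled.

```python
def percent_amplicons_in_range(means, low_factor, high_factor):
    """Percentage of amplicons with mean reads between the two multiples
    of the overall mean reads/cell/amplicon (bounds assumed inclusive)."""
    overall_mean = sum(means) / len(means)
    low, high = low_factor * overall_mean, high_factor * overall_mean
    in_range = sum(1 for m in means if low <= m <= high)
    return 100.0 * in_range / len(means)


def percent_reads_to_high_amplicons(means, factor=2.0):
    """Share of reads going to amplicons above factor * the overall mean.

    Every amplicon is averaged over the same cells, so the ratio of summed
    means equals the ratio of summed reads.
    """
    overall_mean = sum(means) / len(means)
    high_reads = sum(m for m in means if m > factor * overall_mean)
    return 100.0 * high_reads / sum(means)


means = [10, 40, 50, 100, 300]  # illustrative per-amplicon mean reads per cell
wide = percent_amplicons_in_range(means, 0.2, 5)
narrow = percent_amplicons_in_range(means, 0.5, 2)
high_share = percent_reads_to_high_amplicons(means)
```

In this example one amplicon takes 60% of the reads, which would exceed the 50% warning threshold for this metric.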
Low-performing amplicons - Only V2
This table shows amplicon names and mean reads per amplicon. This table is created for amplicons that have mean reads below 0.2 * the mean reads per cell per amplicon.
Amplicon Performance Table - v3 and v3.4
This table lists each amplicon and its mean reads per cell. For each amplicon, it also lists whether the amplicon passed the threshold. The threshold is 0.2 * the mean of the Mean Reads column. If the mean reads for an amplicon is above this number, the amplicon passed the threshold and has a value of TRUE. If it is below this number, the value is FALSE. The table is sorted by mean reads, with low-performing amplicons at the top.
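A table of this shape can be approximated as follows. The amplicon names, dictionary keys, and function name are hypothetical; only the 0.2 * mean threshold and the ascending sort come from the description above.

```python
def amplicon_performance_table(mean_reads_by_amplicon, factor=0.2):
    """Return (threshold, rows) with rows sorted by ascending mean reads."""
    values = list(mean_reads_by_amplicon.values())
    threshold = factor * (sum(values) / len(values))
    rows = [
        {"amplicon": name, "mean_reads": reads, "passed": reads > threshold}
        for name, reads in mean_reads_by_amplicon.items()
    ]
    rows.sort(key=lambda row: row["mean_reads"])  # low performers first
    return threshold, rows


threshold, rows = amplicon_performance_table(
    {"AMP1": 100.0, "AMP2": 4.0, "AMP3": 46.0}
)
```

Here the mean of the Mean Reads column is 50, so the threshold is 10 and only AMP2 is marked FALSE.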
Amplicons with > 2* standard deviation Gini score
This is a list of amplicons that have a high Gini score. A high Gini score is based on the distribution of the Gini scores in this run and is defined as the mean Gini score of all amplicons + (2 * the standard deviation of Gini scores of all amplicons).
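The cutoff can be written out as a short sketch. It takes already-computed Gini scores as input, since how the pipeline derives each score is not described here. Using the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption.

```python
from statistics import mean, pstdev


def high_gini_amplicons(gini_by_amplicon, n_sd=2.0):
    """Return (cutoff, names) for amplicons above mean + n_sd * SD of all scores."""
    scores = list(gini_by_amplicon.values())
    cutoff = mean(scores) + n_sd * pstdev(scores)
    flagged = sorted(name for name, g in gini_by_amplicon.items() if g > cutoff)
    return cutoff, flagged


# Nine amplicons with a score of 0.2 and one outlier at 0.9 (made-up values).
scores = {f"AMP{i}": 0.2 for i in range(1, 10)}
scores["AMP10"] = 0.9
cutoff, flagged = high_gini_amplicons(scores)
```

With these values the mean is 0.27 and the standard deviation is 0.21, so the cutoff is 0.69 and only AMP10 is listed.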
Amplicons with R1R2 imbalance
This is a list of amplicons that have an imbalance in their R1 and R2 reads. An amplicon is said to have an R1R2 imbalance if the fold change of R1 to R2 (or R2 to R1) is greater than 2.
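The fold-change rule is symmetric, which the sketch below makes explicit. How amplicons with zero R1 or R2 reads are treated is not stated in the text, so the zero handling here is an assumption.

```python
def has_r1r2_imbalance(r1_reads, r2_reads, max_fold=2.0):
    """True when R1/R2 or R2/R1 is greater than max_fold."""
    if r1_reads == 0 and r2_reads == 0:
        return False  # assumption: no reads, nothing to compare
    if r1_reads == 0 or r2_reads == 0:
        return True   # assumption: a one-sided amplicon counts as imbalanced
    fold = max(r1_reads / r2_reads, r2_reads / r1_reads)
    return fold > max_fold
```

An amplicon with 100 R1 reads and 40 R2 reads (2.5-fold) is flagged, while 100 and 50 (exactly 2-fold) is not.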
ADO calculation summary
This table shows the germline variants that were used to calculate the ADO rate for this run.
NOTE: ADO is not calculated for demultiplexed Tapestri runs, and the table shows the message "No variants found for ADO computation."
Amplicon Gini score distribution plot
This distribution plot shows the distribution of the Gini scores of all amplicons in this run.
Diagnostics tab
R1R2 imbalance plot - Only V2
This plot shows the R1 and R2 read fractions for the amplicons that have an R1R2 imbalance. When there are no such amplicons, this plot is empty.
Balance R1R2 barplot - Only V2
This plot shows the R1 and R2 read fractions for the amplicons that do not have an R1R2 imbalance. This plot combined with the R1R2 imbalance plot should cover all the amplicons in the panel.
Cellfinder UMAP plot - v3 and v3.4
This plot shows the UMAP distribution for all barcodes and plots the 2 dimensions - UMAP-x vs UMAP-y. The clusters seen on the plot are colored by the cell finder assignment as valid-cells, invalid-barcodes and other. Ideally, each cluster should be made up of the same type of barcode.
Cellfinder correlation coverage plot - v3 and v3.4
This is the correlation (R2) vs. coverage (log10(reads/amplicon)) plot that shows the cell and the invalid-barcode populations. Ideally, the non-cell barcodes should form the dense top band, with the valid single cells forming a second, less dense cluster below them.
Reads log-log plot
This is the total reads vs. rank-ordered barcodes log-log plot, which is also part of the cellfinder output. Previously, a vertical line was displayed in the log-log plot, but it is now obsolete because cellfinder no longer works on a threshold: it selects complete cells instead of all cells above a certain read threshold. The plot is primarily a diagnostic tool used to validate that the shape of the knee looks normal.
DNA 1x vs 10x coverage plot
This is a scatter plot where each dot is a barcode. The x value is the fraction of amplicons that have more than 1 read in that barcode. The y value is the fraction of amplicons that have more than 10 reads in that barcode. The dots are colored by whether cellfinder called them cells. This plot can be used to identify runs with a high number of partial cells. In such cases, the distribution of the barcodes would be closer to the unity line instead of being closer to the axes.
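The x and y values for a single barcode can be computed as in this sketch, which assumes the per-amplicon read counts for the barcode are available as a plain list (the function name is illustrative).

```python
def coverage_fractions(amplicon_reads):
    """Return (x, y) for one barcode.

    x: fraction of amplicons with more than 1 read
    y: fraction of amplicons with more than 10 reads
    """
    n = len(amplicon_reads)
    x = sum(1 for count in amplicon_reads if count > 1) / n
    y = sum(1 for count in amplicon_reads if count > 10) / n
    return x, y


point = coverage_fractions([0, 2, 11, 50])
```

Because any amplicon with more than 10 reads also has more than 1 read, y can never exceed x, so every dot lies on or below the unity line.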
DNA vs Protein reads scatter plot (protein only) - Only V2
This scatter plot for all barcodes shows the number of DNA reads that the barcode has on the x-axis and the number of protein reads that the barcode has on the y-axis.
Antibody cell distribution plot (protein only)
This bar graph shows the number of cells where an antibody has non-zero reads. Each bar represents an antibody. Note, labels are turned off by default in the report on the RUN DETAILS page. For label information please download the interactive_report.html file in the OUTPUT FILES page.
Antibody read distribution plot (protein only)
This violin plot shows the distribution of reads that an antibody has across all cells. Each curve represents an antibody. Note, labels are turned off by default in the report on the RUN DETAILS page. For label information please download the interactive_report.html file in the OUTPUT FILES page.
Warnings
QC metrics
A warning sign is seen next to the QC metrics that do not pass a certain threshold as given below:
Read quality (Q30) – A warning is seen if the Q30% is less than 85%.
GC Content – A warning is seen if the GC% is less than 32%.
MaxN per read position – A warning is seen if more than 5% of reads have N at the same position.
Barcode constant 1 read rate – A warning is seen if <50% of reads have this region.
Barcode constant 2 read rate – A warning is seen if <50% of reads have this region.
Mapping metrics
% reads mapped to genome – A warning is seen if value is below 70%.
Cell calling metrics
Panel Uniformity – A warning is seen if value is below 60%.
Mean reads/cell/amplicon – A warning is seen if value is below 20 or above 200.
Mean reads/cell/antibody – A warning is seen if value is below 75.
Panel Performance metrics
% reads to amplicons above 2*mean reads/cell/amplicon – A warning is seen if value is above 50%.
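For anyone post-processing report values, the thresholds listed above can be collected into one lookup table. This is a sketch only: the dictionary keys and function name are hypothetical, and the report does not necessarily export the metrics in this form.

```python
# (lower bound, upper bound); None means no limit on that side.
WARNING_RULES = {
    "Read quality (Q30)": (85, None),
    "GC Content": (32, None),
    "MaxN per read position": (None, 5),
    "Barcode constant 1 read rate": (50, None),
    "Barcode constant 2 read rate": (50, None),
    "% reads mapped to genome": (70, None),
    "Panel uniformity": (60, None),
    "Mean reads/cell/amplicon": (20, 200),
    "Mean reads/cell/antibody": (75, None),
    "% reads to amplicons above 2*mean reads/cell/amplicon": (None, 50),
}


def flag_warnings(metrics):
    """Return the names of metrics that fall outside their allowed range."""
    flagged = []
    for name, value in metrics.items():
        low, high = WARNING_RULES.get(name, (None, None))
        too_low = low is not None and value < low
        too_high = high is not None and value > high
        if too_low or too_high:
            flagged.append(name)
    return flagged


warnings_found = flag_warnings({
    "Read quality (Q30)": 91.2,
    "GC Content": 30.0,
    "Mean reads/cell/amplicon": 250,
    "Panel uniformity": 60,
})
```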
Multisample Run Report
A multisample Tapestri run produces multiple report files:
- <run-prefix>.report.html - One html file for the full multisample run
- <run-prefix>.<sample>.report.html - One html file for each sample
The multisample Tapestri run report has the following additional elements:
Summary tab
Demultiplexing
Method
Demultiplexing method used - Genotype based or Antibody based
Number of samples found
Number of samples identified in the Tapestri run during demultiplexing
Cells assigned to a sample
Percentage of cells from the Tapestri run that could be successfully assigned to samples
Variants used
Variants from the list of user-provided germline variants that were used for demultiplexing
Variants discarded
Variants from the list of user-provided germline variants that were not used for demultiplexing. A variant can be discarded for several reasons, such as poor quality, not being a SNP, or being correlated with another variant.
Advanced tab
Cells by sample plot
This bar graph shows the number of cells for each label, which includes the demultiplexed samples and the unassigned, mixed, ambiguous, and low-read cells.
Diagnostics tab
Discarded demultiplexing variants
This table lists the user-provided germline variants that were not used for demultiplexing, along with the reason.
Demultiplexed Sample Report
Sample reports are created for each sample that is demultiplexed from the multisample run. These are available for download from the Output Files tab and are named <prefix>.<sample_ID>.dmx.report.html. These reports are similar to the run reports, with some metrics and plots recalculated using only the sample-specific cells, such as the Cell calling metrics, Antibody Counting metrics, and Amplicon Performance table.
The metrics that are calculated across all samples of a multisample Tapestri run are identified by an "*" on the card, for example the QC metrics and sequencing metrics.