File: report.html

The Tapestri Pipeline Run Report contains three tabs: Summary, Advanced, and Diagnostics. The documentation below explains each section, with the underlined text describing the metrics reported in each header. Some items pertain only to DNA + Protein runs.

NOTE: For multisample run report updates, refer to the Multisample Run Report section; for demultiplexed sample report updates, refer to the Demultiplexed Sample Report section.

 

Update: With the V3 pipeline release, the report contains certain items that are available only for V2 and others that are V3-specific. They are marked accordingly in the descriptions below.

Summary tab

Sample

Protein QC metrics

Sequencing (Protein)

Antibody counting

Panel uniformity plot

DNA QC metrics

Sequencing (DNA)

Mapping

Cell calling

Variant calling

Advanced tab

Panel performance

Low-performing amplicons - Only V2

Amplicon Performance Table - v3 and v3.4

Amplicons with >2* standard deviation Gini score

Amplicons with R1R2 imbalance

ADO calculation summary

Amplicon Gini score distribution plot

Diagnostics tab

R1R2 imbalance plot - Only V2

Balance R1R2 barplot - Only V2

Cellfinder UMAP plot - v3 and v3.4

Cellfinder correlation coverage plot - v3 and v3.4

Reads log-log plot 

DNA 1x vs 10x coverage plot

DNA vs Protein reads scatter plot (protein only) - Only V2

Antibody cell distribution plot (protein only)

Antibody read distribution plot (protein only)

Warnings

Multisample Run Report

Demultiplexed Sample Report

 

Summary tab

Sample

Run ID

Analyte

DNA panel name

DNA panel size

Reference genome

Chemistry version - Only V2

DNA pipeline version

Protein panel name (protein only)

Protein panel size (protein only)

Protein pipeline version (protein only)

Date analyzed

Protein QC metrics

Read quality (Q30)

This is the percentage of bases (in both R1 and R2 reads) that have a sequencing quality higher than 30. A warning is seen if the Q30% is less than 85%.
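As a rough illustration of this metric, the sketch below counts the bases whose quality is higher than 30 in a set of FASTQ quality strings. It assumes Phred+33 encoding, and `q30_percent` is a hypothetical helper name; it is not the pipeline's own code.

```python
def q30_percent(quality_strings, offset=33):
    """Percentage of bases with a Phred quality higher than 30.

    Assumes Phred+33 encoded quality strings (an assumption of this sketch).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            # "Higher than 30" is taken literally: Q30 itself does not count.
            if ord(ch) - offset > 30:
                passing += 1
    return 100.0 * passing / total if total else 0.0


# Hypothetical quality strings for the R1 and R2 reads of one pair.
r1_qual = "IIII"  # 'I' encodes Q40
r2_qual = "II##"  # '#' encodes Q2
value = q30_percent([r1_qual, r2_qual])
print(value, "warning" if value < 85 else "ok")
```

The final line applies the 85% warning threshold described above.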

GC Content

This is the percentage of G and C bases in both R1 and R2 reads. A warning is seen if the GC% is less than 32%.
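A minimal sketch of the GC calculation, assuming the R1 and R2 sequences are available as plain strings; `gc_percent` is a hypothetical name used only for illustration.

```python
def gc_percent(reads):
    """Percentage of G and C bases across all supplied reads (R1 and R2)."""
    total = sum(len(read) for read in reads)
    gc = sum(read.count("G") + read.count("C") for read in reads)
    return 100.0 * gc / total if total else 0.0


# Hypothetical R1/R2 sequences.
print(gc_percent(["GGCC", "AATT"]))  # 50.0
```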

MaxN per read position

This is the percentage of reads with N at a particular position. It is calculated for R1 and R2 separately, and a warning is seen if more than 5% of reads have N at the same position.
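One way to picture this metric: count the N calls at each read position and report the worst position. The sketch below does this for one read file (R1 or R2) at a time; `max_n_percent` is a hypothetical helper, not part of the pipeline.

```python
def max_n_percent(reads):
    """Highest percentage of reads that carry 'N' at any single position."""
    if not reads:
        return 0.0
    longest = max(len(read) for read in reads)
    n_counts = [0] * longest
    for read in reads:
        for pos, base in enumerate(read):
            if base == "N":
                n_counts[pos] += 1
    # Denominator: all reads, including those shorter than the position.
    return 100.0 * max(n_counts) / len(reads) if longest else 0.0


# Hypothetical R1 reads: position 1 is N in two of four reads.
r1_reads = ["ACGT", "ANGT", "ANGN", "ACGT"]
print(max_n_percent(r1_reads))  # 50.0
```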

Barcode constant 1 read rate

This is the percentage of R1 reads with the constant 1 part of the barcode. This region is used to extract the barcodes from the reads; if the region is missing, the reads lack the barcodes. A warning is seen if fewer than 50% of reads have this region.

Barcode constant 2 read rate

This is the percentage of R1 reads with the constant 2 part of the barcode. This region is used to extract the barcodes from the reads; if the region is missing, the reads lack the barcodes. A warning is seen if fewer than 50% of reads have this region.

Sequencing (Protein)

# total read pairs

This is the number of total read pairs in the fastq file for protein.

# total barcodes

This is the number of total barcodes that were found by the protein pipeline, i.e., the number of barcodes with more than 1 read.

% read pairs after candidate barcode filtering

This is the percentage of read pairs that are present in the candidate barcodes.

Read quality (Q30) - Only V2

This is the percentage of bases (in both R1 and R2 reads) that have a sequencing quality higher than 30.

% read pairs trimmed - Only V2

This is the percentage of read pairs that were trimmed at the Cutadapt step (denominator: total read pairs).

% read pairs after cell barcode processing - Only V2

This is the percentage of read pairs that passed the cell barcode extraction and detection step, i.e., they had a valid cell barcode structure (denominator: total read pairs).

% read pairs after antibody barcode processing - Only V2

This is the percentage of read pairs that passed the antibody barcode detection step, i.e., they had a valid antibody barcode structure in them.

# candidate barcodes - Only V2

This is the number of barcodes with more than 10 reads.
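The candidate barcode count and the share of read pairs that fall in candidate barcodes can be sketched as follows. The input is assumed to be one barcode string per read pair, and the denominator for the percentage is assumed to be all read pairs supplied; `barcode_summary` is a hypothetical name.

```python
from collections import Counter


def barcode_summary(read_barcodes, min_reads=10):
    """Return (# candidate barcodes, % read pairs in candidate barcodes).

    A candidate barcode has more than `min_reads` reads.
    Assumption: the percentage is relative to all supplied read pairs.
    """
    counts = Counter(read_barcodes)
    candidates = [bc for bc, n in counts.items() if n > min_reads]
    reads_in_candidates = sum(counts[bc] for bc in candidates)
    pct = 100.0 * reads_in_candidates / len(read_barcodes) if read_barcodes else 0.0
    return len(candidates), pct


# Hypothetical barcodes: only "A" has more than 10 reads.
barcodes = ["A"] * 11 + ["B"] * 10 + ["C"] * 4
print(barcode_summary(barcodes))  # (1, 44.0)
```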

Antibody counting

Mean reads/cell/antibody

This is the mean number of reads per DNA-called cell divided by the number of antibodies. The reported value is always the floor of the result of this division.

# antibodies detected

This is the number of antibodies that have at least 1 read in at least 1 cell.

Median antibodies/cell

This is the median of the number of antibodies that have at least 1 read in a cell.
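The three antibody counting metrics can be sketched together. The input format (one dictionary of antibody read counts per DNA-called cell), the antibody names, and the function name `antibody_metrics` are assumptions made for this example; the number of antibodies is taken to be the protein panel size.

```python
import math
from statistics import median


def antibody_metrics(cell_counts, panel_size):
    """cell_counts: list of {antibody: reads} dicts, one per DNA-called cell."""
    n_cells = len(cell_counts)
    total_reads = sum(sum(cell.values()) for cell in cell_counts)

    # Mean reads per cell divided by the number of antibodies, floored.
    mean_reads_cell_antibody = math.floor(total_reads / n_cells / panel_size)

    # Antibodies with at least 1 read in at least 1 cell.
    detected = {ab for cell in cell_counts for ab, n in cell.items() if n >= 1}

    # Per cell, the number of antibodies with at least 1 read.
    per_cell = [sum(1 for n in cell.values() if n >= 1) for cell in cell_counts]

    return mean_reads_cell_antibody, len(detected), median(per_cell)


# Hypothetical counts for two cells and a three-antibody panel.
cells = [
    {"CD3": 11, "CD4": 0, "CD8": 3},
    {"CD3": 5, "CD4": 6, "CD8": 0},
]
print(antibody_metrics(cells, panel_size=3))
```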

Panel uniformity plot

This is a boxplot where each box corresponds to an amplicon. The values for each box are the read percentages for that amplicon in all cells. The amplicons are sorted by their mean reads, with the highest at the top and lowest at the bottom. This plot shows how the performance of an amplicon varies throughout the called cells.

DNA QC metrics

Read quality (Q30)

This is the percentage of bases (in both R1 and R2 reads) that have a sequencing quality higher than 30. A warning is seen if the Q30% is less than 85%.

GC Content

This is the percentage of G and C bases in both R1 and R2 reads. A warning is seen if the GC% is less than 32%.

MaxN per read position

This is the percentage of reads with N at a particular position. It is calculated for R1 and R2 separately, and a warning is seen if more than 5% of reads have N at the same position.

Barcode constant 1 read rate

This is the percentage of R1 reads with the constant 1 part of the barcode. This region is used to extract the barcodes from the reads; if the region is missing, the reads lack the barcodes. A warning is seen if fewer than 50% of reads have this region.

Barcode constant 2 read rate

This is the percentage of R1 reads with the constant 2 part of the barcode. This region is used to extract the barcodes from the reads; if the region is missing, the reads lack the barcodes. A warning is seen if fewer than 50% of reads have this region.

Sequencing (DNA)

# total read pairs

This is the number of total read pairs in the input fastq file. For multiple lanes, this would be the sum of all lanes.

Read quality (Q30) - Only V2

This is the percentage of bases (in both R1 and R2 reads) that have a sequencing quality higher than 30.

% read pairs trimmed

This is the percentage of read pairs that were trimmed at the Cutadapt step (denominator: total read pairs).

% read pairs with valid barcodes

This is the percentage of read pairs that passed the collapse barcode step of the DNA pipeline, i.e., they had a valid barcode structure.

# total barcodes

This is the total number of barcodes identified in this sample, i.e., the number of barcodes that have 1 or more reads.

Mapping

% reads mapped to genome

This is the percentage of read pairs that successfully mapped to the genome.

% reads mapped to target

This is the percentage of read pairs that mapped to the insert coordinates of the amplicons in this panel. A read is considered mapped to an insert when the read and the insert coordinates overlap by more than 1 base pair.
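The overlap rule can be written as a small interval check. This sketch assumes half-open coordinates (start inclusive, end exclusive), which is a choice made for the example and not something the report specifies; `overlaps_insert` is a hypothetical name.

```python
def overlaps_insert(read_start, read_end, insert_start, insert_end, min_overlap=1):
    """True if the read overlaps the insert by more than `min_overlap` bases.

    Assumes half-open intervals [start, end).
    """
    overlap = min(read_end, insert_end) - max(read_start, insert_start)
    return overlap > min_overlap


print(overlaps_insert(100, 150, 148, 200))  # True: 2 bp overlap
print(overlaps_insert(100, 150, 149, 200))  # False: only 1 bp overlap
```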

Cell calling

# cells

This is the number of cells called by the cellfinder module.

Panel uniformity

This is the number of amplicons whose mean reads are above 0.2 * the mean reads per amplicon per cell.

Mean reads/cell

This is the mean reads per called cell. It is the floor of the decimal obtained by dividing the total reads to cells by the number of cells.

Mean reads/cell/amplicon

This is the floor of the mean reads per cell divided by the number of amplicons in the panel.

% DNA read pairs assigned to cells

This is the percentage of read pairs that are present in the called cells.

DNA data completeness

This is defined as the percentage of amplicon per cell combinations that have more than 10 reads, i.e., they have enough sequencing depth for variant calling.
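The read-based cell calling metrics above can be sketched from a cells x amplicons read count matrix. The input layout, the dictionary keys, and the function name are assumptions of this example; it also assumes the uniformity cutoff is computed from the unrounded mean, which the report does not state.

```python
import math


def cell_calling_metrics(reads):
    """reads: one row per called cell, one read count per amplicon."""
    n_cells = len(reads)
    n_amplicons = len(reads[0])
    total_reads = sum(sum(row) for row in reads)

    mean_per_cell = total_reads / n_cells
    mean_per_cell_amplicon = mean_per_cell / n_amplicons

    # Mean reads per amplicon across the called cells.
    amplicon_means = [
        sum(row[j] for row in reads) / n_cells for j in range(n_amplicons)
    ]
    # Assumption: the cutoff uses the unrounded mean reads/cell/amplicon.
    uniform = sum(1 for m in amplicon_means if m > 0.2 * mean_per_cell_amplicon)

    # Amplicon/cell combinations with more than 10 reads.
    complete = sum(1 for row in reads for n in row if n > 10)

    return {
        "mean_reads_per_cell": math.floor(mean_per_cell),
        "mean_reads_per_cell_amplicon": math.floor(mean_per_cell_amplicon),
        "panel_uniformity": uniform,
        "data_completeness_pct": 100.0 * complete / (n_cells * n_amplicons),
    }


# Hypothetical counts for two cells and three amplicons.
print(cell_calling_metrics([[20, 30, 2], [40, 10, 0]]))
```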

% antibody read pairs assigned to cells (protein only)

This is the percentage of protein read pairs that are assigned to DNA called cells (denominator: total read pairs in the fastq file).

Algorithm - v3 and v3.4

This is the method that was used to call the cells. The V3 pipeline uses the new Correlation UMAP method to call cells; if that is unsuccessful, it falls back to the Completeness method.

Variant calling

# filtered variants

This is the number of variants that would be seen by loading the h5 file in Tapestri Insights and applying the default Insights filters.

ADO rate

This is the ADO rate for this run calculated using germline variants. 

NOTE: ADO is not calculated for demultiplexed Tapestri runs, and the metric is shown as NA.

Advanced tab

Panel performance

% amplicons between 0.2*mean and 5*mean reads/cell/amplicon

This is the percentage of amplicons that lie in this range (denominator: panel size).

% amplicons between 0.5*mean and 2*mean reads/cell/amplicon

This is the percentage of amplicons that lie in this range (denominator: panel size).
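Both range metrics follow the same pattern, sketched below for a list of per-amplicon mean reads per cell. Whether the range bounds are inclusive is not stated in the report, so the inclusive comparison here is an assumption, as is the function name.

```python
def pct_amplicons_in_range(amplicon_means, low_factor, high_factor):
    """Percentage of amplicons whose mean reads lie within
    [low_factor * mean, high_factor * mean] (denominator: panel size).

    Assumption: both bounds are inclusive.
    """
    panel_size = len(amplicon_means)
    overall_mean = sum(amplicon_means) / panel_size
    low, high = low_factor * overall_mean, high_factor * overall_mean
    in_range = sum(1 for m in amplicon_means if low <= m <= high)
    return 100.0 * in_range / panel_size


# Hypothetical per-amplicon mean reads per cell (overall mean is 25).
means = [1, 10, 20, 69]
print(pct_amplicons_in_range(means, 0.2, 5))  # 75.0
print(pct_amplicons_in_range(means, 0.5, 2))  # 25.0
```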

% amplicons > 1x coverage - Only V2

This is the percentage of amplicons that have mean reads over 1 (denominator: panel size).

% amplicons > 5x coverage - Only V2

This is the percentage of amplicons that have mean reads over 5 (denominator: panel size).

% amplicons > 10x coverage

This is the percentage of amplicons that have mean reads over 10 (denominator: panel size).

% amplicons > 20x coverage - Only V2

This is the percentage of amplicons that have mean reads over 20 (denominator: panel size).

% amplicons > 40x coverage - Only V2

This is the percentage of amplicons that have mean reads over 40 (denominator: panel size).

% reads to amplicons above 2*mean reads/cell/amplicon

This is the percentage of reads that belong to amplicons with mean reads over 2 * the mean reads per cell per amplicon (denominator: total reads to cells).
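Because every amplicon's mean is taken over the same set of cells, the share of reads going to high-coverage amplicons can be sketched directly from the per-amplicon means. The function name is hypothetical.

```python
def pct_reads_above(amplicon_means, factor=2.0):
    """Percentage of reads in amplicons whose mean reads exceed
    factor * the mean reads per cell per amplicon."""
    total = sum(amplicon_means)
    if total == 0:
        return 0.0
    overall_mean = total / len(amplicon_means)
    high = sum(m for m in amplicon_means if m > factor * overall_mean)
    return 100.0 * high / total


# Hypothetical per-amplicon mean reads per cell (overall mean is 25).
print(pct_reads_above([1, 10, 20, 69]))  # 69.0
```

In this made-up panel a single amplicon takes 69% of the reads, which is the kind of skew this metric is meant to expose.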

Low-performing amplicons - Only V2

This table shows amplicon names and mean reads per amplicon. This table is created for amplicons that have mean reads below 0.2 * the mean reads per cell per amplicon.

Amplicon Performance Table - v3 and v3.4

This table lists each amplicon and its mean reads per cell. For each amplicon, it also lists whether the amplicon passed the threshold. The threshold is 0.2 * the mean of the Mean Reads column, i.e., 0.2 * the mean of the per-amplicon mean reads. If the mean reads for an amplicon are above this number, the amplicon passed the threshold and has a value of TRUE; if they are below this number, the value is FALSE. The table is sorted by mean reads, with low-performing amplicons at the top.
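A minimal sketch of how such a table could be built, assuming a mapping from amplicon name to mean reads per cell. The amplicon names and the function name are made up for the example.

```python
def amplicon_performance_table(mean_reads):
    """mean_reads: {amplicon name: mean reads per cell}.

    Returns (name, mean reads, passed) rows sorted by mean reads,
    lowest first.
    """
    threshold = 0.2 * sum(mean_reads.values()) / len(mean_reads)
    rows = [(name, m, m > threshold) for name, m in mean_reads.items()]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical panel: the threshold is 0.2 * 30 = 6 mean reads.
table = amplicon_performance_table({"AMP1": 50.0, "AMP2": 2.0, "AMP3": 38.0})
for row in table:
    print(row)
```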

Amplicons with > 2* standard deviation Gini score

This is a list of amplicons that have a high Gini score. A high Gini score is based on the distribution of the Gini scores in this run and is defined as the mean Gini score of all amplicons + (2 * the standard deviation of Gini scores of all amplicons).
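The cutoff can be sketched as follows, given a Gini score per amplicon. How the Gini score itself is computed is outside this sketch, and the use of the population standard deviation (`statistics.pstdev`) is an assumption; the report does not say which estimator is used.

```python
from statistics import mean, pstdev


def high_gini_amplicons(gini_scores):
    """gini_scores: {amplicon name: Gini score}.

    Flags amplicons above mean + 2 * standard deviation.
    Assumption: population standard deviation.
    """
    scores = list(gini_scores.values())
    cutoff = mean(scores) + 2 * pstdev(scores)
    return [name for name, g in gini_scores.items() if g > cutoff]


# Hypothetical scores: nine amplicons at 0.2 and one outlier at 0.9.
scores = {f"AMP{i}": 0.2 for i in range(1, 10)}
scores["AMP10"] = 0.9
print(high_gini_amplicons(scores))  # ['AMP10']
```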

Amplicons with R1R2 imbalance

This is a list of amplicons that have an imbalance in their R1 and R2 reads. An amplicon is said to have an R1R2 imbalance if the fold change of R1 to R2 (or R2 to R1) is greater than 2.
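The fold-change rule can be sketched as a symmetric ratio check. The handling of zero counts is an assumption added so that the example does not divide by zero; `has_r1r2_imbalance` is a hypothetical name.

```python
def has_r1r2_imbalance(r1_reads, r2_reads, max_fold=2.0):
    """True if R1/R2 or R2/R1 is greater than `max_fold`."""
    if r1_reads == 0 or r2_reads == 0:
        # Assumption: one-sided coverage counts as imbalanced,
        # no coverage at all does not.
        return r1_reads != r2_reads
    fold = max(r1_reads / r2_reads, r2_reads / r1_reads)
    return fold > max_fold


print(has_r1r2_imbalance(100, 40))  # True: 2.5-fold
print(has_r1r2_imbalance(100, 50))  # False: exactly 2-fold
```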

ADO calculation summary

This table shows the germline variants that were used to calculate the ADO rate for this run.

NOTE: ADO is not calculated for demultiplexed Tapestri runs, and the table shows the message "No variants found for ADO computation."

Amplicon Gini score distribution plot

This plot shows the distribution of the Gini scores of all amplicons in this run.

Diagnostics tab

R1R2 imbalance plot - Only V2

This plot shows the R1 and R2 read fractions for the amplicons that have an R1R2 imbalance. When there are no such amplicons, this plot is empty.

Balance R1R2 barplot - Only V2

This plot shows the R1 and R2 read fractions for the amplicons that do not have an R1R2 imbalance. This plot combined with the R1R2 imbalance plot should cover all the amplicons in the panel.

Cellfinder UMAP plot - v3 and v3.4

This plot shows the UMAP embedding of all barcodes in two dimensions (UMAP-x vs UMAP-y). The clusters seen on the plot are colored by the cellfinder assignment: valid-cells, invalid-barcodes, and other. Ideally, each cluster should be made up of the same type of barcode.

Cellfinder correlation coverage plot - v3 and v3.4

This plot of correlation (R2) vs coverage (log10(reads/amplicon)) shows the cell and invalid-barcode populations. Ideally, the non-cell barcodes should form the dense top band, with the valid single cells forming a second, less dense cluster beneath them.

Reads log-log plot

This is the log-log plot of total reads vs rank-ordered barcodes, which is also part of the cellfinder output. Previously, a vertical line was displayed on this plot, but it is now obsolete because the cellfinder no longer works on a threshold: it selects complete cells rather than all cells above a certain read threshold. The plot serves primarily as a diagnostic tool for validating that the shape of the knee looks normal.
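For readers who want to reproduce the shape of this curve from their own counts, the points can be derived by sorting barcodes by total reads and taking log10 of both rank and reads. This is only a sketch of the general technique; the function name is hypothetical and the report's own plotting code is not shown here.

```python
import math


def loglog_points(total_reads_per_barcode):
    """Return (log10 rank, log10 reads) points, highest-read barcode first.

    Barcodes with zero reads are skipped because log10(0) is undefined.
    """
    ordered = sorted(total_reads_per_barcode, reverse=True)
    return [
        (math.log10(rank), math.log10(reads))
        for rank, reads in enumerate(ordered, start=1)
        if reads > 0
    ]


# Hypothetical read totals for three barcodes.
for point in loglog_points([10, 1000, 100]):
    print(point)
```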

DNA 1x vs 10x coverage plot

This is a scatter plot where each dot is a barcode. The x value is the fraction of amplicons that have more than 1 read in that barcode. The y value is the fraction of amplicons that have more than 10 reads in that barcode. The dots are colored according to whether the barcode was called a cell by cellfinder. This plot can be used to identify runs with high amounts of partial cells. In such cases, the distribution of the barcodes would be closer to the unity line instead of being closer to the axes.
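The coordinates of a single dot can be sketched from that barcode's per-amplicon read counts; `coverage_fractions` is a hypothetical name.

```python
def coverage_fractions(amplicon_reads):
    """Return (x, y) for one barcode: the fraction of amplicons with
    more than 1 read and the fraction with more than 10 reads."""
    n = len(amplicon_reads)
    x = sum(1 for reads in amplicon_reads if reads > 1) / n
    y = sum(1 for reads in amplicon_reads if reads > 10) / n
    return x, y


# Hypothetical barcode covering a four-amplicon panel.
print(coverage_fractions([0, 2, 11, 50]))  # (0.75, 0.5)
```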

DNA vs Protein reads scatter plot (protein only) - Only V2

This scatter plot for all barcodes shows the number of DNA reads that the barcode has on the x-axis and the number of protein reads that the barcode has on the y-axis.

Antibody cell distribution plot (protein only)

This bar graph shows the number of cells where an antibody has non-zero reads. Each bar represents an antibody. Note: labels are turned off by default in the report on the RUN DETAILS page. For label information, please download the interactive_report.html file from the OUTPUT FILES page.

Antibody read distribution plot (protein only)

This violin plot shows the distribution of reads that an antibody has across all cells. Each curve represents an antibody. Note: labels are turned off by default in the report on the RUN DETAILS page. For label information, please download the interactive_report.html file from the OUTPUT FILES page.

Warnings

QC metrics

A warning sign is seen next to the QC metrics that do not pass a certain threshold as given below:

Read quality (Q30) – A warning is seen if the Q30% is less than 85%.

GC Content – A warning is seen if the GC% is less than 32%.

MaxN per read position – A warning is seen if more than 5% of reads have N at the same position.

Barcode constant 1 read rate – A warning is seen if fewer than 50% of reads have this region.

Barcode constant 2 read rate – A warning is seen if fewer than 50% of reads have this region.

Mapping metrics

% reads mapped to genome – A warning is seen if value is below 70%.

Cell calling metrics

Mean reads/cell/amplicon – A warning is seen if value is below 60%.

Mean reads/cell/amplicon – A warning is seen if value is below 20 or above 200.

Mean reads/cell/antibody – A warning is seen if value is below 75.

Panel Performance metrics

% reads to amplicons above 2*mean reads/cell/amplicon – A warning is seen if value is above 50%.
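For users who post-process metrics themselves, the thresholds listed above can be collected into a small lookup. The metric keys and the function name are hypothetical labels chosen for this sketch and do not correspond to field names in any report file; the 60% cell calling rule is left out because its metric name is ambiguous in the list above.

```python
# Each rule returns True when the value should raise a warning.
WARNING_RULES = {
    "q30_pct": lambda v: v < 85,
    "gc_pct": lambda v: v < 32,
    "max_n_pct": lambda v: v > 5,
    "barcode_constant1_pct": lambda v: v < 50,
    "barcode_constant2_pct": lambda v: v < 50,
    "mapped_to_genome_pct": lambda v: v < 70,
    "mean_reads_cell_amplicon": lambda v: v < 20 or v > 200,
    "mean_reads_cell_antibody": lambda v: v < 75,
    "reads_above_2x_mean_pct": lambda v: v > 50,
}


def qc_warnings(metrics):
    """Return the names of supplied metrics that breach a threshold."""
    return [
        name
        for name, is_bad in WARNING_RULES.items()
        if name in metrics and is_bad(metrics[name])
    ]


# Hypothetical metric values.
print(qc_warnings({"q30_pct": 80, "gc_pct": 45, "mean_reads_cell_amplicon": 250}))
```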

Multisample Run Report

For a multisample Tapestri run, there are multiple report files:

  • <run-prefix>.report.html - One html file for the full multisample run
  • <run-prefix>.<sample>.report.html - One html file for each sample

The multisample Tapestri run report has the following additional elements:

Summary tab

Demultiplexing

Method

Demultiplexing method used - Genotype based or Antibody based

Number of samples found

Number of samples identified in the Tapestri run during demultiplexing

Cells assigned to a sample

Percentage of cells from the Tapestri run that could be successfully assigned to samples

Variants used

Variants from the list of user provided germline variants that were used for demultiplexing

Variants discarded

Variants from the list of user-provided germline variants that were not used for demultiplexing. A variant can be discarded for several reasons, such as poor quality, not being a SNP, or being correlated with another variant.

 

Advanced tab

Cells by sample plot

This bar graph shows the value counts for each label, which include the demultiplexed samples and the unassigned, mixed, ambiguous, and low read cells.
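The counts behind such a bar graph amount to a tally of per-cell labels. In the sketch below, the exact label strings and the way the assigned percentage is derived from them are assumptions made for illustration, not the report's documented output.

```python
from collections import Counter

# Assumption: any label not in this set is a demultiplexed sample name.
NON_SAMPLE_LABELS = {"unassigned", "mixed", "ambiguous", "low_reads"}


def cells_by_sample(labels):
    """Return (label counts, % of cells assigned to a sample)."""
    counts = Counter(labels)
    assigned = sum(
        n for label, n in counts.items() if label not in NON_SAMPLE_LABELS
    )
    pct_assigned = 100.0 * assigned / len(labels) if labels else 0.0
    return counts, pct_assigned


# Hypothetical per-cell labels.
counts, pct = cells_by_sample(["S1", "S1", "S2", "mixed"])
print(dict(counts), pct)
```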

Diagnostics tab

Discarded demultiplexing variants

This table lists the user-provided germline variants that were not used for demultiplexing, along with the reason.

Demultiplexed Sample Report

Sample reports are created for each sample that is demultiplexed from the multisample run. These are available for download from the Output Files tab and are named <prefix>.<sample_ID>.dmx.report.html. These reports are similar to the run reports, with some metrics and plots recalculated using only the sample-specific cells, such as the Cell calling metrics, Antibody Counting metrics, and Amplicon Performance table.

The metrics that are calculated across all samples of a multisample Tapestri run are identified by the presence of an "*" on the card, for example the QC metrics and sequencing metrics.

 
