Mission Bio's Clonal Insights Software (CIS) allows customers to process single-cell DNA or DNA + Protein sequencing data generated on the Tapestri Platform.
Setting up Tapestri Pipeline Account
Single Cell Clonal Insights Software
Time course or Multi-sample Analysis
Clonal Insights Software Inputs
CIS DNA/DNA+P single sample analysis
CIS DNA/DNA+P multiplexed sample analysis
Setting up Tapestri Pipeline Account
Refer to the Tapestri Pipeline User Guide to set up an account and access Tapestri Pipeline.
Single Cell Clonal Insights Software
Mission Bio’s Clonal Insights analysis pipeline is a complete end-to-end solution for the detection of rare leukemic cells which persist following treatment and can help predict disease relapse. FASTQ files generated from Tapestri Clonal Insights sequencing libraries are provided as input, and the pipeline generates reports of somatic mutations, clonal architecture, and protein expression profiles. The pipeline is compatible with either a single sample or multiple samples multiplexed together that are distinguishable via their germline genotype information (which must also be provided).
There are two types of Clonal Insights analysis:
- Single Sample analysis
- Time course or Multi-sample analysis
Single Sample Analysis
Clonal Insights Single Sample analysis requires FASTQ files from a single Tapestri run (either from one sample, or from up to three multiplexed samples) and it generates a report with the detected somatic variants/clones and the clonal architecture for each sample contained in the run. Each report represents a single time point from a single sample.
Time course or Multi-sample Analysis
Clonal Insights Time course analysis combines 2-5 single sample runs to generate a consolidated report summarizing the change in variant frequencies and clonal architecture over time.
Clonal Insights Software Inputs
Clonal Insights Software has the following inputs:
FASTQ files
Input FASTQ files are one or more pairs of forward and reverse FASTQ files (R1/R2). These files should be gzip-compressed (.gz). DNA FASTQs are required for the Clonal Insights DNA Pipeline, DNA and Protein FASTQs are required for the Clonal Insights DNA+Protein Pipeline, and DNA and RNA FASTQs are required for the Clonal Insights DNA+RNA Pipeline.
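Before uploading, it can help to confirm locally that every FASTQ file is compressed and has its R1/R2 mate. The following is a minimal sketch, not part of Tapestri Pipeline. It assumes Illumina-style file names that contain an `_R1`/`_R2` tag (the article does not specify a naming scheme), and `check_fastq_pairs` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumption: Illumina-style names such as sample_S1_L001_R1_001.fastq.gz.
READ_TAG = re.compile(r"_(R[12])(?=[_.])")


def check_fastq_pairs(filenames):
    """Return a list of problems; an empty list means all files are gzipped R1/R2 pairs."""
    problems = []
    reads_by_stem = defaultdict(set)
    for name in filenames:
        if not name.endswith(".gz"):
            problems.append(f"{name}: not gzip-compressed (.gz)")
        match = READ_TAG.search(name)
        if match is None:
            problems.append(f"{name}: no R1/R2 tag found")
            continue
        # Mask the read tag so that both mates of a pair share the same key.
        stem = name[:match.start(1)] + "R?" + name[match.end(1):]
        reads_by_stem[stem].add(match.group(1))
    for stem, reads in sorted(reads_by_stem.items()):
        for missing in sorted({"R1", "R2"} - reads):
            problems.append(f"{stem}: missing {missing} file")
    return problems
```

Calling `check_fastq_pairs` on the list of file names you intend to upload returns an empty list when every file is gzipped and paired, and otherwise one message per problem found.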
Panel files
DNA panel files are required by the Clonal Insights DNA Pipeline, DNA and Protein panel files are required by the Clonal Insights DNA+Protein Pipeline, and DNA and RNA panel files are required by the Clonal Insights DNA+RNA Pipeline.
DNA Panel
The DNA panel is a .zip file consisting of up to three files:
- [required] *.bed
- [required] *.amplicons
- [optional] *.per-variant-background-error.csv (only available for certain catalog DNA panels)
The Clonal Insights catalog DNA panel file can be found pre-uploaded in Tapestri Pipeline (Files → Panel Files). In addition to the Clonal Insights DNA panel, a regular DNA panel, either catalog or custom, can be used to run CIS. For more information about these panel files, refer to this article.
Protein Panel
The protein panel is supplied as a single CSV file detailing the antibodies and their barcode sequences. The details of the panel can be seen here.
RNA Panel
The RNA panel is a .zip file consisting of four files:
- [required] *.amplicons
- [required] *.exons.bed
- [required] *.gtf
- [required] *.amplicon-gene.csv
The Clonal Insights catalog RNA panel file can be found pre-uploaded in Tapestri Pipeline (Files → Panel Files). For more information about these panel files, refer to this article.
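For custom panels, the DNA and RNA panel archive layouts described above can be checked before upload. This is a minimal sketch using only the Python standard library. The function name `missing_panel_files` and the `PANEL_REQUIRED` table are hypothetical, and the check looks at file name patterns only, not at file contents.

```python
import fnmatch
import zipfile

# Required member patterns, taken from the DNA and RNA panel lists above.
# The optional *.per-variant-background-error.csv file is not checked.
PANEL_REQUIRED = {
    "dna": ["*.bed", "*.amplicons"],
    "rna": ["*.amplicons", "*.exons.bed", "*.gtf", "*.amplicon-gene.csv"],
}


def missing_panel_files(zip_path, panel_type):
    """Return the required patterns that have no matching file in the panel zip."""
    with zipfile.ZipFile(zip_path) as archive:
        # Compare base names so that files nested in a folder still match.
        names = [
            member.rsplit("/", 1)[-1]
            for member in archive.namelist()
            if not member.endswith("/")
        ]
    return [
        pattern
        for pattern in PANEL_REQUIRED[panel_type]
        if not fnmatch.filter(names, pattern)
    ]
```

An empty result means all required files are present; otherwise the returned patterns name what is missing from the archive.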
Reference Genome
DNA Genome
The reference genome file should always be chosen to match the reference genome used to design the DNA panel. The catalog human reference genomes (hg19 and hg38) can be found pre-uploaded on Tapestri Pipeline (Files → Other Files).
RNA Genome
The RNA reference genome file should always be chosen to match the reference genome used to design the RNA panel. The catalog hg38 human reference genome can be found pre-uploaded on Tapestri Pipeline (Files → Other Files).
CSV Files
CIS Pipeline runs can include the following CSV files:
- Demultiplexing variants file (required for multiplexed DNA or DNA+Protein runs)
- Whitelist/Blacklist variants file (optional)
For more information about these input files, refer to this article.
Uploading CSV Files
Create the CSV file based on the details mentioned above, and then upload it to Tapestri Pipeline. The CSV file must be uploaded before it can be used in a run. To upload a CSV file, follow the instructions below:
- Click the Add Files button.
- Select the option Other from the left panel.
- Choose either Upload from Local Computer or Import from Amazon S3 based on where the CSV files are saved.
- In the drop-down menus, select the required file type:
  - Sample Variants File: Use this type to upload the demultiplexing variants file. The file extension is .csv.
  - Somatic Variants File: Use this type to upload the whitelist/blacklist variants file. The file extension is .csv.
- Choose the files to add and click Upload.
- Once the upload completes, the files can be seen in the Other Files tab on the Files table.
Starting CIS Runs
The Tapestri Pipeline web application allows you to start five types of CIS pipeline runs:
- Clonal Insights DNA pipeline
- Clonal Insights DNA+Protein pipeline
- Clonal Insights DNA+RNA pipeline
- Clonal Insights Reprocess pipeline
- Clonal Insights Time course pipeline
CIS DNA Run
To process a CIS DNA run, follow the steps given below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline ‘Clonal Insights DNA.’
- Select the Run Mode - Standard for single sample, Genotype Demultiplexing for multiplexed sample.
- Select the reference genome based on the panel.
- [Optional] Select the Whitelist/Blacklist Variants CSV file for the run.
- For Genotype Demultiplexing runs, additionally select the Demultiplexing Variants file.
- [Optional] Set the ‘Report Whitelist variants only’ to ‘Yes’ if de novo variant calling is not required.
- Select the DNA panel.
- Select the FASTQ files and assign them to the correct lanes corresponding to your Tapestri experiment. See the Lane assignment article for details.
- Preview the run inputs and Submit the run.
- To view the results, click the name of the run in the Runs table.
- The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, the DNA pipeline report is seen on the Run Report tab.
- To view the CIS report, go to the Output Files tab and download the file tertiary/reports/<prefix>.{sample_name}.html.
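The input rules in the steps above can be written down as a small checklist to run over a planned set of inputs before opening the Start Run form. This is a sketch only: `CisDnaRun` and `validate_run` are hypothetical names, not a Tapestri Pipeline API, and the rule that 'Report Whitelist variants only' needs a Whitelist/Blacklist file is an assumption inferred from the option's name.

```python
from dataclasses import dataclass, field
from typing import List, Optional

RUN_MODES = ("Standard", "Genotype Demultiplexing")


@dataclass
class CisDnaRun:
    """Hypothetical container mirroring the fields of the Start Run form."""
    name: str
    run_mode: str
    reference_genome: str
    dna_panel: str
    fastq_files: List[str] = field(default_factory=list)
    whitelist_blacklist_csv: Optional[str] = None      # optional input
    demultiplexing_variants_csv: Optional[str] = None  # multiplexed runs only
    report_whitelist_only: bool = False


def validate_run(run):
    """Return a list of problems found in the planned run inputs."""
    problems = []
    if run.run_mode not in RUN_MODES:
        problems.append(f"unknown run mode: {run.run_mode!r}")
    if run.run_mode == "Genotype Demultiplexing" and not run.demultiplexing_variants_csv:
        problems.append("Genotype Demultiplexing requires a Demultiplexing Variants file")
    # Assumption: reporting whitelist variants only is meaningless without a whitelist.
    if run.report_whitelist_only and not run.whitelist_blacklist_csv:
        problems.append("'Report Whitelist variants only' needs a Whitelist/Blacklist file")
    if not run.fastq_files:
        problems.append("no FASTQ files selected")
    return problems
```

An empty list from `validate_run` means the planned inputs are consistent with the steps listed above; it does not guarantee that the run itself will succeed.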
CIS DNA+Protein
To process a CIS DNA+Protein run, follow the steps given below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline CIS DNA+Protein.
- Follow the same steps as for CIS DNA run, with the following updates:
- While selecting parameters, additionally select the protein panel .csv file.
- In the next step, select the Protein FASTQ files and assign them to the lanes correctly.
- To view the CIS reports, go to the Output Files tab and download the file tertiary/reports/<prefix>.{sample_name}.html.
CIS DNA+RNA
To process a CIS DNA+RNA run, follow the steps given below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline CIS DNA+RNA.
- Follow the same steps as for CIS DNA run, with the following updates:
- While selecting parameters, additionally select the RNA genome file.
- Select the RNA panel file.
- In the next step, select the RNA FASTQ files and assign them to the lanes correctly.
- To view the CIS reports, go to the Output Files tab and download the file tertiary/reports/<prefix>.{sample_name}.html.
CIS Reprocess
The CIS Reprocess pipeline runs only the CIS module, which performs DNA variant calling and reporting. Use it after the first run to correct demultiplexing, to whitelist expected variants, or to blacklist unwanted or false positive variants.
To start a CIS Reprocess run, follow the steps below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline CIS Reprocess.
- Select the Run Mode - Standard for single sample, Genotype Demultiplexing for multiplexed sample.
- Select the appropriate DNA panel.
NOTE: A protein or RNA panel file is not needed for this pipeline.
- [Optional] Select the Whitelist/Blacklist Variants file for the run to define true and false positive variants to be included or excluded from analysis.
- For Genotype Demultiplexing runs, additionally select the Demultiplexing Variants file.
- Select the h5 file to be reprocessed.
- Preview the run inputs and Submit the run.
- To view the results, click the name of the run in the Runs table.
- The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, no report is seen on the Run Report tab as the secondary pipeline is not run in this process.
- To view the CIS reports, go to the Output Files tab and download the file reports/<prefix>.{sample_name}.html.
CIS Time course
This pipeline is used to combine patient samples across multiple time points. To analyze a single patient over a period of time, run the samples individually through the CIS single sample pipeline, and then use the h5 files from these runs to create a time course analysis report. To define the run, follow the steps below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline CIS Time course.
- [Optional] Select the Whitelist/Blacklist Variants CSV file for the run.
- [Optional] Set the ‘Report Whitelist variants only’ to ‘Yes’ if de novo variant calling is not required.
- Select the h5 files from a previous CIS run listed in the table.
- Define the time point name for the samples. For example, the time points can be defined as Diagnosis, Remission, Relapse, etc.
- Define the order of the samples by dragging and dropping each row to the appropriate position in the table, using the “=” icon at the beginning of each row.
- Select the DNA panel.
NOTE: A protein or RNA panel file is not needed for this pipeline.
- To view the results, click the name of the run in the Runs table.
- The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, the CIS report is seen on the Run Report tab.
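The time course definition above (2-5 h5 files, each with a time point name and a position in the order) can be sketched as a small data structure, which is useful for planning a run before entering it in the form. `build_time_course` is a hypothetical helper, and the requirement that time point names be unique is an assumption, not something the article states.

```python
def build_time_course(samples):
    """Build an ordered time course plan.

    samples: list of (h5_file, time_point_name) tuples in the intended order.
    """
    if not 2 <= len(samples) <= 5:
        raise ValueError(f"time course analysis combines 2-5 samples, got {len(samples)}")
    names = [time_point for _, time_point in samples]
    if any(not name.strip() for name in names):
        raise ValueError("every sample needs a time point name")
    # Assumption: duplicate time point names would make the report ambiguous.
    if len(set(names)) != len(names):
        raise ValueError("time point names should be unique")
    return [
        {"order": index, "h5_file": h5_file, "time_point": time_point}
        for index, (h5_file, time_point) in enumerate(samples, start=1)
    ]
```

For example, `build_time_course([("dx.h5", "Diagnosis"), ("rem.h5", "Remission"), ("rel.h5", "Relapse")])` yields three entries numbered 1 to 3 in the order given, mirroring the row order set by drag and drop.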
CIS Output Files
The CIS pipeline generates the following files:
CIS DNA/DNA+Protein/DNA+RNA single sample analysis
The CIS pipeline executes the secondary and tertiary analysis pipelines together. Different sets of files are generated depending on the modules that are run:
Secondary analysis pipeline
- <prefix>.dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only) / dna+rna.h5 (DNA + RNA only)
- <prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
- <prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
- tapestri_run_output.txt
- <prefix>.qc.json
- <prefix>-<analyte>-fastp.json
- <prefix>-<analyte>-fastp.html
- <prefix>.mapped.bam
- <prefix>.cells.bam
- <prefix>.cells.bam.csi
- <prefix>.report.html
- <prefix>.metrics.json
- <prefix>.cells.vcf.gz
- <prefix>.allele.drop.out.report.txt
- <prefix>.r1_read_barcode_distribution.tsv (DNA + RNA only)
- <prefix>_barcode_metrics.txt (DNA + RNA only)
- <prefix>_barcodes.txt (DNA + RNA only)
- <prefix>.rna.h5 (DNA + RNA only)
- <prefix>.barcodes_fixed.bam.bai (DNA + RNA only)
- <prefix>.barcodes_fixed.bam (DNA + RNA only)
- star_alignment.log (DNA + RNA only)
- progress.log (DNA + RNA only)
- barcode_extraction.log (DNA + RNA only)
CIS Specific Outputs
- tertiary/reports/<prefix>.{sample}.html
- tertiary/h5/{sample}.h5
CIS DNA/DNA+Protein/DNA+RNA multiplexed sample analysis
The CIS pipeline executes the secondary and tertiary analysis pipelines together. Different sets of files are generated depending on the modules that are run:
Secondary analysis pipeline
- <prefix>.dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only) / dna+rna.h5 (DNA + RNA only)
- <prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
- <prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
- tapestri_run_output.txt
- <prefix>.qc.json
- <prefix>-<analyte>-fastp.json
- <prefix>-<analyte>-fastp.html
- <prefix>.mapped.bam
- <prefix>.cells.bam
- <prefix>.cells.bam.csi
- <prefix>.report.html
- <prefix>.metrics.json
- <prefix>.cells.vcf.gz
- <prefix>.r1_read_barcode_distribution.tsv (DNA + RNA only)
- <prefix>_barcode_metrics.txt (DNA + RNA only)
- <prefix>_barcodes.txt (DNA + RNA only)
- <prefix>.rna.h5 (DNA + RNA only)
- <prefix>.barcodes_fixed.bam.bai (DNA + RNA only)
- <prefix>.barcodes_fixed.bam (DNA + RNA only)
- star_alignment.log (DNA + RNA only)
- progress.log (DNA + RNA only)
- barcode_extraction.log (DNA + RNA only)
CIS Specific Outputs
Multiple copies of the following files, one set for each multiplexed sample:
- tertiary/reports/<prefix>.{sample}.html
- tertiary/h5/<prefix>.{sample}.h5
- samples/<prefix>.{sample}.dmx.report.html
- samples/<prefix>.{sample}.dmx.metrics.json
- samples/<prefix>.{sample}.dmx.h5
- samples/<prefix>.{sample}.dmx.cells.bam
- samples/<prefix>.{sample}.dmx.cells.bam.csi
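Because a multiplexed run produces one set of these files per sample, it can be convenient to expand the patterns above into a concrete checklist. The sketch below does this with plain string templates. `expected_outputs` and `missing_outputs` are hypothetical helpers, and the list of present files is assumed to be copied from the Output Files tab by the user.

```python
# Per-sample output patterns, copied from the list above.
PER_SAMPLE_TEMPLATES = [
    "tertiary/reports/{prefix}.{sample}.html",
    "tertiary/h5/{prefix}.{sample}.h5",
    "samples/{prefix}.{sample}.dmx.report.html",
    "samples/{prefix}.{sample}.dmx.metrics.json",
    "samples/{prefix}.{sample}.dmx.h5",
    "samples/{prefix}.{sample}.dmx.cells.bam",
    "samples/{prefix}.{sample}.dmx.cells.bam.csi",
]


def expected_outputs(prefix, samples):
    """Map each sample name to the output paths expected for it."""
    return {
        sample: [t.format(prefix=prefix, sample=sample) for t in PER_SAMPLE_TEMPLATES]
        for sample in samples
    }


def missing_outputs(prefix, samples, present_files):
    """List expected per-sample paths that are absent from present_files."""
    present = set(present_files)
    return [
        path
        for paths in expected_outputs(prefix, samples).values()
        for path in paths
        if path not in present
    ]
```

With a run prefix of `run1` and samples `A` and `B`, `expected_outputs("run1", ["A", "B"])` lists seven paths per sample, starting with `tertiary/reports/run1.A.html`.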
CIS Time Course analysis
CIS time course analysis consolidates 2-5 sample H5s and creates the following output files:
- <prefix>.html
- <prefix>.h5
For more information about Clonal Insights output files, refer to this article.
To download any output file, click the download icon to the left of the File Name.
Note: If the file does not download, check whether an ad or pop-up blocker is running. If so, disable it and download the file again.
CIS Report Overview
To download the CIS Run Report, go to the Output Files tab and download the tertiary/reports/<prefix>.{sample_name}.html file. Plots and tables in the report are interactive.
Summary
The Summary page displays the following information:
- Total cells
- Mutant cells detected
- Mutant clones detected
- Clonal summary plot: Visual representation of Phylogeny, Clonal Fraction, Point Mutations, Protein Cluster Fraction, and RNA cluster fraction
- Clones table: Table with clone name, number of cells, mutations and protein differential expression.
- Somatic variants table: A table with the sample name (for time course), variant ID, gene, protein change, coding impact, cells mutated % and various other metrics.
- RNA Cluster table: A table with the RNA cluster details, its name, number of cells and the top marker gene for each cluster.
Advanced
The Advanced page displays the following information:
- Phylogenetic tree: A visualization showing the order in which the mutations were acquired and how they co-occur.
- DNA & Protein profile: A heatmap showing DNA clones subsorted by Protein.
- Protein cluster profile: A plot showing the normalized expression level for each cluster for all antibodies which have expression above 0.5 for at least one cluster.
- Protein UMAP: A UMAP plot showing the protein expression colored by either Protein, sample, clone or genotype.
- Protein expression correlation: A plot showing the correlation in expression for two proteins.
- Protein expression change over time: A time course analysis-only plot showing the change in protein expression between samples/time points.
- RNA UMAPs: A UMAP plot showing the RNA expression colored by either RNA clusters or DNA clones.
- RNA UMAPs: Expression per gene: A UMAP plot colored by each gene's expression.
- RNA Heatmaps: A visual representation of the expression values for each marker gene in each cell. Heatmaps may be split by RNA clusters and DNA clones.
- RNA Marker Gene Dotplots and Tables: Dotplots show the average expression level (color of the dots) and % of cells expressing each marker gene within that cluster (size of the dot).
- RNA Gene Expression Table: The marker gene expression table listing expression statistics for all genes and all clusters.
- RNA Marker Gene Expression: The top marker gene expression for each DNA clone or RNA cluster. The left side of the violin plot (red) shows the expression distribution in the focal clone/cluster cells, while the right side (grey) shows all other cells in the sample.
- RNA Gene Expression: A plot showing the distribution of expression for each gene (selected via the dropdown menu), colored by either RNA cluster or DNA clone.
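Two of the rules described in this list are simple enough to illustrate directly: the Protein cluster profile keeps antibodies whose normalized expression is above 0.5 in at least one cluster, and the RNA dotplots summarize each cluster by average expression and percentage of expressing cells. The sketch below is illustrative only and is not the report's actual implementation. The function names are hypothetical, and two details are assumptions: a cell counts as "expressing" when its value is greater than zero, and the average is taken over all cells in the cluster.

```python
def antibodies_to_plot(cluster_expression, threshold=0.5):
    """Keep antibodies whose expression exceeds the threshold in any cluster.

    cluster_expression: {cluster: {antibody: normalized expression}}
    """
    keep = set()
    for per_antibody in cluster_expression.values():
        for antibody, value in per_antibody.items():
            if value > threshold:
                keep.add(antibody)
    return sorted(keep)


def dotplot_stats(expression_by_cluster):
    """Compute dot color (mean expression) and dot size (% expressing) per cluster.

    expression_by_cluster: {cluster: [expression of one gene, one value per cell]}
    """
    stats = {}
    for cluster, values in expression_by_cluster.items():
        if not values:
            continue  # skip empty clusters to avoid dividing by zero
        # Assumption: a cell expresses the gene when its value is above zero.
        expressing = sum(1 for value in values if value > 0)
        stats[cluster] = {
            "mean_expression": sum(values) / len(values),
            "pct_expressing": 100.0 * expressing / len(values),
        }
    return stats
```

Note that the threshold comparison is strict, so an antibody whose highest cluster value is exactly 0.5 is not kept.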
Sample metadata
- Run ID
- Sample ID
- DNA panel name
- DNA panel size
- Reference genome
- Secondary analysis pipeline version
- Tertiary analysis pipeline version
- Date analyzed
QC
- Heatmap of somatic variants (raw genotypes): A heatmap showing raw genotypes for the somatic variants per cell.
- Heatmap of protein expression: A heatmap showing the normalized protein expression per cell.
- RNA Heatmap of detected genes by RNA cluster: A visual representation of the expression values split by RNA clusters. Displays all detected genes (present in at least 5 cells).
- RNA Heatmap of detected genes by DNA clone: A visual representation of the expression values split by DNA clones. Displays all detected genes (present in at least 5 cells).
- Candidate variants table: This table is shown if there are amplicons that cover somatic variants. It contains detailed information for all variants that pass lenient filtering criteria. For each variant, the table shows the number of cells mutated, the percentage of cells mutated, the reason for filtering the variant, whether the variant was passed as a whitelist or germline variant, and multiple Varsome annotations. Filtering happens sequentially, and annotations are only shown for variants that pass all the preceding filtering criteria. A known somatic (whitelisted) variant might still contain a value in the "Reason" column; this is the reason the variant would have been filtered if it were not whitelisted. This table can be used to identify why a particular variant was not called in the report, and is especially useful when a variant was missed because it was defined as a germline variant for demultiplexing purposes.
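The behavior of the "Reason" column can be illustrated with a short sketch of sequential filtering: filters are applied in order, the first one that fails is recorded as the reason, and a whitelisted variant is still reported even though a reason is recorded. The filter names and thresholds used in the example below are invented for illustration; the actual CIS filtering criteria are not described in this article, and `annotate_variant` is a hypothetical function.

```python
def annotate_variant(variant, filters, whitelist):
    """Apply filters in order and record the first failing one as the reason.

    variant:   dict with at least an "id" key
    filters:   ordered list of (reason_name, passes_function) pairs
    whitelist: set of variant IDs that are always reported
    """
    reason = ""
    for name, passes in filters:
        if not passes(variant):
            reason = name
            break  # sequential filtering: later filters are not evaluated
    whitelisted = variant["id"] in whitelist
    return {
        "id": variant["id"],
        "reason": reason,
        "whitelisted": whitelisted,
        # A whitelisted variant is reported even if a filter would have removed it.
        "reported": whitelisted or not reason,
    }
```

For instance, with hypothetical filters such as `[("too few cells", lambda v: v["cells_mutated"] >= 3)]`, a whitelisted variant seen in a single cell comes back with `reason` set to "too few cells" and `reported` set to `True`, which matches the situation described for the "Reason" column.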
Definitions
The Definitions page contains a glossary of key words used in the report and a description of every table and plot contained in the report.