Mission Bio's Clonal Insights Software (CIS) allows customers to process single-cell DNA or DNA + Protein sequencing data generated on the Tapestri Platform.
Setting up Tapestri Pipeline Account
Single Cell Clonal Insights Software
Time course or Multi-sample Analysis
Clonal Insights Software Inputs
CIS DNA/DNA+P single sample analysis
CIS DNA/DNA+P multiplexed sample analysis
Setting up Tapestri Pipeline Account
Refer to the Tapestri Pipeline User Guide to set up an account and access to Tapestri Pipeline.
Single Cell Clonal Insights Software
Mission Bio’s Clonal Insights analysis pipeline is a complete end-to-end solution for the detection of rare leukemic cells which persist following treatment and can help predict disease relapse. FASTQ files generated from Tapestri Clonal Insights sequencing libraries are provided as input, and the pipeline generates reports of somatic mutations, clonal architecture, and protein expression profiles. The pipeline is compatible with either a single sample or multiple samples multiplexed together that are distinguishable via their germline genotype information (which must also be provided).
There are two types of Clonal Insights analysis:
- Single Sample analysis
- Time course or Multi-sample analysis
Single Sample Analysis
Clonal Insights Single Sample analysis requires FASTQ files from a single Tapestri run (either from one sample or from up to three multiplexed samples). It generates a report with the detected somatic variants/clones and the clonal architecture for each sample contained in the run. Each report represents a single time point from a single sample.
Time course or Multi-sample Analysis
Clonal Insights Time course analysis combines 2-5 single sample runs to generate a consolidated report summarizing the change in variant frequencies and clonal architecture over time.
Clonal Insights Software Inputs
Clonal Insights Software has the following inputs:
FASTQ files
Input FASTQ files are one or more pairs of forward and reverse FASTQ files (R1/R2). These files should be compressed (.gz). DNA FASTQs are required for the Clonal Insights DNA Pipeline, and DNA and Protein FASTQs are required for the Clonal Insights DNA+Protein Pipeline.
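Before uploading, the pairing and compression requirements above can be checked locally. The following Python sketch is illustrative only: it assumes that read direction appears as an _R1/_R2 token in the file name (as in typical Illumina output), and the function name pair_fastqs is hypothetical, not part of Tapestri Pipeline.

```python
import re
from collections import defaultdict

# Assumption: read direction is encoded as an _R1 / _R2 token in the file name,
# as in typical Illumina output. Adjust the pattern if your files differ.
READ_TOKEN = re.compile(r"_(R[12])(?=[_.])")


def pair_fastqs(filenames):
    """Group FASTQ file names into (R1, R2) pairs and collect any problems."""
    groups = defaultdict(dict)
    problems = []
    for name in filenames:
        if not name.endswith(".gz"):
            problems.append(f"{name}: not gzip-compressed (.gz)")
            continue
        match = READ_TOKEN.search(name)
        if match is None:
            problems.append(f"{name}: no R1/R2 token found")
            continue
        # Mask the read token so that R1 and R2 of a pair share one key.
        key = name[:match.start(1)] + "R?" + name[match.end(1):]
        groups[key][match.group(1)] = name

    pairs = []
    for key, reads in sorted(groups.items()):
        if "R1" in reads and "R2" in reads:
            pairs.append((reads["R1"], reads["R2"]))
        else:
            missing = "R2" if "R1" in reads else "R1"
            problems.append(f"{key}: missing {missing} file")
    return pairs, problems
```

Calling pair_fastqs on the list of file names you intend to upload returns the complete pairs plus a list of messages for files that are uncompressed or unpaired.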
Panel files
DNA panel files are required by the Clonal Insights DNA Pipeline, and DNA and Protein panel files are required by the Clonal Insights DNA+Protein Pipeline.
DNA Panel
The DNA panel is a .zip file consisting of up to three files -
- [required] *.bed
- [required] *.amplicons
- [optional] *.per-variant-background-error.csv (only available for certain catalog DNA panels)
The Clonal Insights catalog DNA panel file can be found pre-uploaded in Tapestri Pipeline (Files → Panel Files). In addition to the Clonal Insights DNA panel, a regular DNA panel, either catalog or custom, can be used to run CIS. For more information about these panel files, refer to this article.
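For a custom panel, the expected contents of the DNA panel .zip file can be verified before upload. This is a minimal Python sketch using the standard zipfile module; the function names are hypothetical, and the check covers only the file extensions listed above, not the contents of the files.

```python
import zipfile

REQUIRED_SUFFIXES = (".bed", ".amplicons")
OPTIONAL_SUFFIX = ".per-variant-background-error.csv"


def check_dna_panel(member_names):
    """Return (missing required suffixes, whether the optional file is present)."""
    # Ignore directory entries and compare base names only.
    files = [n.rsplit("/", 1)[-1] for n in member_names if not n.endswith("/")]
    missing = [s for s in REQUIRED_SUFFIXES
               if not any(f.endswith(s) for f in files)]
    has_background_error = any(f.endswith(OPTIONAL_SUFFIX) for f in files)
    return missing, has_background_error


def check_dna_panel_zip(path):
    """Apply check_dna_panel to the members of a .zip file on disk."""
    with zipfile.ZipFile(path) as archive:
        return check_dna_panel(archive.namelist())
```

An empty first element in the result means both required files are present; the second element reports whether the optional background error file was found.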
Protein Panel
The protein panel is supplied as a single CSV file detailing the antibodies and their barcode sequences. The details of the panel can be seen here.
Reference Genome
The reference genome file should always be chosen to match the reference genome used to design the DNA panel. The catalog human reference genomes (hg19 and hg38) can be found pre-uploaded on Tapestri Pipeline (Files → Other Files).
CSV Files
CIS Pipeline runs can include the following CSV files:
- Demultiplexing variants file (required for multiplexed DNA or DNA+Protein runs)
- Whitelist/Blacklist variants file (optional)
For more information about these input files, refer to this article.
Uploading CSV Files
Create the CSV file based on the details mentioned above, and then upload the file to Tapestri Pipeline. The CSV file must be uploaded before it can be used in a run. To upload a CSV file, follow the instructions below:
- Click the Add Files button.
- Select the option Other from the left panel.
- Choose either Upload from Local Computer or Import from Amazon S3 based on where the CSV files are saved.
- In the drop-downs, select the required type:
  - Sample Variants File - Used to upload the demultiplexing variants file. File extension is .csv.
  - Somatic Variants File - Used to upload the whitelist/blacklist variants file. File extension is .csv.
- Choose the files to add and click Upload.
- Once the upload completes, the files can be seen in the Other Files tab on the Files table.
Starting CIS Runs
The Tapestri Pipeline web application allows you to start four types of CIS pipeline runs:
- Clonal Insights DNA pipeline
- Clonal Insights DNA+Protein pipeline
- Clonal Insights Reprocess pipeline
- Clonal Insights Time course pipeline
CIS DNA Run
To process a CIS DNA run, follow the steps given below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline ‘Clonal Insights DNA.’
- Select the Run Mode - Standard for single sample, Genotype Demultiplexing for multiplexed sample.
- Select the reference genome based on the panel.
- [Optional] Select the Whitelist/Blacklist Variants CSV file for the run.
- For Genotype Demultiplexing runs, additionally select the Demultiplexing Variants file.
- [Optional] Set the ‘Report Whitelist variants only’ to ‘Yes’ if de novo variant calling is not required.
- Select the DNA panel.
- Select the FASTQ files and assign them to correct lanes corresponding to your Tapestri experiment. See Lane assignment article for details.
- Preview the run inputs and Submit the run.
- To view the results, click the name of the run in the Runs table.
- The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, the DNA pipeline report is seen on the Run Report tab.
- To view the CIS report, go to the Output Files tab and download the file tertiary/reports/<prefix>.{sample_name}.html.
CIS DNA+Protein
To process a CIS DNA+Protein run, follow the steps given below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline CIS DNA+Protein.
- Follow the same steps as for a CIS DNA run, with the following updates:
  - While selecting parameters, additionally select the protein panel .csv file.
  - In the next step, select the Protein FASTQ files and assign them to the lanes correctly.
- To view the CIS reports, go to the Output Files tab and download the file tertiary/reports/<prefix>.{sample_name}.html.
CIS Reprocess
The CIS Reprocess pipeline runs only the CIS module, which performs DNA variant calling and reporting. Use it after the first run to resolve issues such as incorrect demultiplexing, or to whitelist expected variants or blacklist unwanted or false positive variants.
To start a CIS Reprocess run, follow the steps below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline CIS Reprocess.
- Select the Run Mode - Standard for single sample, Genotype Demultiplexing for multiplexed sample.
- Select the appropriate DNA panel.
NOTE: A protein panel file is not needed for this pipeline.
- [Optional] Select the Whitelist/Blacklist Variants file for the run to define true and false positive variants to be included or excluded from analysis.
- For Genotype Demultiplexing runs, additionally select the Demultiplexing Variants file.
- Select the h5 file to be reprocessed.
Note: Each run produces multiple h5 files. To select the correct one, first search the table using the output prefix. For example, if the run was named “CIS test”, search for “CIS_test” to limit the available h5 files. Once the h5 files are filtered, look for the one that is the output of the CIS DNA or CIS DNA+Protein pipeline: for CIS DNA runs, the file name contains ‘results/’; for CIS DNA+Protein runs, the file name has no path or “/” in it. This matters because these h5 files are unprocessed and contain all the samples and data. Using other files may cause the run to fail.
- Preview the run inputs and Submit the run.
- To view the results, click the name of the run in the Runs table.
- The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, no report is seen on the Run Report tab as the secondary pipeline is not run in this process.
- To view the CIS reports, go to the Output Files tab and download the file reports/<prefix>.{sample_name}.html.
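The h5 selection rule described in the note above can be sketched in Python for anyone working from an exported list of file names. Everything here is illustrative: the function names are hypothetical, and the rule that spaces in the run name become underscores is an assumption generalized from the single “CIS test” → “CIS_test” example.

```python
def output_prefix(run_name):
    """Derive the output prefix from a run name.

    Assumption: spaces are replaced by underscores, generalized from the
    single "CIS test" -> "CIS_test" example in this article.
    """
    return run_name.strip().replace(" ", "_")


def candidate_h5_files(file_names, run_name, pipeline):
    """Filter h5 file names down to the unprocessed file(s) for reprocessing.

    pipeline is "dna" for CIS DNA runs or "dna+protein" for CIS DNA+Protein runs.
    """
    prefix = output_prefix(run_name)
    matches = [f for f in file_names if f.endswith(".h5") and prefix in f]
    if pipeline == "dna":
        # CIS DNA: the file name contains 'results/'.
        return [f for f in matches if "results/" in f]
    if pipeline == "dna+protein":
        # CIS DNA+Protein: the file name has no path separator.
        return [f for f in matches if "/" not in f]
    raise ValueError(f"unknown pipeline: {pipeline!r}")
```

If the filter returns more than one file, or none, check the file list manually rather than guessing.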
CIS Time course
This pipeline is used to combine patient samples across multiple time points. To analyze a single patient over a period of time, run the samples individually through the CIS single sample pipeline and then use the h5 files from these runs to create a time course analysis report. To define the run, follow the steps below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline CIS Time course.
- [Optional] Select the Whitelist/Blacklist Variants CSV file for the run.
- [Optional] Set the ‘Report Whitelist variants only’ to ‘Yes’ if de novo variant calling is not required.
- Select the h5 files from a previous CIS run listed in the table.
- Define the order or time point for the samples. There are two ways to define the order:
  - Order the h5 files by the sequence in which the samples were collected. For example, the sample collected first can be assigned as 1, the next one as 2, the third one as 3, and so on.
  - Specify the duration between the sample collection time points. For example, the first sample can be assigned as 1, a sample collected 20 days after the first as 20, a sample collected 150 days after the first as 150, and so on.
NOTE: The intervals between the order values determine how the samples are spaced along the x-axis of the Fishplot seen in the report.
- Select the DNA panel.
NOTE: A protein panel file is not needed for this pipeline.
- To view the results, click the name of the run in the Runs table.
- The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, the CIS report is seen on the Run Report tab.
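The two ways of defining the sample order described in the steps above can be worked out from collection dates before entering them in the run form. This Python sketch is illustrative; the function names and sample labels are hypothetical, and the 2-5 sample limit is taken from the time course description in this article.

```python
from datetime import date


def check_sample_count(collection_dates):
    """Time course analysis combines 2-5 single sample runs."""
    if not 2 <= len(collection_dates) <= 5:
        raise ValueError("time course analysis expects 2 to 5 samples")


def order_by_sequence(collection_dates):
    """Option 1: assign 1, 2, 3, ... in order of collection."""
    check_sample_count(collection_dates)
    ranked = sorted(collection_dates, key=collection_dates.get)
    return {sample: rank for rank, sample in enumerate(ranked, start=1)}


def order_by_elapsed_days(collection_dates):
    """Option 2: first sample is 1, later samples are days since the first.

    Note: a sample collected one day after the first would also get 1 here,
    so adjust the values by hand if time points are that close together.
    """
    check_sample_count(collection_dates)
    first = min(collection_dates.values())
    return {sample: max((day - first).days, 1)
            for sample, day in collection_dates.items()}


# Hypothetical example: three samples from one patient.
dates = {
    "diagnosis": date(2023, 1, 1),
    "followup_1": date(2023, 1, 21),
    "followup_2": date(2023, 5, 31),
}
print(order_by_sequence(dates))
print(order_by_elapsed_days(dates))
```

With the first option the time points are evenly spaced on the Fishplot; with the second, spacing reflects the elapsed time between collections.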
CIS Output Files
The CIS pipeline generates the following files:
CIS DNA/DNA+P single sample analysis
The CIS pipeline executes the secondary and tertiary analysis pipelines together. Different sets of files are generated depending on the modules run:
Secondary analysis pipeline
- <prefix>.dna.h5 (DNA only) / <prefix>.dna+protein.h5 (DNA + Protein only)
- <prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
- <prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
- tapestri_run_output.txt
- <prefix>.qc.json
- <prefix>-<analyte>-fastp.json (one per analyte)
- <prefix>-<analyte>-fastp.html (one per analyte)
- <prefix>.mapped.bam
- <prefix>.cells.bam
- <prefix>.cells.bam.csi
- <prefix>.report.html
- <prefix>.metrics.json
- <prefix>.cells.vcf.gz
- <prefix>.allele.drop.out.report.txt - Only Standard Run Mode
CIS Specific Outputs
- tertiary/reports/<prefix>.{sample}.html
- tertiary/h5/{sample}.h5
CIS DNA/DNA+P multiplexed sample analysis
The CIS pipeline executes the secondary and tertiary analysis pipelines together. Different sets of files are generated depending on the modules run:
Secondary analysis pipeline
- <prefix>.dna.h5 (DNA only) / <prefix>.dna+protein.h5 (DNA + Protein only)
- <prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
- <prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
- tapestri_run_output.txt
- <prefix>.qc.json
- <prefix>-<analyte>-fastp.json (one per analyte)
- <prefix>-<analyte>-fastp.html (one per analyte)
- <prefix>.mapped.bam
- <prefix>.cells.bam
- <prefix>.cells.bam.csi
- <prefix>.report.html
- <prefix>.metrics.json
- <prefix>.cells.vcf.gz
- <prefix>.allele.drop.out.report.txt - Only Standard Run Mode
CIS Specific Outputs
Multiple copies of the following files, one set for each multiplexed sample:
- tertiary/reports/<prefix>.{sample}.html
- tertiary/h5/<prefix>.{sample}.h5
- samples/<prefix>.{sample}.dmx.report.html
- samples/<prefix>.{sample}.dmx.metrics.json
- samples/<prefix>.{sample}.dmx.h5
- samples/<prefix>.{sample}.dmx.cells.bam
- samples/<prefix>.{sample}.dmx.cells.bam.csi
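When downloading results for a multiplexed run, it can help to list the per-sample files to expect. The Python sketch below expands the path patterns listed above for a given prefix and set of sample names; the function names and the example prefix and sample names are hypothetical.

```python
# Per-sample output patterns for multiplexed runs, as listed in this article.
PER_SAMPLE_TEMPLATES = (
    "tertiary/reports/{prefix}.{sample}.html",
    "tertiary/h5/{prefix}.{sample}.h5",
    "samples/{prefix}.{sample}.dmx.report.html",
    "samples/{prefix}.{sample}.dmx.metrics.json",
    "samples/{prefix}.{sample}.dmx.h5",
    "samples/{prefix}.{sample}.dmx.cells.bam",
    "samples/{prefix}.{sample}.dmx.cells.bam.csi",
)


def expected_sample_outputs(prefix, samples):
    """Map each sample name to the output paths expected for it."""
    return {
        sample: [t.format(prefix=prefix, sample=sample)
                 for t in PER_SAMPLE_TEMPLATES]
        for sample in samples
    }


def missing_outputs(prefix, samples, available):
    """List expected per-sample paths that are absent from `available`."""
    available = set(available)
    return [path
            for paths in expected_sample_outputs(prefix, samples).values()
            for path in paths
            if path not in available]


# Hypothetical example: run prefix "CIS_test" with two multiplexed samples.
for path in expected_sample_outputs("CIS_test", ["patientA", "patientB"])["patientA"]:
    print(path)
```

Comparing this list against the file names shown on the Output Files tab is a quick way to spot a sample whose outputs were not generated.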
CIS Time Course analysis
CIS time course analysis consolidates 2-5 sample H5s and creates the following output files:
- <prefix>.html
- <prefix>.h5
For more information about Clonal Insights output files, refer to this article.
To download any output file, click the download icon to the left of the File Name.
Note: If the file does not download, check whether an ad or pop-up blocker is running. If so, disable it and download the file again.
CIS Report Overview
To download the CIS Run Report, go to the Output Files tab and download the tertiary/reports/<prefix>.{sample_name}.html file. Plots and tables in the report are interactive.
Summary
The Summary page displays the following information:
- Total cells
- Mutant cells detected
- Mutant clones detected
- Clonal summary plot: Visual representation of Clonal Fraction, Point Mutations, and Protein Markers.
- Clones table: Table with clone name, number of cells, mutations and protein differential expression.
- Somatic variants table: A table with the sample name (for time course), variant ID, gene, protein change, coding impact, cells mutated % and various other metrics.
Details
The Details page displays the following information:
- Phylogenetic tree: A visualization showing the order in which the mutations were acquired and how they co-occur.
- DNA profile: A heatmap showing DNA clones subsorted by Protein.
- Protein UMAP: A UMAP plot showing the protein expression colored by protein, sample, clone, or genotype.
- Protein expression correlation: A plot showing the correlation in expression for two proteins.
- Protein expression change over time: A plot, available only for time course analysis, showing the change in protein expression between samples/time points.
Sample metadata
- Run ID
- Sample ID
- DNA panel name
- DNA panel size
- Reference genome
- Secondary analysis pipeline version
- Tertiary analysis pipeline version
- Date analyzed
QC
- Heatmap of somatic variants (raw genotypes): A heatmap showing raw genotypes for the somatic variants per cell.
- Heatmap of protein expression: A heatmap showing the normalized protein expression per cell.
- Candidate variants table: This table is shown if there are amplicons that cover somatic variants. It contains detailed information for all variants that pass lenient filtering criteria. For each variant, the table shows the number of cells mutated, the percentage of cells mutated, the reason for filtering the variant, whether the variant was passed as a whitelist or as a germline variant, and multiple Varsome annotations. Filtering happens sequentially, and annotations are only shown for variants that pass all the preceding filtering criteria. A known somatic variant (whitelisted) might still contain a value in the "Reason" column; this is the reason the variant would have been filtered if it were not whitelisted. This table can be used to identify why a particular variant was not called in the report. It is especially useful when a variant was missed because it was defined as a germline variant for demultiplexing purposes.
Definitions
The Definitions page contains a glossary of key words used in the report and a description of every table and plot contained in the report.