Mission Bio's MRD Pipeline allows customers to process single-cell MRD DNA + Protein sequencing data generated on the Tapestri Platform.
Table of Contents
Setting up Tapestri Pipeline Account
Single Cell Measurable Residual Disease Pipeline
Time course or Multi-sample Analysis
MRD single or multiplexed sample analysis
Setting up Tapestri Pipeline Account
Refer to the Tapestri Pipeline User Guide to set up an account and access to Tapestri Pipeline.
Single Cell Measurable Residual Disease Pipeline
Mission Bio’s single cell MRD (scMRD) analysis pipeline is a complete end-to-end solution for detecting rare leukemic cells that persist following treatment and can help predict disease relapse. FASTQ files generated from Tapestri scMRD sequencing libraries are provided as input, and the pipeline generates reports of somatic mutations, clonal architecture, and protein expression profiles. The pipeline is compatible with either a single sample or multiple samples multiplexed together, provided the multiplexed samples are distinguishable by their germline genotype information (which must also be supplied).
There are two types of MRD analysis:
- Single Sample analysis
- Time course or Multi-sample analysis
Single Sample Analysis
MRD Single Sample analysis requires FASTQ files from a single Tapestri run (either from one sample, or from up to three multiplexed samples) and generates an MRD report for each sample contained in the run. Each report represents a single time point from a single sample.
Time course or Multi-sample Analysis
MRD Time course analysis combines 2-5 MRD Single Sample runs to generate a consolidated report summarizing the change in variant frequencies and clonal architecture over time.
MRD Inputs
The MRD pipeline has the following inputs:
FASTQ files
Input FASTQ files are one or more pairs of forward and reverse FASTQ files (R1/R2). These files should be compressed (.gz). DNA and Protein FASTQs are required for the MRD Pipeline.
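Before uploading, it can help to confirm that every forward file has a matching reverse file and that all files are compressed. The following is a minimal sketch, not part of Tapestri Pipeline. It assumes Illumina-style file names containing `_R1`/`_R2` (for example `sample_S1_L001_R1_001.fastq.gz`); the function name `pair_fastqs` is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: Illumina-style names such as sample_S1_L001_R1_001.fastq.gz
READ_TAG = re.compile(r"_R([12])(_\d+)?\.fastq\.gz$")


def pair_fastqs(filenames):
    """Group file names into (R1, R2) pairs.

    Returns (pairs, problems), where problems lists files that are
    uncompressed, do not follow the assumed naming, or lack a mate.
    """
    groups = defaultdict(dict)
    problems = []
    for name in filenames:
        match = READ_TAG.search(name)
        if match is None:
            problems.append(name)
            continue
        # Everything except the R1/R2 tag identifies the pair.
        key = name[:match.start()] + (match.group(2) or "")
        groups[key]["R" + match.group(1)] = name

    pairs = []
    for key in sorted(groups):
        reads = groups[key]
        if "R1" in reads and "R2" in reads:
            pairs.append((reads["R1"], reads["R2"]))
        else:
            problems.extend(reads.values())
    return pairs, problems
```

Any file reported in `problems` should be renamed, compressed, or paired before the run is started.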
Panel files
DNA and Protein panel files are required by the MRD pipeline.
DNA Panel
The DNA panel consists of four files:
- *.bed
- *.amplicons
- systematic_variants.blacklist
- *_per-variant-background-error.csv
The first three are regular DNA panel files, details of which can be found in this article. The additional *_per-variant-background-error.csv file is generated by Mission Bio and is used by the MRD pipeline to improve its ability to detect rare variants.
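A quick way to confirm that a DNA panel is complete is to match its file names against the four patterns listed above. This is an illustrative sketch only; the function name `missing_panel_files` and the example file names are hypothetical.

```python
from fnmatch import fnmatchcase

# The four DNA panel file patterns listed in this article.
REQUIRED_PATTERNS = [
    "*.bed",
    "*.amplicons",
    "systematic_variants.blacklist",
    "*_per-variant-background-error.csv",
]


def missing_panel_files(filenames):
    """Return the required patterns that no file name in the panel matches."""
    return [
        pattern
        for pattern in REQUIRED_PATTERNS
        if not any(fnmatchcase(name, pattern) for name in filenames)
    ]


# Hypothetical panel that lacks the background error file.
panel = ["panel.bed", "panel.amplicons", "systematic_variants.blacklist"]
print(missing_panel_files(panel))
```

An empty list means all four files are present.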
Protein Panel
The protein panel is supplied as a single csv file detailing the antibodies and their barcode sequences. The details of the panel can be seen here.
Reference Genome
Mission Bio-provided hg19 reference genomes should be used for processing MRD data. This catalog reference genome can be found pre-uploaded in Tapestri Pipeline (Files → Other Files).
VCF Files
VCF, or Variant Call Format, is a standardized text file format used for representing SNP, indel, and structural variation calls. It is an optional file and the MRD runs can be processed without it. It is used by the MRD pipeline to demultiplex samples, define known somatic variants or exclude variants which are being incorrectly called (false positives) by the pipeline. The user-submitted VCF file should conform to VCF v4.2 standard with a header section followed by the variant information per sample. For more details on the VCF file format refer to this article.
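The header requirements can be checked before upload. The sketch below covers only two points from the VCF v4.2 specification: the first line must be `##fileformat=VCFv4.2`, and the `#CHROM` line must carry the eight fixed columns, followed by `FORMAT` and the sample names when genotypes are present. It does not validate variant records, and it is not the check that Tapestri Pipeline itself performs. The function name `check_vcf_header` is hypothetical.

```python
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_vcf_header(lines):
    """Return (problems, sample_names) for the header lines of a VCF file."""
    lines = [line.rstrip("\n") for line in lines]
    problems = []
    samples = []

    if not lines or lines[0] != "##fileformat=VCFv4.2":
        problems.append("first line must be ##fileformat=VCFv4.2")

    header = next((line for line in lines if line.startswith("#CHROM")), None)
    if header is None:
        problems.append("missing #CHROM column header line")
        return problems, samples

    columns = header.split("\t")
    if columns[:8] != FIXED_COLUMNS:
        problems.append("fixed columns must be " + " ".join(FIXED_COLUMNS))
    elif len(columns) > 8:
        if columns[8] != "FORMAT":
            problems.append("ninth column must be FORMAT when samples are listed")
        else:
            samples = columns[9:]
    return problems, samples
```

To check a file on disk, pass an open file handle, for example `check_vcf_header(open("variants.vcf"))`, where `variants.vcf` is a placeholder name. The returned sample names are the columns that hold the per-sample variant information.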
Upload VCF File
Create the VCF file based on the details mentioned above, and then upload the file to Tapestri Pipeline. The VCF file must be uploaded before it can be used in a run. To upload a VCF file follow the instructions below:
- Click the Add Files button.
- Select the option Other from the Left panel.
- In the dropdown select VCF with Demultiplexing Variants.
- Choose either Upload from Local Computer or Import from Amazon S3 based on where the VCF files are saved.
- Choose the files to add and click Upload.
- Once the upload completes, the files can be seen in the Other Files tab on the Files table.
Starting MRD Runs
The Tapestri Pipeline web application allows you to start two types of MRD pipeline runs:
- MRD-AML (DNA+Protein)
- MRD-AML (Time course)
MRD-AML (DNA+Protein)
To process an MRD-AML (DNA+Protein) run, follow the steps given below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline MRD-AML (DNA+Protein).
- Select the Human (hg19) genome.
- [Optional] Select the VCF file for the run.
- Select the AML-MRD DNA panel and the MRD protein panel.
- Select the FASTQ files and assign them to correct lanes corresponding to your Tapestri experiment. See Lane assignment article for details.
- Preview the run inputs and submit the run.
- To view the results, click the name of the run in the Runs table.
- The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, the DNA+Protein pipeline report is seen on the Run Report tab.
- To view the MRD reports, go to the Output Files tab and download the file mrd/REPORT/{patient_name}.html.
MRD-AML (Time course)
This pipeline is used to combine patient samples across multiple time points. To analyze a single patient over a period of time, first run the samples individually through the MRD single sample pipeline, then use the h5 files from those runs to create a time course analysis report. To define the run, follow the steps below:
- Click the Start Run button.
- Add the run name.
- Select the Pipeline MRD-AML (Time course).
- Select the MRD-AML DNA panel.
- Select the h5 files from a previous MRD run listed in the table.
- Define the order or time point for the samples. There are two ways to define the order:
  - Order the h5 files by the sequence in which the samples were collected. For example, the sample collected first can be assigned as 1, the next one as 2, the third one as 3, and so on.
  - Specify the duration between the sample collection time points. For example, the first sample can be assigned as 1, a sample collected 20 days after the first as 20, a sample collected 150 days after the first as 150, and so on.
NOTE: The intervals between the order values determine the positioning of the y-axes on the Fishplot seen in the report.
- To view the results, click the name of the run in the Runs table.
- The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, the MRD report is seen on the Run Report tab.
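When using the duration-based ordering described above, the order values can be derived from the sample collection dates. The sketch below is illustrative only: it assumes the first sample is assigned 1 and every later sample is assigned the number of days since the first collection, which mirrors the example in the steps above. It also enforces the limit of 2-5 runs per time course analysis. The function name `order_values` and the h5 file names are hypothetical.

```python
from datetime import date


def order_values(collection_dates):
    """Map each h5 file name to an order value based on its collection date.

    collection_dates: dict of {h5 file name: datetime.date}
    """
    if not 2 <= len(collection_dates) <= 5:
        raise ValueError("a time course analysis combines 2-5 runs")

    ordered = sorted(collection_dates.items(), key=lambda item: item[1])
    first_date = ordered[0][1]
    # Assumption: the first sample is 1; later samples use days since the first.
    return {name: max(1, (day - first_date).days) for name, day in ordered}


# Hypothetical samples collected at diagnosis, 20 days later and 150 days later.
values = order_values({
    "patient_diagnosis.h5": date(2023, 1, 1),
    "patient_followup.h5": date(2023, 1, 21),
    "patient_relapse.h5": date(2023, 5, 31),
})
print(values)
```

The resulting values (1, 20 and 150 for the dates shown) can then be entered as the order for each h5 file when defining the run.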
MRD-AML Output Files
The MRD pipeline outputs the following files:
MRD single or multiplexed sample analysis
- <prefix>.report.html - HTML report for the DNA+Protein pipeline
- <prefix>--fastp.html - QC HTML report for DNA data
- <prefix>-fastp.html - QC HTML report for Protein data
- <prefix>.cells.bam - BAM file with reads for the called cells
- <prefix>.cells.bam.csi - BAM index file
- {patient_name}.html - MRD HTML report file with details on somatic variants, clonal architecture, protein differential expression and other details. For multiplexed runs, one file is produced for each sample.
- {patient_name}.h5 - This is the final per-sample h5 file and contains filtered variants, clone assignments, normalized CNV and protein data in addition to the contents of a regular h5 file as described in this article. This file can be used for time-course analysis.
Time course analysis reports
- <prefix>.html - Time course analysis HTML report file with details on somatic variants, clonal architecture, protein differential expression, and other details stratified by timepoint. Additionally displays changes in mutational profiles and protein expression across time points.
To download any output file, click the download icon to the left of the File Name.
Note: If the file does not download, check whether an ad or pop-up blocker is running. If so, disable it and download the file again.
The first five files listed under MRD single or multiplexed sample analysis are produced by the DNA+Protein pipeline. For more information on these files, refer to this article. The last two files in that list are MRD-specific, and for multiplexed runs there will be one of each per multiplexed sample.
MRD-AML Report Overview
To download the MRD Run Report, go to the Output Files tab and download the mrd/{patient_name}.html file. Plots and tables in the report are interactive.
Summary
The Summary page displays the following information:
- Mutant clones detected
- Mutant cells detected
- Total cells
- Time points for time course reports
- Clone Barchart: A bar chart that shows the number of cells per clone.
- Somatic variant heatmap per clone
- Protein expression heatmap per clone
- Fishplot for time course reports
- Clones table: A table with sample name (for time course), clone name, the number of cells in clone, mutations and protein differential expression per clone.
- Variants table: A table with the sample name (for time course), variant ID, gene, protein change, coding impact, cells mutated % and various other metrics.
Details
The Details page displays the following information:
- Phylogenetic tree: A visualization showing the order in which the mutations were acquired and how they co-occur.
- Protein UMAP: A UMAP plot showing the protein expression colored by either protein, sample, clone or genotype.
- Protein expression correlation: A plot showing the correlation in expression for two proteins.
- Protein expression change over time: A time course analysis only plot showing the change in protein expression over a period of time.
- Copy number analysis: Two plots providing a way to visualize copy number profiles for each clone grouped by either chromosome or gene.
Sample
- Run ID
- Sample ID
- DNA panel name
- DNA panel size
- Reference genome
- Secondary analysis pipeline version
- scMRD pipeline version
- Date analyzed
QC
- Germline variant / multiplexing diagnostic plot: A plot providing a visual representation of germline variant information to help confirm and diagnose sample identity issues.
- Heatmap of somatic variants (raw genotypes): A heatmap showing the raw genotype for the somatic variants per cell.
- Heatmap of protein expression: A heatmap showing the normalized protein expression per cell.
Help
The Help page contains a glossary of key words used in the report and a description of every table and plot contained in the report.