Tapestri Multiple Myeloma (MM) Pipeline User Guide

Mission Bio's Multiple Myeloma (MM) Pipeline allows customers to process single-cell MM DNA and DNA + Protein sequencing data generated on the Tapestri Platform.

Table of Contents

Setting up Tapestri Pipeline Account

Single Cell Multiple Myeloma Sample Analysis Pipeline

MM DNA Pipeline

MM DNA+Protein Pipeline

MM Reprocess Pipeline

Time course Pipeline

MM Inputs

FASTQ files

Panel files

DNA Panel

Protein Panel

Reference Genome

CSV Files

Uploading CSV Files

Starting MM Runs

MM DNA

MM DNA+Protein

MM Reprocess

MM Time course

MM Output Files

MM DNA/DNA+P single sample analysis

Secondary analysis pipeline

MM Specific Outputs

VDJ Specific Outputs

MM DNA/DNA+P multiplexed sample analysis

Secondary analysis pipeline

MM Specific Outputs

VDJ Outputs

MM Time Course analysis

MM Report Overview

Summary

Advanced

QC

Definitions

 

Setting up Tapestri Pipeline Account

Refer to the Tapestri Pipeline User Guide to set up an account and access to Tapestri Pipeline.

Single Cell Multiple Myeloma Sample Analysis Pipeline

Mission Bio’s single-cell Multiple Myeloma (scMM) analysis pipeline is a complete end-to-end solution for the analysis of Multiple Myeloma clinical samples. The pipeline assesses the presence of somatic clones based on mutations, whole-genome and gene-specific CNVs relative to a spike-in cell line (e.g. GM12878), VDJ clonotypes, and, optionally, cell-surface protein expression. Mutational co-occurrence and zygosity can be measured to understand clonal evolution. The pipeline provides a summary report and intermediate files with pertinent single-cell data. It is compatible with either a single sample or multiple samples multiplexed together, provided the multiplexed samples are distinguishable by their germline genotype information (which must also be supplied). This end-to-end solution enables users to quickly assess the heterogeneity of thousands of cells in MM samples.

There are four MM analysis pipelines:

  1. MM DNA Pipeline
  2. MM DNA+Protein Pipeline
  3. MM Reprocess Pipeline
  4. Time course Pipeline

MM DNA Pipeline

The MM DNA pipeline requires FASTQ files from a Tapestri DNA run (either from one sample or from up to three multiplexed samples) and generates an MM report for each of the samples contained in the run.

MM DNA+Protein Pipeline

The MM DNA+Protein pipeline requires FASTQ files from a Tapestri DNA+Protein run (either from one sample or from up to three multiplexed samples) and generates an MM report for each of the samples contained in the run.

MM Reprocess Pipeline

The MM Reprocess pipeline requires an h5 file from an existing MM DNA or MM DNA+Protein run and generates an MM report for each of the samples contained in the run. The Reprocess pipeline runs only the MM module, which includes the DNA variant analysis, CNV analysis, VDJ clonotyping, and reporting. Use it to correct issues in an earlier run, for example to fix incorrect demultiplexing, to whitelist expected variants, or to blacklist unwanted or false-positive variants.

Time course Pipeline

MM Time course analysis requires the h5 files from 2-5 existing MM DNA or MM DNA+Protein runs and generates a consolidated report summarizing the changes in the samples over time.

MM Inputs

The MM pipeline has the following inputs:

FASTQ files

Input FASTQ files are one or more pairs of forward and reverse FASTQ files (R1/R2). These files should be compressed (.gz). DNA FASTQs are required for the MM DNA Pipeline, and DNA and Protein FASTQs are required for the MM DNA+Protein Pipeline.
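
Before uploading, it can help to confirm that every FASTQ file is compressed and has a mate. The following is a minimal sketch, not part of Tapestri Pipeline. It assumes Illumina-style file names in which the read is marked `_R1` or `_R2`; the function name `pair_fastqs` and the example file names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: the read is marked "_R1" or "_R2", followed by "_" or ".".
READ_TAG = re.compile(r"_(R[12])(?=[_.])")


def pair_fastqs(file_names):
    """Group FASTQ names into (R1, R2) pairs and collect anything that does not fit."""
    groups = defaultdict(dict)
    problems = []
    for name in file_names:
        if not name.endswith(".gz"):
            problems.append(f"{name}: not gzip-compressed (.gz)")
            continue
        match = READ_TAG.search(name)
        if match is None:
            problems.append(f"{name}: no R1/R2 tag found")
            continue
        # Blank out the read tag so that both mates share the same key.
        key = name[:match.start(1)] + "R?" + name[match.end(1):]
        groups[key][match.group(1)] = name
    pairs = []
    for key in sorted(groups):
        reads = groups[key]
        if "R1" in reads and "R2" in reads:
            pairs.append((reads["R1"], reads["R2"]))
        else:
            problems.append(f"{key}: missing mate")
    return pairs, problems
```

Calling `pair_fastqs` on the list of files you plan to upload returns the complete pairs plus a list of human-readable problems to resolve first.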

Panel files

DNA panel files are required by the MM DNA Pipeline, and DNA and Protein panel files are required by the MM DNA+Protein Pipeline. 

DNA Panel

The DNA panel consists of five files:

  • *.bed
  • *.amplicons 
  • systematic_variants.blacklist
  • *.per-variant-background-error.csv 
  • *.amplicon.info.csv 

The MM catalog DNA panel file can be found pre-uploaded in Tapestri Pipeline (Files → Panel Files). For more information about these panel files, refer to this article.  
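
For a custom panel, a quick local check that all five file types are present can catch packaging mistakes early. This is a minimal sketch under the assumption that you have the panel's file names (without directory components) in a list; `missing_panel_files` is a hypothetical helper, not a Tapestri Pipeline function.

```python
from fnmatch import fnmatch

# The five patterns are copied from the DNA panel file list above.
REQUIRED_PATTERNS = [
    "*.bed",
    "*.amplicons",
    "systematic_variants.blacklist",
    "*.per-variant-background-error.csv",
    "*.amplicon.info.csv",
]


def missing_panel_files(file_names):
    """Return the required patterns that none of the given file names match."""
    return [
        pattern
        for pattern in REQUIRED_PATTERNS
        if not any(fnmatch(name, pattern) for name in file_names)
    ]
```

An empty result means every expected file type was found. If the panel is zipped, the names can be read with `zipfile.ZipFile(path).namelist()` from the Python standard library, reduced to base names before checking.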

Protein Panel

The protein panel is supplied as a single CSV file detailing the antibodies and their barcode sequences. The details of this panel file can be found here. The MM catalog Protein panel file can be found pre-uploaded in Tapestri Pipeline (Files → Panel Files).

Reference Genome

The Mission Bio-provided hg19 reference genome should be used for processing MM data. This catalog reference genome can be found pre-uploaded in Tapestri Pipeline (Files → Other Files). 

CSV Files

MM Pipeline runs can include the following CSV files:

  1. Demultiplexing variants file (required for multiplexed DNA or DNA+Protein runs)
  2. Spike-in variants file (required for CNV detection)
  3. Whitelist/Blacklist variants file (optional)
  4. Spike-in CNV profile file (optional)

For more information about these input files, refer to this article.

Uploading CSV Files

Create the CSV file based on the details mentioned above, and then upload the file to Tapestri Pipeline. The CSV file must be uploaded before it can be used in a run. To upload a CSV file follow the instructions below:

  1. Click the Add Files button.
  2. Select the option Other from the left panel.
  3. Choose either Upload from Local Computer or Import from Amazon S3 based on where the CSV files are saved.
  4. In the dropdowns, select the required type.
    1. Sample Variants File - To be used to upload the demultiplexing variant file. File extension is .csv.
    2. Somatic Variants File - To be used to upload the whitelisted/blacklisted variant file. File extension is .csv.
    3. Spike-in Genotype File - To be used to upload the spike-in variant file. File extension is .genotype.csv. If not provided, CNVs will not be called.
    4. Spike-in CNV File - To be used to upload the spike-in CNV file. File extension is .cnv.csv. If not provided, CNVs will be called with diploid assumption for the spike-in.
  5. Choose the files to add and click Upload.
  6. Once the upload completes, the files can be seen in the Other Files tab on the Files table.
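
The file types and extensions listed in the upload steps can be checked locally before uploading. The sketch below only encodes the extensions stated above; whether the web application itself enforces them is not stated in this guide, and `extension_ok` and the example file names are hypothetical.

```python
# File types and extensions as listed in the upload steps above.
EXPECTED_EXTENSION = {
    "Sample Variants File": ".csv",
    "Somatic Variants File": ".csv",
    "Spike-in Genotype File": ".genotype.csv",
    "Spike-in CNV File": ".cnv.csv",
}


def extension_ok(file_type, file_name):
    """Return True if file_name ends with the extension listed for file_type."""
    return file_name.lower().endswith(EXPECTED_EXTENSION[file_type])
```

For example, a spike-in genotype file named `GM12878.csv` would fail this check because the listed extension is `.genotype.csv`.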

To upload the FASTQ files, follow the same steps but select the File Type of FASTQ. Additional details on File source and configuring an AWS or Basespace account can be found here.

Starting MM Runs

The Tapestri Pipeline web application allows you to start four types of MM pipeline runs:

  • MM DNA 
  • MM DNA+Protein
  • MM Reprocess
  • MM Time course

MM DNA

To process an MM DNA run, follow the steps given below:

  1. Click the Start Run button.
  2. Add the run name.
  3. Select the Pipeline MM DNA v1.
  4. Select the Human (hg19) genome.
  5. Select the Run Mode - Standard for single sample, Genotype Demultiplexing for multiplexed sample.
  6. Select the MM DNA Panel - Multiple_Myeloma_hotspots_339amplicons_hg19.zip if using the MM panel, or Multiple_Myeloma_hotspots_wgCNV_839amplicons_hg19.zip if using a combined MM+wgCNV panel.
  7. [Optional] Select the Whitelist/Blacklist Variants file for the run to define true and false positive variants to be included or excluded from analysis.
  8. [Optional] Select the Spike-in Variants file if CNV calling is needed.
  9. [Optional] Select the Spike-in CNV file if the spiked-in cell line is not fully diploid.
  10. For Genotype Demultiplexing runs, additionally select the Demultiplexing Variants file.
  11. Select the FASTQ files and assign them to the correct lanes corresponding to your Tapestri experiment. See the Lane assignment article for details.
  12. Preview the run inputs and submit the run.
  13. To view the results, click the name of the run in the Runs table.
  14. The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, the DNA pipeline report is seen on the Run Report tab.
  15. To view the MM reports, go to the Output Files tab and download the file tertiary/reports/{sample_name}.html.

MM DNA+Protein 

To process an MM DNA+Protein run, follow the steps given below:

  1. Click the Start Run button.
  2. Add the run name.
  3. Select the Pipeline MM DNA+Protein v1.
  4. Follow the same steps as for an MM DNA run, with the following updates:
    1. While selecting parameters, select the catalog protein panel - TotalSeq™-D Multiple Myeloma Antibody Cocktail.csv
    2. In the next step, select the Protein FASTQ files and assign them to the lanes correctly.
  5. To view the MM reports, go to the Output Files tab and download the file tertiary/reports/{sample_name}.html.

MM Reprocess

The MM Reprocess pipeline runs only the MM module, which includes the DNA variant analysis, CNV analysis, VDJ clonotyping, and reporting. Use it after the first run to correct issues, for example to fix incorrect demultiplexing, to whitelist expected variants, or to blacklist unwanted or false-positive variants.

To start an MM Reprocess run, follow the steps below:

  1. Click the Start Run button.
  2. Add the run name.
  3. Select the Pipeline MM Reprocess v1.
  4. Select the Run Mode - Standard for single sample, Genotype Demultiplexing for multiplexed sample.
  5. Select the appropriate DNA panel.

Note: A protein panel file is not needed for this pipeline.

  6. [Optional] Select the Whitelist/Blacklist Variants file for the run to define true and false positive variants to be included or excluded from analysis.
  7. [Optional] Select the Spike-in Variants file if CNV calling is needed.
  8. [Optional] Select the Spike-in CNV file if the spiked-in cell line is not fully diploid.
  9. For Genotype Demultiplexing runs, additionally select the Demultiplexing Variants file.
  10. Select the h5 file to be reprocessed.

Note: Each run produces multiple h5 files. To select the correct one, first search the table using the output prefix. For example, if the run was named “MM test”, search for “MM_test” to limit the available h5 files. Once the h5 files are filtered, look for the one that is the output of the MM DNA or MM DNA+Protein pipeline. For MM DNA runs, the file name contains ‘results/’; for DNA+Protein runs, the file name has no path or “/” in it. This is important because these h5 files are unprocessed and contain the full assays (including the samples and the spike-in). Using other files may cause the run to fail.

  11. Preview the run inputs and submit the run.
  12. To view the results, click the name of the run in the Runs table.
  13. The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, no report is seen on the Run Report tab because the secondary pipeline is not run in this process.
  14. To view the MM reports, go to the Output Files tab and download the file reports/{sample_name}.html.
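
The h5 selection rule in the note above can be expressed as a small filter, which may be useful when you have exported or copied the list of h5 file names. This is a sketch only: it assumes that the output prefix is the run name with spaces replaced by underscores (generalized from the “MM test” example), and the function names and example file names are hypothetical.

```python
def output_prefix(run_name):
    # Assumption: spaces become underscores, as in "MM test" -> "MM_test".
    return run_name.replace(" ", "_")


def reprocess_candidates(h5_names, run_name, pipeline):
    """Filter h5 names per the note above; pipeline is "DNA" or "DNA+Protein"."""
    prefix = output_prefix(run_name)
    matching = [n for n in h5_names if prefix in n and n.endswith(".h5")]
    if pipeline == "DNA":
        # MM DNA runs: the file name contains "results/".
        return [n for n in matching if "results/" in n]
    if pipeline == "DNA+Protein":
        # MM DNA+Protein runs: the file name has no path separator.
        return [n for n in matching if "/" not in n]
    raise ValueError(f"unknown pipeline: {pipeline!r}")
```

If the filter returns more than one candidate, confirm the choice in the Files table rather than picking one arbitrarily.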

MM Time course

This pipeline combines patient samples across multiple time points. To analyze a single patient over a period of time, run the samples individually through the MM DNA or MM DNA+Protein pipeline, then use the h5 files from these runs to create a time course analysis report. To define the run, follow the steps below:

  1. Click the Start Run button.
  2. Add the run name.
  3. Select the Pipeline MM Time course.
  4. Select the appropriate DNA panel.
  5. Select the h5 files from previous MM runs listed in the table.

Note: For each run there are multiple h5s available; in order to select the correct file, first search the table using “samples/” in the name. This is important as these h5 files are processed and contain individual samples (and not multiplexed data). Using other files may cause the run to fail.

  6. Define the order or time point for the samples. There are 2 ways to define the order:
    1. Order the h5 files by the sequence in which the samples were collected. For example, the sample collected first can be assigned as 1, the next one as 2, the third one as 3, and so on.
    2. Specify the duration between the sample collection time points. For example, the first sample can be assigned as 1, a sample collected 20 days after that as 20, a sample collected 150 days later as 150, and so on.
  7. Preview the run inputs and submit the run.
  8. To view the results, click the name of the run in the Runs table.
  9. The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, the MM report is seen on the Run Report tab.
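
The constraints on a time course run (2-5 per-sample h5 files found under “samples/”, each with a time point) can be sanity-checked before defining the run. The sketch below is illustrative only; `order_time_course` is a hypothetical helper, and the requirement that time points be distinct is an assumption rather than something this guide states.

```python
def order_time_course(h5_to_time_point):
    """Validate a {h5 name: time point} mapping and return the names in time order."""
    if not 2 <= len(h5_to_time_point) <= 5:
        raise ValueError("time course analysis takes 2-5 h5 files")
    for name in h5_to_time_point:
        if "samples/" not in name:
            raise ValueError(f"{name}: expected a per-sample h5 under samples/")
    points = list(h5_to_time_point.values())
    # Assumption (not stated in the guide): each sample has its own time point.
    if len(set(points)) != len(points):
        raise ValueError("time points must be distinct")
    return sorted(h5_to_time_point, key=h5_to_time_point.get)
```

Either ordering scheme works with this check: sequence numbers (1, 2, 3) or durations (1, 20, 150) both sort the samples into collection order.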

MM Output Files

The MM pipeline outputs the following files:

MM DNA/DNA+P single sample analysis 

The MM pipeline executes the secondary and tertiary analysis pipelines together; based on the modules run, different sets of files are generated:

  • Secondary analysis pipeline

    • <prefix>.dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only) 
    • <prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
    • <prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
    • tapestri_run_output.txt 
    • <prefix>.qc.json
    • <prefix>-<analyte>-fastp.json and <prefix>-<analyte>-fastp.json
    • <prefix>-<analyte>-fastp.html and <prefix>-<analyte>-fastp.html
    • <prefix>.mapped.bam 
    • <prefix>.cells.bam 
    • <prefix>.cells.bam.csi
    • <prefix>.report.html
    • <prefix>.metrics.json
    • <prefix>.cells.vcf.gz
    • <prefix>.allele.drop.out.report.txt - Only Standard Run Mode
  • MM Specific Outputs

    • samples/<prefix>.sample.report.html
    • samples/<prefix>.sample.metrics.json
    • samples/<prefix>.sample.h5
    • samples/<prefix>.sample.cells.bam
    • samples/<prefix>.sample.cells.bam.csi
    • samples/<prefix>.<spikein_name>.spikein.report.html
    • samples/<prefix>.<spikein_name>.spikein.metrics.json
    • samples/<prefix>.<spikein_name>.spikein.h5
    • samples/<prefix>.<spikein_name>.spikein.cells.bam
    • samples/<prefix>.<spikein_name>.spikein.cells.bam.csi
    • tertiary/reports/<prefix>.{sample}.html 
    • tertiary/h5/{sample}.h5 
  • VDJ Specific Outputs

    • vdj/<prefix>_<bcr_type>-<gene>_summary.tsv
    • vdj/<prefix>_<bcr_type>-<gene>_summary_filtered.tsv
    • vdj/<prefix>_report.tsv 
    • vdj/<prefix>_report_filtered.tsv 
    • vdj/<prefix>_metrics.json  
    • vdj/logs/progress.log 

MM DNA/DNA+P multiplexed sample analysis 

The MM pipeline executes the secondary and tertiary analysis pipelines together; based on the modules run, different sets of files are generated:

  • Secondary analysis pipeline

    • <prefix>.dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only) 
    • <prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
    • <prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
    • tapestri_run_output.txt 
    • <prefix>.qc.json
    • <prefix>-<analyte>-fastp.json and <prefix>-<analyte>-fastp.json
    • <prefix>-<analyte>-fastp.html and <prefix>-<analyte>-fastp.html
    • <prefix>.mapped.bam 
    • <prefix>.cells.bam 
    • <prefix>.cells.bam.csi
    • <prefix>.report.html
    • <prefix>.metrics.json
    • <prefix>.cells.vcf.gz
    • <prefix>.allele.drop.out.report.txt - Only Standard Run Mode
  • MM Specific Outputs

    • samples/<prefix>.<spikein_name>.spikein.report.html
    • samples/<prefix>.<spikein_name>.spikein.metrics.json
    • samples/<prefix>.<spikein_name>.spikein.h5
    • samples/<prefix>.<spikein_name>.spikein.cells.bam
    • samples/<prefix>.<spikein_name>.spikein.cells.bam.csi
    • Multiple copies of the following files, one set for each multiplexed sample and a single set of files for the spike-in cell line:

      • tertiary/reports/<prefix>.{sample}.html 
      • tertiary/h5/{sample}.h5 
      • samples/<prefix>.{sample}.dmx.report.html
      • samples/<prefix>.{sample}.dmx.metrics.json
      • samples/<prefix>.{sample}.dmx.h5
      • samples/<prefix>.{sample}.dmx.cells.bam
      • samples/<prefix>.{sample}.dmx.cells.bam.csi
  • VDJ Outputs

    • vdj/<prefix>_<bcr_type>-<gene>_summary.tsv
    • vdj/<prefix>_<bcr_type>-<gene>_summary_filtered.tsv
    • vdj/<prefix>_report.tsv 
    • vdj/<prefix>_report_filtered.tsv 
    • vdj/<prefix>_metrics.json  
    • vdj/logs/progress.log
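
To confirm that a multiplexed run produced a full set of per-sample files, the per-sample templates listed above can be expanded for each sample and compared with the files actually present. This is a minimal sketch; the helper names, the example prefix, and the sample names are hypothetical.

```python
# Per-sample path templates copied from the multiplexed output list above.
PER_SAMPLE_TEMPLATES = [
    "tertiary/reports/{prefix}.{sample}.html",
    "tertiary/h5/{sample}.h5",
    "samples/{prefix}.{sample}.dmx.report.html",
    "samples/{prefix}.{sample}.dmx.metrics.json",
    "samples/{prefix}.{sample}.dmx.h5",
    "samples/{prefix}.{sample}.dmx.cells.bam",
    "samples/{prefix}.{sample}.dmx.cells.bam.csi",
]


def expected_outputs(prefix, samples):
    """Expand the templates once for each multiplexed sample."""
    return [
        template.format(prefix=prefix, sample=sample)
        for sample in samples
        for template in PER_SAMPLE_TEMPLATES
    ]


def missing_outputs(prefix, samples, present):
    """Return the expected per-sample outputs that are absent from `present`."""
    present = set(present)
    return [path for path in expected_outputs(prefix, samples) if path not in present]
```

Here `present` would be the list of output file names for the run, for instance copied from the Output Files tab.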

MM Time Course analysis 

MM time course analysis consolidates 2-5 sample h5 files and creates the following output files:

  • <prefix>.html 
  • <prefix>.h5

For more information about MM output files, refer to this article.

To download any output file, click the download icon to the left of the File Name.

Note: If the file does not download, check whether an ad or pop-up blocker is running. If so, disable it and download the file again.

MM Report Overview

To download the MM Run Report, go to the Output Files tab and download the tertiary/reports/{sample_name}.html file. Plots and tables in the report are interactive. 

Summary

The Summary page displays the following information:

  • Total cells
  • Mutant cells detected
  • Mutant clones detected
  • Clonal summary plot: Visual representation of VDJ Clonotypes, Prognostic Structural Variants, Structural Variant Count, Focal CNV, Point Mutations, Prognostic Protein Markers, Clonal Fraction
  • Clones table: Table with clone name, number of cells, mutations, protein differential expression, large CNVs and VDJ clonotypes.
  • CNV correlated variants table: A table with the sample name (for time course), variant ID, gene, protein change, coding impact, cells mutated % and various other metrics.

Advanced

The Advanced page displays the following information:

  • Phylogenetic tree: A visualization showing the order in which the mutations were acquired and how they co-occur.
  • DNA profile: A heatmap showing DNA Cluster Signature subsorted by Protein.
  • CNV profile: A plot showing the genome-wide CNV events.
  • VDJ clonotypes: A bar chart that shows unique VDJ recombination events observed in the clones.
  • Protein UMAP: A UMAP plot showing the protein expression colored by either Protein, sample, clone or genotype.
  • Protein expression correlation: A plot showing the correlation in expression for two proteins.
  • Protein expression change over time: A time course analysis-only plot showing the change in protein expression over a period of time.
  • Sample
    • Run ID
    • Sample ID
    • DNA panel name
    • DNA panel size
    • Reference genome
    • Secondary analysis pipeline version
    • Tertiary analysis pipeline version
    • Date analyzed

QC

  • Raw CNV counts with VAFs: A plot showing the raw CNV counts together with the average variant allele frequency (VAF) in each clone for germline variants located in the corresponding amplicons.
  • Germline variant / multiplexing diagnostic plot: For demultiplexed samples only. A plot providing a visual representation of germline variant information to help confirm and diagnose sample identity issues.
  • Heatmap of somatic variants (raw genotypes): A heatmap showing raw genotypes for the somatic variants per cell.
  • Heatmap of protein expression: A heatmap showing the normalized protein expression per cell.

Definitions

The Definitions page contains a glossary of key words used in the report and a description of every table and plot contained in the report.
