Tapestri Genome Integrity (GI) Pipeline User Guide


Mission Bio's Genome Integrity (GI) Pipeline allows customers to process single-cell GI DNA and DNA + Protein sequencing data generated on the Tapestri Platform.

Table of Contents

Setting up Tapestri Pipeline Account

Single Cell Genome Integrity Sample Analysis Pipeline

GI DNA Pipeline

GI DNA+Protein Pipeline

GI Reprocess Pipeline

Genome Integrity Inputs

FASTQ files

Panel files

DNA Panel

Uploading GI Panel Files

Protein Panel

Reference Genome

CSV Files

Uploading CSV Files

Starting GI Runs

GI DNA

GI DNA+Protein

GI Reprocess

GI Output Files

GI DNA/DNA+P single sample analysis

Secondary analysis pipeline

GI Specific Outputs

GI DNA/DNA+P multiplexed sample analysis

Secondary analysis pipeline

GI Specific Outputs

GI Report Overview

Summary

Advanced

QC

Definitions

 





Setting up Tapestri Pipeline Account

Refer to the Tapestri Pipeline User Guide to set up an account and gain access to Tapestri Pipeline.

Single Cell Genome Integrity Sample Analysis Pipeline

Mission Bio’s single-cell Genome Integrity (GI) analysis pipeline is a complete end-to-end solution for analyzing genome-wide CNV (gwCNV) and focal CNV (fCNV) events in a sample. The pipeline assesses the presence of gwCNV clones relative to a spike-in cell population (e.g. GM12878) and reports the correlation of these clones with SNVs, focal CNV events, and cell surface protein expression. It provides a summary report and intermediate files with pertinent single-cell data. The pipeline is compatible with either a single sample or multiple samples multiplexed together, provided the multiplexed samples are distinguishable by their germline genotype information (which must also be provided). This lets users quickly assess the CNV profile of thousands of cells in a given sample.

 

There are three GI analysis pipelines:

  1. GI DNA Pipeline
  2. GI DNA+Protein Pipeline
  3. GI Reprocess Pipeline

 

NOTE: The GI pipeline is also compatible with our Multiple Myeloma and Gene Editing solutions.

GI DNA Pipeline

The GI DNA pipeline requires FASTQ files from a Tapestri DNA run (either from one sample or from up to three multiplexed samples) and generates a GI report for each sample in the run.

GI DNA+Protein Pipeline

The GI DNA+Protein pipeline requires FASTQ files from a Tapestri DNA+Protein run (either from one sample or from up to three multiplexed samples) and generates a GI report for each sample in the run.

GI Reprocess Pipeline

The GI Reprocess pipeline requires an h5 file from an existing GI DNA or GI DNA+Protein run and generates a GI report for each sample in the run. It runs only the GI module, which includes demultiplexing, CNV analysis, mutation analysis, and reporting. Use it to correct demultiplexing errors, whitelist expected variants, or blacklist unwanted or false positive variants.

Genome Integrity Inputs

The GI pipeline has the following inputs:

FASTQ files

Input FASTQ files are one or more pairs of forward and reverse FASTQ files (R1/R2). These files should be compressed (.gz). DNA FASTQs are required for the GI DNA Pipeline, and DNA and Protein FASTQs are required for the GI DNA+Protein Pipeline.
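Before uploading, it can help to confirm that every FASTQ file is compressed and has a mate. The following is a minimal sketch in Python, not part of Tapestri Pipeline. It assumes Illumina-style file names in which the read is marked by `_R1` or `_R2`; the function name `pair_fastqs` and the example file names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: the read direction is encoded as _R1 / _R2, followed by "_" or ".".
READ_TAG = re.compile(r"_(R[12])(?=[_.])")


def pair_fastqs(filenames):
    """Group FASTQ names into (R1, R2) pairs and collect problems found."""
    groups = defaultdict(dict)
    problems = []
    for name in filenames:
        if not name.endswith(".gz"):
            problems.append(f"{name}: not gzip-compressed (.gz)")
        match = READ_TAG.search(name)
        if match is None:
            problems.append(f"{name}: no _R1/_R2 tag found")
            continue
        # The name with the read tag removed identifies the pair.
        key = name[:match.start()] + name[match.end():]
        groups[key][match.group(1)] = name
    pairs = []
    for key, reads in sorted(groups.items()):
        if set(reads) == {"R1", "R2"}:
            pairs.append((reads["R1"], reads["R2"]))
        else:
            only = next(iter(reads.values()))
            problems.append(f"{only}: missing mate")
    return pairs, problems


if __name__ == "__main__":
    pairs, problems = pair_fastqs([
        "sample_L001_R1_001.fastq.gz",
        "sample_L001_R2_001.fastq.gz",
        "other_R1.fastq.gz",
    ])
    print(pairs)
    print(problems)
```

If your sequencer or facility uses a different naming scheme, adjust `READ_TAG` accordingly.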

Panel files

DNA panel files are required by the GI DNA Pipeline, and DNA and Protein panel files are required by the GI DNA+Protein Pipeline. 

DNA Panel

The DNA panel consists of four files:

  • *.bed
  • *.amplicons 
  • systematic_variants.blacklist
  • *.amplicon.info.csv 

These four files need to be zipped together and uploaded to Tapestri Pipeline as a ‘Genome Integrity(GI) Panel’ file type. For more information about these panel files, refer to this article.  
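The zipping step can be scripted. The sketch below is one possible approach in Python, not an official Mission Bio tool. It checks that a folder contains exactly one file for each of the four patterns listed above and writes them into a .zip archive. Storing the files at the top level of the archive (without subfolders) is an assumption, and the function names are hypothetical.

```python
import fnmatch
import zipfile
from pathlib import Path

# The four DNA panel files named in this guide.
REQUIRED_PATTERNS = [
    "*.bed",
    "*.amplicons",
    "systematic_variants.blacklist",
    "*.amplicon.info.csv",
]


def find_panel_files(panel_dir):
    """Return one file per required pattern; raise if any is missing or ambiguous."""
    panel_dir = Path(panel_dir)
    names = sorted(p.name for p in panel_dir.iterdir() if p.is_file())
    selected = []
    for pattern in REQUIRED_PATTERNS:
        matches = fnmatch.filter(names, pattern)
        if len(matches) != 1:
            raise ValueError(
                f"expected exactly one file matching {pattern!r}, found {matches}"
            )
        selected.append(panel_dir / matches[0])
    return selected


def zip_panel(panel_dir, zip_path):
    """Write the four panel files into zip_path (assumed flat layout)."""
    files = find_panel_files(panel_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, arcname=path.name)
    return zip_path
```

For example, `zip_panel("my_panel", "my_panel.zip")` would produce an archive that can then be uploaded as described in the next section.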

Uploading GI Panel Files

Create the panel files based on the details mentioned above, and then .zip the files together prior to uploading to Tapestri Pipeline. This file needs to be uploaded to Tapestri Pipeline before it can be used in a GI run. To upload the GI Panel Files, follow the instructions below:

  1. Click the Add Files button.
  2. Select the option Panel from the left panel.
  3. In the dropdown select GI DNA Panel.
  4. Choose the file to add and click Upload.
  5. Once the upload completes, the files can be seen in the Panels tab on the Files table.

Protein Panel

This file is only necessary for DNA+Protein runs. It is a 3-column .csv file; for more information about its format, please refer to this article. To upload a protein panel file, follow the instructions below:

  1. Click the Add Files button.
  2. Select the option Panel from the left panel.
  3. In the dropdown select Protein Panel.
  4. Choose the files to add from your Local Computer and click Upload.
  5. Once the upload completes, the files can be seen in the Panels tab on the Files table.

 

Reference Genome

We recommend that you use one of the reference genomes provided by Mission Bio. The reference genome used for the pipeline must match the reference genome the panel was created with. If a custom reference genome was used, please upload the .fa.zip file of the genome to your Tapestri Pipeline account, following the instructions provided here.

 

CSV Files

GI Pipeline runs can include the following CSV files:

  1. Spike-in variants file (required for CNV detection)
  2. Spike-in CNV profile file (optional)
  3. Demultiplexing variants file (required for multiplexed DNA or DNA+Protein runs)
  4. Whitelist/Blacklist variants file (optional)

 

For more information about these input files, refer to this article.

Uploading CSV Files

Create the CSV file based on the details mentioned above, and then upload the file to Tapestri Pipeline. The CSV file must be uploaded before it can be used in a run. To upload a CSV file follow the instructions below:

  1. Click the Add Files button.
  2. Select the option Other from the left panel.
  3. Choose either Upload from Local Computer or Import from Amazon S3, based on where the CSV files are saved.
  4. In the dropdowns, select the required type:
    1. Spike-in Genotype File: Used to upload the spike-in variants file. File extension is .genotype.csv. If not provided, CNVs will not be called.
    2. Spike-in CNV File: Used to upload the spike-in CNV file. File extension is .cnv.csv. If not provided, CNVs will be called with a diploid assumption for the spike-in.
    3. Sample Variants File: Used to upload the demultiplexing variants file. File extension is .csv. If not provided, the run will be treated as one sample.
    4. Somatic Variants File: Used to upload the whitelist/blacklist variants file. File extension is .csv.
  5. Choose the files to add and click Upload.
  6. Once the upload completes, the files can be seen in the Other Files tab on the Files table.
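The file types and extensions listed in the steps above can be summarized as a small lookup. This is an illustrative Python sketch only; the helper name `candidate_types` is hypothetical. Because the Sample Variants and Somatic Variants files share the plain .csv extension, the extension alone cannot tell them apart, so the helper returns both candidates in that case.

```python
# File types and extensions as listed in the upload steps of this guide.
CSV_TYPES = [
    ("Spike-in Genotype File", ".genotype.csv"),
    ("Spike-in CNV File", ".cnv.csv"),
    ("Sample Variants File", ".csv"),
    ("Somatic Variants File", ".csv"),
]


def candidate_types(filename):
    """Return the dropdown types whose extension fits the given file name."""
    # Check the more specific extensions first; both also end in ".csv".
    specific = [
        label for label, ext in CSV_TYPES
        if ext != ".csv" and filename.endswith(ext)
    ]
    if specific:
        return specific
    if filename.endswith(".csv"):
        return [label for label, ext in CSV_TYPES if ext == ".csv"]
    return []


if __name__ == "__main__":
    for name in ["GM12878.genotype.csv", "GM12878.cnv.csv", "demux.csv"]:
        print(name, candidate_types(name))
```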

 

To upload FASTQ files, follow the same steps but select FASTQ as the File Type. Additional details on the file source and on configuring an AWS or BaseSpace account can be found here.

 

Starting GI Runs

The Tapestri Pipeline web application allows you to start three types of GI pipeline runs:

  • GI DNA 
  • GI DNA+Protein
  • GI Reprocess

GI DNA

To process a GI DNA run, follow the steps given below:

 

  1. Click the Start Run button.
  2. Add the run name.
  3. Select the Pipeline GI DNA v1.
  4. Select the Human (hg19) genome for the catalog Genome-wide CNV panel or, if using a custom panel, the genome corresponding to that panel.
  5. Select the Run Mode: Standard for a single sample, or Genotype Demultiplexing for multiplexed samples.
  6. Select the GI DNA Panel: Genome-wide CNV_hg19.zip if using the GI catalog panel, or any custom GI panel.
  7. [Optional] Select the Whitelist/Blacklist Variants file for the run to define true and false positive variants to be included in or excluded from analysis.
  8. Select the Spike-in Variants file for CNV analysis.
  9. [Optional] Select the Spike-in CNV file if the spiked-in cell line is not fully diploid.
  10. For Genotype Demultiplexing runs, additionally select the Demultiplexing Variants file.
  11. Select the FASTQ files and assign them to the correct lanes corresponding to your Tapestri experiment. See the Lane assignment article for details.
  12. Preview the run inputs and submit the run.
  13. To view the results, click the name of the run in the Runs table.
  14. The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, the DNA pipeline report is shown on the Run Report tab.
  15. To view the GI reports, go to the Output Files tab and download the file tertiary/reports/{sample_name}.html.
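The input rules for a GI DNA run can be written down as a pre-submission checklist. The following Python sketch is illustrative only and does not call Tapestri Pipeline; the dictionary keys and the function name `check_gi_dna_inputs` are hypothetical. The rules themselves come from this guide: a Demultiplexing Variants file is needed for Genotype Demultiplexing runs, CNVs are not called without a Spike-in Variants file, and the spike-in is assumed diploid without a Spike-in CNV file.

```python
RUN_MODES = ("Standard", "Genotype Demultiplexing")


def check_gi_dna_inputs(run):
    """Return (errors, warnings) for a dict describing a planned GI DNA run.

    The dict keys are hypothetical names chosen for this sketch.
    """
    errors, warnings = [], []
    mode = run.get("run_mode")
    if mode not in RUN_MODES:
        errors.append(f"unknown run mode: {mode!r}")
    if not run.get("dna_panel"):
        errors.append("no GI DNA panel selected")
    if not run.get("fastqs"):
        errors.append("no FASTQ files selected")
    if mode == "Genotype Demultiplexing" and not run.get("demux_variants"):
        errors.append("Genotype Demultiplexing needs a Demultiplexing Variants file")
    if not run.get("spikein_variants"):
        warnings.append("no Spike-in Variants file: CNVs will not be called")
    elif not run.get("spikein_cnv"):
        warnings.append("no Spike-in CNV file: spike-in is assumed diploid")
    return errors, warnings


if __name__ == "__main__":
    planned = {
        "run_mode": "Genotype Demultiplexing",
        "dna_panel": "Genome-wide CNV_hg19.zip",
        "fastqs": ["a_R1.fastq.gz", "a_R2.fastq.gz"],
        "spikein_variants": "GM12878.genotype.csv",
    }
    print(check_gi_dna_inputs(planned))
```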

GI DNA+Protein

To process a GI DNA+Protein run, follow the steps given below:

  1. Click the Start Run button.
  2. Add the run name.
  3. Select the Pipeline GI DNA+Protein v1.
  4. Follow the same steps as for a GI DNA run, with the following updates:
    1. While selecting parameters, select a protein panel.
    2. In the next step, select the Protein FASTQ files and assign them to the correct lanes.
  5. To view the GI reports, go to the Output Files tab and download the file tertiary/reports/{sample_name}.html.

 

GI Reprocess

The GI Reprocess pipeline runs only the GI module, on a single h5 file at a time. The module includes demultiplexing, CNV clone detection, and reporting. DNA variant analysis and focal CNV analysis are also run if the appropriate inputs are provided. Use the h5 file from the DNA or DNA+Protein step of the original run as input. After a first run, use this pipeline to correct demultiplexing errors, whitelist expected variants, or blacklist unwanted or false positive variants.

To start a GI Reprocess run, follow the steps below:

  1. Click the Start Run button.
  2. Add the run name.
  3. Select the Pipeline GI Reprocess v1.
  4. Select the Run Mode: Standard for a single sample, or Genotype Demultiplexing for multiplexed samples.
  5. Select the appropriate DNA panel.

NOTE: A protein panel file is not needed for this pipeline.

  6. [Optional] Select the Whitelist/Blacklist Variants file for the run to define true and false positive variants to be included in or excluded from analysis.
  7. Select the Spike-in Variants file.
  8. [Optional] Select the Spike-in CNV file if the spiked-in cell line is not fully diploid.
  9. For Genotype Demultiplexing runs, additionally select the Demultiplexing Variants file.
  10. Select the h5 file to be reprocessed. Only a single h5 file can be selected.

NOTE: Each run has multiple h5 files available. To select the correct file, first search the table using the output prefix. For example, if the run was named “GI test”, search for “GI_test” to limit the available h5 files. Once the h5 files are filtered, look for the one that is the output of the GI DNA or GI DNA+Protein pipeline. For GI DNA runs, the file name contains “results/”; for DNA+Protein runs, the file name has no path or “/” in it. This is important because these h5 files contain the full assays (including the samples and the spike-in) and are unprocessed. Using other files may cause the run to fail.

  11. Preview the run inputs and submit the run.
  12. To view the results, click the name of the run in the Runs table.
  13. The Run details page shows the run summary with Run Report, Output Files and Input Files. By default, no report is shown on the Run Report tab because the secondary pipeline is not run in this process.
  14. To view the GI reports, go to the Output Files tab and download the file reports/{sample_name}.html.
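The rule in the NOTE on choosing the h5 file can be sketched as a filter over a list of file names. This Python example is illustrative and does not query Tapestri Pipeline. It assumes the output prefix is the run name with spaces replaced by underscores, which is inferred from the single “GI test” example above. The function names and the file names in the usage example are hypothetical.

```python
def output_prefix(run_name):
    """Assumed rule, inferred from the 'GI test' -> 'GI_test' example."""
    return run_name.replace(" ", "_")


def candidate_h5_files(run_name, file_names, pipeline):
    """Filter h5 file names down to those suitable for GI Reprocess."""
    prefix = output_prefix(run_name)
    matches = [n for n in file_names if n.endswith(".h5") and prefix in n]
    if pipeline == "GI DNA":
        # For GI DNA runs, the file name contains "results/".
        return [n for n in matches if "results/" in n]
    if pipeline == "GI DNA+Protein":
        # For DNA+Protein runs, the file name has no path or "/" in it.
        return [n for n in matches if "/" not in n]
    raise ValueError(f"unexpected pipeline: {pipeline!r}")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = [
        "results/GI_test.dna.h5",
        "tertiary/h5/GI_test.sample.h5",
        "samples/GI_test.sample.dmx.h5",
    ]
    print(candidate_h5_files("GI test", files, "GI DNA"))
```

If the filter returns more than one file, inspect the candidates manually before selecting one.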

GI Output Files

The Genome Integrity Pipeline executes the secondary and tertiary analysis pipelines together. Different sets of files are generated depending on which modules are run.

GI DNA/DNA+P single sample analysis 

  • Secondary analysis pipeline

    • <prefix>.report.html 
    • <prefix>.metrics.json
    • <prefix>.mapped.bam 
    • <prefix>.cells.bam 
    • <prefix>.cells.bam.csi
    • <prefix>.dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only) 
    • <prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
    • <prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
    • tapestri_run_output.txt 
    • <prefix>.qc.json
    • <prefix>-dna-fastp.html and <prefix>-protein-fastp.html
    • <prefix>-dna-fastp.json and <prefix>-protein-fastp.json
    • <prefix>.cells.vcf.gz
    • <prefix>.allele.drop.out.report.txt
  • GI Specific Outputs

    • tertiary/reports/<prefix>.sample.html 
    • tertiary/h5/<prefix>.sample.h5 
    • samples/<prefix>.sample.dmx.report.html
    • samples/<prefix>.sample.dmx.metrics.json
    • samples/<prefix>.sample.dmx.h5
    • samples/<prefix>.sample.dmx.cells.bam
    • samples/<prefix>.sample.dmx.cells.bam.csi
    • samples/<prefix>.<spikein_name>.spikein.report.html
    • samples/<prefix>.<spikein_name>.spikein.metrics.json
    • samples/<prefix>.<spikein_name>.spikein.h5
    • samples/<prefix>.<spikein_name>.spikein.cells.bam
    • samples/<prefix>.<spikein_name>.spikein.cells.bam.csi

 

GI DNA/DNA+P multiplexed sample analysis  

  • Secondary analysis pipeline

    • <prefix>.dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only) 
    • <prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
    • <prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
    • tapestri_run_output.txt 
    • <prefix>.qc.json
    • <prefix>-<analyte>-fastp.json (one per analyte: dna, protein)
    • <prefix>-<analyte>-fastp.html (one per analyte: dna, protein)
    • <prefix>.mapped.bam 
    • <prefix>.cells.bam 
    • <prefix>.cells.bam.csi
    • <prefix>.report.html
    • <prefix>.metrics.json
    • <prefix>.cells.vcf.gz
    • <prefix>.allele.drop.out.report.txt - Only Standard Run Mode
  • GI Specific Outputs

    • samples/<prefix>.<spikein_name>.spikein.report.html
    • samples/<prefix>.<spikein_name>.spikein.metrics.json
    • samples/<prefix>.<spikein_name>.spikein.h5
    • samples/<prefix>.<spikein_name>.spikein.cells.bam
    • samples/<prefix>.<spikein_name>.spikein.cells.bam.csi
    • Multiple copies of following files, one for each multiplexed sample:
      • tertiary/reports/<prefix>.{sample}.html 
      • tertiary/h5/{sample}.h5 
      • samples/<prefix>.{sample}.dmx.report.html
      • samples/<prefix>.{sample}.dmx.metrics.json
      • samples/<prefix>.{sample}.dmx.h5
      • samples/<prefix>.{sample}.dmx.cells.bam
      • samples/<prefix>.{sample}.dmx.cells.bam.csi

For more information about Genome Integrity output files, refer to this article.
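To confirm that a multiplexed run produced all of its GI-specific outputs, the file list above can be expanded per sample. The Python sketch below builds the expected paths from the patterns in this guide; it is illustrative only. The function names and example values are hypothetical, and the paths are reproduced as listed above, including `tertiary/h5/{sample}.h5` without a prefix.

```python
# Suffixes shared by the spike-in and per-sample "samples/" outputs.
SAMPLE_SUFFIXES = ["report.html", "metrics.json", "h5", "cells.bam", "cells.bam.csi"]


def expected_gi_outputs(prefix, spikein_name, samples):
    """List GI-specific output paths for a multiplexed run, per this guide."""
    paths = [
        f"samples/{prefix}.{spikein_name}.spikein.{suffix}"
        for suffix in SAMPLE_SUFFIXES
    ]
    for sample in samples:
        paths.append(f"tertiary/reports/{prefix}.{sample}.html")
        paths.append(f"tertiary/h5/{sample}.h5")  # listed without prefix in this guide
        paths.extend(
            f"samples/{prefix}.{sample}.dmx.{suffix}" for suffix in SAMPLE_SUFFIXES
        )
    return paths


def missing_outputs(expected, actual):
    """Return expected paths that are absent from the actual file list."""
    present = set(actual)
    return [path for path in expected if path not in present]


if __name__ == "__main__":
    expected = expected_gi_outputs("GI_test", "GM12878", ["patientA", "patientB"])
    for path in expected:
        print(path)
```

Comparing this list with the names shown in the Output Files tab via `missing_outputs` gives a quick completeness check.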

To download any output file, click the download icon to the left of the File Name.

NOTE: If the file does not download, check whether an ad or pop-up blocker is running. If so, disable it and download the file again.

GI Report Overview

To download the GI Run Report, go to the Output Files tab and download the tertiary/reports/{sample_name}.html file. Plots and tables in the report are interactive. 

Summary

The Summary page displays the following information:

  • Total cells
  • Mutant cells detected
  • Mutant clones detected
  • Clonal summary plot: Visual representation of Structural Variants and Clonal Fraction. It can additionally show focal CNVs, Point Mutations, and Prognostic Protein Markers, depending on the input.
  • Clones table: Table with clone name, number of cells, and large CNVs. Mutations, protein differential expression, and focal CNVs may be present, depending on the input.
  • CNV correlated variants table: If somatic variants are provided, a table with variant ID, gene, protein change, coding impact, cells mutated % and various other metrics.

Advanced

The Advanced page displays the following information:

  • CNV Profile Plot: 
  • Phylogenetic tree: A visualization showing the order in which the mutations were acquired and how they co-occur.
  • DNA profile: A heatmap showing DNA Cluster Signature subsorted by Protein.
  • CNV profile: A plot showing the ploidy for each amplicon marked as "gwCNV" in the input panel file in each clone.
  • Protein UMAP: A UMAP plot showing the protein expression colored by either Protein, sample, clone or genotype.
  • Protein expression correlation: A plot showing the correlation in expression for two proteins.
  • Protein expression change over time: A time course analysis-only plot showing the change in protein expression over a period of time.
  • Sample
    • Run ID
    • Sample ID
    • DNA panel name
    • DNA panel size
    • Reference genome
    • Secondary analysis pipeline version
    • Tertiary analysis  pipeline version
    • Date analyzed

QC

  • Raw CNV counts with VAFs: The raw CNV counts with the average variant allele frequency (VAF) in each clone for germline variants located in the corresponding amplicons.
  • Germline variant / multiplexing diagnostic plot: Only for demultiplexed samples. A plot providing a visual representation of germline variant information to help confirm and diagnose sample identity issues.
  • Heatmap of somatic variants (raw genotypes): A heatmap showing raw genotypes for the somatic variants per cell.
  • Heatmap of protein expression: A heatmap showing the normalized protein expression per cell.

Definitions

The Definitions page contains a glossary of key terms used in the report and a description of every table and plot in the report.
