Tapestri Pipeline Genome Editing User Guide

Updated November 29, 2024 07:20

Mission Bio’s Tapestri Pipeline Genome Editing Software allows customers to process single-cell gene editing DNA and DNA+Protein sequencing data generated on the Tapestri Platform.

Setting up Tapestri Pipeline Account
Single Cell Genome Editing Pipeline
Genome Editing Inputs
Report Types
1. Base Editing
2. Knockout
Starting a Genome Editing Run
Genome Editing Output Files
Genome Editing Report Overview

Setting up Tapestri Pipeline Account

Refer to the Tapestri Pipeline User Guide to set up an account and access Tapestri Pipeline.

Single Cell Genome Editing Pipeline

Mission Bio’s single cell Genome Editing (GE) analysis pipeline is a complete software solution for detecting single-cell edits. It can help determine the rates of on- and off-target editing and of co-occurring edits. FASTQ files generated from Tapestri sequencing libraries are provided as input, and the pipeline generates reports of the top on- and off-target edits, the top edit combinations, and the zygosity of editing events. The pipeline is compatible with both DNA-only and DNA+Protein samples.

There are three types of Genome Editing analysis:

Single sample analysis
Multi-sample analysis
Multiplexed sample analysis

Single-sample analysis

Single-sample Genome Editing analysis takes FASTQ files as input and produces a genome editing report for one sample. This pipeline can be run with DNA-only FASTQs or with DNA and Protein FASTQs. The resulting report summarizes on- and off-target editing activity for the sample.

Multi-sample analysis

Multi-sample Genome Editing analysis takes as input the h5 files produced by single-sample analysis runs. The resulting report summarizes on-target editing activity across up to five samples.

Multiplexed sample analysis

With the release of sample multiplexing, customers can multiplex up to 3 samples in one Genome Editing Tapestri run using antibody hashing, with demultiplexing performed by the Genome Editing Pipeline. Antibody hashing is currently compatible only with the Tapestri GE DNA assay. In Genome Editing Pipeline v1.1.1, antibody hashing is supported through the GE DNA+Protein pipeline.

The details about these GE pipelines are presented in the following sections.

Genome Editing Inputs

FASTQ Files

Input FASTQ files are one pair (or multiple pairs, if the library was sequenced on multiple lanes) of forward and reverse FASTQ files (R1/R2). These files should be gzip-compressed (.gz). DNA FASTQs are always required to run this pipeline; Protein FASTQs are only required for the GE DNA+Protein pipeline.
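Before selecting FASTQs for a run, it can help to confirm that every lane has both an R1 and an R2 file and that each file is gzipped. The sketch below assumes Illumina-style names such as `SampleA_S1_L001_R1_001.fastq.gz`; that naming pattern and the `pair_fastqs` helper are assumptions for illustration, not something Tapestri Pipeline requires.

```python
import re
from collections import defaultdict

# Hypothetical naming convention (Illumina-style), e.g. "SampleA_S1_L001_R1_001.fastq.gz".
# Adjust FASTQ_PATTERN if your files are named differently.
FASTQ_PATTERN = re.compile(
    r"^(?P<prefix>.+)_L(?P<lane>\d{3})_R(?P<read>[12])_\d{3}\.fastq\.gz$"
)


def pair_fastqs(filenames):
    """Group gzipped FASTQ names into (R1, R2) pairs keyed by (prefix, lane)."""
    mates = defaultdict(dict)
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            # Rejects uncompressed files and names that do not follow the pattern.
            raise ValueError(f"not a gzipped R1/R2 FASTQ name: {name}")
        mates[(match["prefix"], match["lane"])]["R" + match["read"]] = name

    pairs = {}
    for key, reads in sorted(mates.items()):
        if set(reads) != {"R1", "R2"}:
            raise ValueError(f"missing mate for {key}: found only {sorted(reads)}")
        pairs[key] = (reads["R1"], reads["R2"])
    return pairs


if __name__ == "__main__":
    files = [
        "SampleA_S1_L001_R1_001.fastq.gz",
        "SampleA_S1_L001_R2_001.fastq.gz",
        "SampleA_S1_L002_R1_001.fastq.gz",
        "SampleA_S1_L002_R2_001.fastq.gz",
    ]
    for (prefix, lane), (r1, r2) in pair_fastqs(files).items():
        print(prefix, lane, r1, r2)
```

Each resulting (R1, R2) pair corresponds to one lane, which maps naturally onto the lane assignment step when starting a run.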

Panel Files

To provide panel and target information, files with the following extensions are needed to run the pipeline:

*.amplicons
*.bed
*.designSummary.tab
*.submitted
*.target_groups.csv
*.amplicon.info.csv (only for CNV detection)

These files (the first five listed, plus *.amplicon.info.csv when CNV detection is needed) must be zipped together and uploaded to Tapestri Pipeline as a ‘Genome Editing Panel’ file type. For more information about these input files, refer to this article.
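A quick local check that nothing is missing before zipping can save a failed upload. This is a minimal sketch that only compares file name suffixes against the list above; the `missing_panel_files` helper is hypothetical and does not validate file contents.

```python
# Suffixes from the panel file list in this guide;
# *.amplicon.info.csv is only needed for CNV detection.
REQUIRED_SUFFIXES = (
    ".amplicons",
    ".bed",
    ".designSummary.tab",
    ".submitted",
    ".target_groups.csv",
)
CNV_SUFFIX = ".amplicon.info.csv"


def missing_panel_files(filenames, cnv_detection=False):
    """Return the required suffixes that no file in `filenames` ends with."""
    needed = list(REQUIRED_SUFFIXES)
    if cnv_detection:
        needed.append(CNV_SUFFIX)
    return [
        suffix for suffix in needed
        if not any(name.endswith(suffix) for name in filenames)
    ]


if __name__ == "__main__":
    files = ["my_panel.amplicons", "my_panel.bed", "my_panel.designSummary.tab",
             "my_panel.submitted", "my_panel.target_groups.csv"]
    print(missing_panel_files(files))                      # prints []
    print(missing_panel_files(files, cnv_detection=True))  # prints ['.amplicon.info.csv']
```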

Uploading GE Panel Files

Create the panel files as described above, then zip them together before uploading. The panel .zip must be uploaded to Tapestri Pipeline before it can be used in a GE run. To upload the GE panel files, follow the instructions below:

Click the Add Files button.
Select the option Panel from the left panel.
In the dropdown select Genome Editing Panel.
Choose either Upload from Local Computer or Import from Amazon S3 based on where the Panel files are saved.
Choose the files to add and click Upload.
Once the upload completes, the files can be seen in the Panels tab on the Files table.

Note: do not create this .zip with the macOS Finder. Finder adds an extra folder to the archive, which causes the panel file upload to fail.
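One way to avoid the extra folder is to build the archive yourself so that every panel file sits at the root of the .zip. The sketch below uses Python's standard `zipfile` module; the helper names are hypothetical, and it assumes you pass in the paths of the panel files listed above.

```python
import os
import zipfile


def zip_panel_flat(paths, zip_path):
    """Zip the given panel files so that every file sits at the archive root."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # arcname=basename strips directory components, so no folder is added.
            archive.write(path, arcname=os.path.basename(path))
    return zip_path


def has_folders(zip_path):
    """Return True if any archive entry is inside (or is) a folder."""
    with zipfile.ZipFile(zip_path) as archive:
        return any("/" in name for name in archive.namelist())
```

For example, `zip_panel_flat(glob.glob("my_panel/*"), "my_panel.zip")` (with a hypothetical `my_panel` folder) produces a flat archive, and `has_folders` can be used to check an existing .zip before uploading it.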

Uploading Protein Panel Files

This file is only required for DNA+Protein runs. It is a 3-column .csv file; for more information about its format, refer to this article. To upload a protein panel file, follow the instructions below:
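The sketch below checks a protein panel before upload. It assumes the three columns are `Sequence`, `Name`, and `ID`, as in the hashing example later in this article (before `Sample_ID` is added); the A/C/G/T barcode check is an extra sanity check of our own, not a documented requirement, and `check_protein_panel` is a hypothetical helper.

```python
import csv
import io

# Assumed header: the three columns shown in this guide's hashing example.
EXPECTED_HEADER = ["Sequence", "Name", "ID"]


def check_protein_panel(text):
    """Return a list of problems found in a 3-column protein panel CSV (empty if none)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return [f"unexpected header: {rows[0] if rows else 'empty file'}"]
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"line {line_no}: expected 3 columns, found {len(row)}")
        elif not row[0] or set(row[0].upper()) - set("ACGT"):
            # Extra sanity check (our assumption): barcodes should be plain DNA sequence.
            problems.append(f"line {line_no}: barcode is not an A/C/G/T sequence: {row[0]!r}")
    return problems
```

An empty list means no problems were found by these checks; it does not guarantee that Tapestri Pipeline will accept the file.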

Click the Add Files button.
Select the option Panel from the left panel.
In the dropdown select Protein Panel.
Choose the files to add from your Local Computer and click Upload.
Once the upload completes, the files can be seen in the Panels tab on the Files table.

CSV Files

GE Pipeline runs can include the following CSV files:

Spike-in variants file (required for CNV detection)
Spike-in CNV profile file (optional)

For more information about these input files, refer to this article.

Uploading CSV Files

Create the CSV file as described above, then upload it to Tapestri Pipeline. The CSV file must be uploaded before it can be used in a run. To upload a CSV file, follow the instructions below:

Click the Add Files button.
Select the option Other from the left panel.
Choose either Upload from Local Computer or Import from Amazon S3 based on where the CSV files are saved.
In the dropdowns, select the required type.
1. Spike-in Genotype File - To be used to upload the spike-in variant file. File extension is .genotype.csv. If not provided, CNVs will not be called.
2. Spike-in CNV File - To be used to upload the spike-in CNV file. File extension is .cnv.csv. If not provided, CNVs will be called with diploid assumption for the spike-in.
Choose the files to add and click Upload.

Once the upload completes, the files can be seen in the Other Files tab on the Files table.
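The upload type and its effect on CNV calling follow directly from the file extension and from which files are provided. This minimal sketch restates those rules from the list above; the helper names are hypothetical.

```python
def spike_in_file_type(filename):
    """Map an uploaded CSV to the 'Other' file type it should be registered as."""
    if filename.endswith(".genotype.csv"):
        return "Spike-in Genotype File"
    if filename.endswith(".cnv.csv"):
        return "Spike-in CNV File"
    return None  # not a spike-in file recognized by this sketch


def cnv_behavior(genotype_file=None, cnv_file=None):
    """Summarize CNV calling for a run, following the rules described in this guide."""
    if genotype_file is None:
        return "CNVs not called"
    if cnv_file is None:
        return "CNVs called assuming a diploid spike-in"
    return "CNVs called using the supplied spike-in CNV profile"
```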

Reference Genome

We recommend that you use one of the Mission Bio-provided reference genomes. The reference genome used for the pipeline must match the reference genome the panel was created with. If a custom reference genome was used, please upload the .fa.zip file of the genome to your Tapestri Pipeline account, following the instructions provided here.

Report Types

Base Editing

The Base Editing (BE) Report should be selected for experiments in which a base editor was used to make single base substitutions in cells. The report shows SNVs within the on-target activity window for one or multiple on-targets and shows SNVs and indels for predicted off-target edits. The report does not display indel information in the on-target activity window(s), but this information is accessible in the H5 file.

Knockout

The Knockout (KO) Report should be selected for experiments in which a genome editor (e.g., CRISPR/Cas9) was used to edit cells and the intended edit type is indels (i.e., repair via the NHEJ pathway). The report shows indels observed at one or multiple on-target locations as well as at predicted off-target locations. The report does not display SNV information in the on-target activity window(s), but this information is accessible in the H5 file.

If both report types are needed, please contact support (support@missionbio.com).

Starting a Genome Editing Run

The Tapestri Pipeline application allows you to start three types of Genome Editing pipeline runs:

GE DNA only
GE DNA+Protein
GE Multi-sample

GE DNA only and GE DNA+Protein

To start a GE DNA only or GE DNA+Protein pipeline run, follow the steps below:

Click the Start Run button.
Add the run name and, optionally, a description of the run.
Select the Pipeline: GE DNA only or GE DNA+Protein based on the available input files.
Select the reference genome the panel was created with.
Select the Report Type: KO or BE.
Note: KO should be selected if analyzing CNV data.
If you are doing genome-wide CNV (gwCNV) analysis, select the required files in the dropdowns:
1. Spike-in Genotype File - The spike-in variant file (extension .genotype.csv). If not provided, CNVs will not be called.
2. Spike-in CNV File - The spike-in CNV file (extension .cnv.csv). If not provided, CNVs will be called assuming the spike-in is diploid.
Select the panel files.
1. If you are doing DNA+Protein, you will need to select your protein panel files, as well as your GE panel files.

Select the FASTQ files and assign them to the lanes corresponding to your Tapestri experiment. See the Lane assignment article for details.
1. If you are doing a DNA+Protein run, you will need to select both DNA and Protein FASTQ files.
Preview the run inputs and submit the run.

GE Multi-sample

This pipeline combines output from multiple samples, for example to compare editing activity across up to five runs (e.g., a negative control and replicates). Each replicate/run must first be processed individually on the GE pipeline; the GE Multi-sample pipeline can then be used to compare the runs. To start a GE Multi-sample pipeline run, follow the steps below:
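Before starting a multi-sample run, the selected inputs can be sanity-checked locally. This sketch encodes the five-sample limit stated above and the single-sample h5 names from the output file list (`*.ge.h5` / `*.ge_protein.h5`); the minimum of two samples and the `check_multisample_inputs` helper are our own assumptions.

```python
def check_multisample_inputs(h5_files, max_samples=5):
    """Validate the h5 inputs selected for a GE Multi-sample run."""
    if len(h5_files) < 2:
        # Assumption: comparing runs needs at least two samples.
        raise ValueError("select at least two h5 files to compare")
    if len(h5_files) > max_samples:
        raise ValueError(f"at most {max_samples} samples can be compared")
    if len(set(h5_files)) != len(h5_files):
        raise ValueError("the same h5 file was selected more than once")
    for name in h5_files:
        # Single-sample GE runs name their h5 output *.ge.h5 or *.ge_protein.h5.
        if not name.endswith((".ge.h5", ".ge_protein.h5")):
            raise ValueError(f"not a GE single-sample h5 file: {name}")
    return list(h5_files)
```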

Click the Start Run button.
Add the run name, and optionally add a description about the run.
Select the Pipeline: GE Multi-sample.
Select the Report Type: KO or BE.
Select the panel files.
Select the h5 files from previous GE runs.

Preview the run inputs and submit the run.

NOTE: GE multi-sample runs do not support gwCNV analysis.

GE Sample Demultiplexing

The demultiplexing workflow classifies cells into individual samples based on hashing antibody expression. To run it, follow the same steps as for a GE DNA+Protein single-sample run, but with an updated protein panel. The protein panel must map each sample to the hashing antibody it expresses: add a new column named “Sample_ID” to the protein panel and enter the sample name for each hashing antibody. For example:

Sequence,Name,ID,Sample_ID
GTCAACTCTTTAGCG,Hashtag-1,Hashtag-1,Sample_1
TGATGGCCTATTGGG,Hashtag-2,Hashtag-2,Sample_2
TTCCGCCTCTCTTTG,Hashtag-3,Hashtag-3,Sample_3
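If you keep the antibody-to-sample assignment in a script, the updated panel can be generated rather than edited by hand. This sketch writes the four-column layout shown above with Python's `csv` module; the `add_sample_ids` helper is hypothetical, and the limit of 3 samples comes from the multiplexing section of this guide.

```python
import csv
import io


def add_sample_ids(panel_rows, sample_for_antibody, max_samples=3):
    """Return protein panel CSV text with a Sample_ID column for antibody hashing.

    panel_rows: dicts with Sequence, Name and ID keys (one per hashing antibody).
    sample_for_antibody: maps each antibody Name to the sample it labels.
    """
    if len(set(sample_for_antibody.values())) > max_samples:
        raise ValueError(f"at most {max_samples} samples can be multiplexed")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["Sequence", "Name", "ID", "Sample_ID"], lineterminator="\n"
    )
    writer.writeheader()
    for row in panel_rows:
        if row["Name"] not in sample_for_antibody:
            raise ValueError(f"no sample assigned to antibody {row['Name']}")
        writer.writerow({**row, "Sample_ID": sample_for_antibody[row["Name"]]})
    return out.getvalue()


if __name__ == "__main__":
    rows = [
        {"Sequence": "GTCAACTCTTTAGCG", "Name": "Hashtag-1", "ID": "Hashtag-1"},
        {"Sequence": "TGATGGCCTATTGGG", "Name": "Hashtag-2", "ID": "Hashtag-2"},
        {"Sequence": "TTCCGCCTCTCTTTG", "Name": "Hashtag-3", "ID": "Hashtag-3"},
    ]
    mapping = {"Hashtag-1": "Sample_1", "Hashtag-2": "Sample_2", "Hashtag-3": "Sample_3"}
    print(add_sample_ids(rows, mapping), end="")
```

With the example rows, the output matches the panel shown above line for line.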

The main difference between a demultiplexing run and a single-sample run is that the demultiplexing run generates one genome editing report per sample, whereas a single-sample run creates a single report. Because a combined genome editing report is not relevant for multiplexed data, the Report tab on the run details page is empty; the individual sample reports can be downloaded from the Output Files tab.

Genome Editing Output Files

The Genome Editing Pipeline outputs the following files:

GE DNA only and GE DNA+Protein runs (KO and BE reports) produce the following outputs:
- Run level Files
  - SAMPLE_NAME_raw_barcodes.txt
  - SAMPLE_NAME_cells.txt
  - SAMPLE_NAME.ge.h5 (GE-KO, GE-BE) / SAMPLE_NAME.ge_protein.h5 (GE+Protein-KO, GE+Protein-BE)
  - SAMPLE_NAME_cell_barcode_distribution.tsv
  - SAMPLE_NAME_all_barcode_distribution.tsv
  - SAMPLE_NAME_paired_reads.tsv
  - SAMPLE_NAME_primers.bam
  - SAMPLE_NAME_primers.bai
  - SAMPLE_NAME.aligned.bam
  - SAMPLE_NAME.pipeline_metrics.json
- Reporting Files
  - SAMPLE_NAME.report.html
  - CSV Files
    - top_on_target_alleles.csv
    - top_off_target_alleles.csv
    - top_edit_sites_per_guide.csv
    - top_edit_combinations_per_guide.csv (GE-KO, GE+Protein-KO only)
    - top10_ontarget_variant_zygosity.csv (GE-KO, GE+Protein-KO only)
    - summary_of_editing.csv
    - panel_uniformity.csv
    - ontarget_edit_cooccurrence.csv (GE-KO, GE+Protein-KO only)
    - ontarget_editing_zygosity.csv
    - ontarget_allele_status.csv
    - indel_start_locations.csv (GE-KO, GE+Protein-KO only)
    - indel_length_distribution.csv
    - ontarget_mutation_distribution.csv (GE-BE, GE+Protein-BE only)
    - ontarget_edit_cooccurrence_with_protein.csv
    - translocations.csv (GE-KO, GE+Protein-KO only)
  - HTML Files
    - top_edit_sites_per_guide.html
    - top10_ontarget_variant_zygosity.html (GE-KO, GE+Protein-KO only)
    - ontarget_edit_cooccurrence.html (GE-KO, GE+Protein-KO only)
    - ontarget_editing_zygosity.html
    - ontarget_allele_status.html
    - indel_start_locations.html (GE-KO, GE+Protein-KO only)
    - indel_length_distribution.html
    - ontarget_mutation_distribution.html (GE-BE, GE+Protein-BE only)
    - translocations.html (GE-KO, GE+Protein-KO only)
  - Panel Files
    - primers_window.tab
    - primers.tab
    - amplicon_groups.tsv
    - panel_file.modified_target_groups.csv
    - panel_file.modified_submitted
GE Multi-sample runs (KO and BE reports) produce the following outputs:
- multi_sample.report.html
- top_edit_sites.csv
- top_edit_combinations.csv (KO only)
- ontarget_edit_zygosity.csv
- ontarget_edit_cooccurrence.csv
- ontarget_mutation_distribution.csv (BE only)
- translocations.csv (KO only)
Demultiplexing KO and BE runs produce the following outputs:
- Run level Files
  - RUN_NAME_raw_barcodes.txt
  - RUN_NAME_cells.txt
  - RUN_NAME.ge.h5 / RUN_NAME.ge_protein.h5
  - RUN_NAME_cell_barcode_distribution.tsv
  - RUN_NAME_all_barcode_distribution.tsv
  - RUN_NAME_primers.bam
  - RUN_NAME_primers.bai
  - RUN_NAME.aligned.bam
  - RUN_NAME.pipeline_metrics.json
  - RUN_NAME.report.html
- Sample level files (one set per sample)
  - SAMPLE_NAME_primers.bam
  - SAMPLE_NAME_primers.bai
  - SAMPLE_NAME.ge.h5 / SAMPLE_NAME.ge_protein.h5
  - Reporting Files
    - SAMPLE_NAME.report.html
    - CSV Files
      - top_on_target_alleles.csv
      - top_off_target_alleles.csv
      - top_edit_sites_per_guide.csv
      - top_edit_combinations_per_guide.csv (GE-KO, GE+Protein-KO only)
      - top10_ontarget_variant_zygosity.csv (GE-KO, GE+Protein-KO only)
      - summary_of_editing.csv
      - panel_uniformity.csv
      - ontarget_edit_cooccurrence.csv (GE-KO, GE+Protein-KO only)
      - ontarget_editing_zygosity.csv
      - ontarget_allele_status.csv
      - indel_start_locations.csv (GE-KO, GE+Protein-KO only)
      - indel_length_distribution.csv
      - ontarget_mutation_distribution.csv (GE-BE, GE+Protein-BE only)
      - ontarget_edit_cooccurrence_with_protein.csv
      - translocations.csv (GE-KO, GE+Protein-KO only)
    - HTML Files
      - top_edit_sites_per_guide.html
      - top10_ontarget_variant_zygosity.html (GE-KO, GE+Protein-KO only)
      - ontarget_edit_cooccurrence.html (GE-KO, GE+Protein-KO only)
      - ontarget_editing_zygosity.html
      - ontarget_allele_status.html
      - indel_start_locations.html (GE-KO, GE+Protein-KO only)
      - indel_length_distribution.html
      - ontarget_mutation_distribution.html (GE-BE, GE+Protein-BE only)
      - translocations.html (GE-KO, GE+Protein-KO only)
    - Panel Files
      - primers_window.tab
      - primers.tab
      - amplicon_groups.tsv
      - panel_file.modified_target_groups.csv
      - panel_file.modified_submitted
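The set of per-sample reporting CSVs depends on the report type. As a completeness check after downloading, the sketch below transcribes the CSV list above into sets and reports which expected files are missing. The helper names are hypothetical, and `ontarget_edit_cooccurrence_with_protein.csv` is kept in the common set because the list does not restrict it to a report type.

```python
# Per-sample reporting CSVs, transcribed from the output list in this guide.
COMMON_CSVS = {
    "top_on_target_alleles.csv",
    "top_off_target_alleles.csv",
    "top_edit_sites_per_guide.csv",
    "summary_of_editing.csv",
    "panel_uniformity.csv",
    "ontarget_editing_zygosity.csv",
    "ontarget_allele_status.csv",
    "indel_length_distribution.csv",
    "ontarget_edit_cooccurrence_with_protein.csv",
}
KO_ONLY_CSVS = {
    "top_edit_combinations_per_guide.csv",
    "top10_ontarget_variant_zygosity.csv",
    "ontarget_edit_cooccurrence.csv",
    "indel_start_locations.csv",
    "translocations.csv",
}
BE_ONLY_CSVS = {"ontarget_mutation_distribution.csv"}


def expected_report_csvs(report_type):
    """Return the sorted CSV names expected for a 'KO' or 'BE' report."""
    if report_type == "KO":
        return sorted(COMMON_CSVS | KO_ONLY_CSVS)
    if report_type == "BE":
        return sorted(COMMON_CSVS | BE_ONLY_CSVS)
    raise ValueError(f"unknown report type: {report_type}")


def missing_report_csvs(report_type, downloaded):
    """Return expected CSV names that are absent from `downloaded`."""
    return sorted(set(expected_report_csvs(report_type)) - set(downloaded))
```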

For more information about Genome Editing output files, refer to this article.

To download any output file, click the download icon to the left of the File Name.

Note: if the file does not download, check whether a pop-up or ad blocker is running. If so, disable it and download the file again.

Genome Editing Report Overview

To download the run report, go to the Output Files tab and download the *.report.html file. The plots and tables in the report are interactive.

Summary

The Summary page displays the following information:

# cells
Mean reads/cell/amplicon
Panel uniformity
% DNA reads mapped to target
Sample information
- Run ID
- Analyte
- DNA panel name
- DNA panel size
- Reference genome
- Pipeline version
- Data analyzed
Sequencing (DNA)
- # total read pairs
- Read quality (Q30)
- % read pairs trimmed
- % read pairs with valid barcodes
Mapping
- % reads mapped to genome
- % reads mapped to target
Cell calling
- # cells
- Panel uniformity
- Mean reads/cell/amplicon
Panel uniformity summary: a table listing information for every amplicon in the panel: Amplicon name, Median normalized counts, Mean reads, Low performers
Summary of editing: a table listing the summary of editing across all targets: Group, Category, Total Alleles, # edited alleles, % edited alleles, Total cells, # edited cells, % edited cells
Distribution of on-target alleles: the number and percentage of targeted alleles that are edited and unedited (wildtype).
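The percentage columns in the Summary of editing table and the edited/unedited allele split can be reproduced from the counts, assuming each percentage is simply the edited count divided by the corresponding total. The sketch below makes that assumption explicit; the counts in the example are hypothetical and the helper names are ours.

```python
def percent(part, total):
    """Percentage of `part` in `total`, returning 0.0 when total is zero."""
    return 100.0 * part / total if total else 0.0


def summarize_editing(total_alleles, edited_alleles, total_cells, edited_cells):
    """Recompute allele- and cell-level editing percentages from raw counts."""
    return {
        "% edited alleles": percent(edited_alleles, total_alleles),
        "% unedited (WT) alleles": percent(total_alleles - edited_alleles, total_alleles),
        "% edited cells": percent(edited_cells, total_cells),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(summarize_editing(total_alleles=2000, edited_alleles=500,
                            total_cells=1000, edited_cells=400))
    # prints {'% edited alleles': 25.0, '% unedited (WT) alleles': 75.0, '% edited cells': 40.0}
```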

Advanced

The Advanced page displays the following information for single sample reports:

Top on-target variants: a table listing the top on-target variants: Target, Variant, Modification, Total alleles, # alleles, % alleles, Total cells, # cells, % cells, % heterozygous, % homozygous
Top off-target variants: a table listing the top off-target variants: Target, Variant, Modification, Total alleles, # alleles, % alleles, Total cells, # cells, % cells, % heterozygous, % homozygous
On-target INDEL lengths: The percentage and number of alleles containing particular insertion or deletion (indel) lengths for an on-target edit. Note: SNVs are specifically not shown in this graph.
Zygosity of on-target edits: The number and percentage of cells with wildtype (WT), mono-allelic, or bi-allelic edits (assuming cells are diploid). (KO report only)
On-target indel start position: The distribution of start positions of insertions or deletions (indels) within the on-target activity window. (KO report only)
Zygosity of top 10 on-target variants: The 10 most frequent variants and the percentage of cells that contain the variant on both alleles (homozygous), one allele (heterozygous), and not at all (wildtype, WT).
Top edit combinations per group: The most frequent combinations of co-occurring on-target(s) and/or off-target(s) within each Group. (KO report only)
Top 5 edit sites per group: The percentage of cells containing the top edit sites for each Group (up to 5).
Co-occurrence of on-target edits: The percentage of cells with the most frequent combinations of multiple on-target edits. (KO report only)
On-target variant distribution: The distribution of single nucleotide variants (SNVs) in the on-target activity window. (BE report only)
Top 10 Translocations: The percentage of cells containing the most frequent predicted translocations (up to 10). (KO report only)
Protein (Optional): The number of cells with different combinations of edits and cell-surface protein expression. (DNA+Protein runs only)
CNV clones (Optional): The number of cells with different gwCNV events. (KO report with CNV inputs)
CNV (Optional): The percentage of cells with different gwCNV events for different combinations of edits. (KO report with CNV inputs)
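The zygosity categories above rest on the stated assumption that cells are diploid, so each cell has 0, 1, or 2 edited alleles. The sketch below shows that classification and the resulting percentages, starting from a hypothetical per-cell count of edited alleles (how the pipeline derives those counts is not described here). The same counting applies to the per-variant split, with "carries the variant on one/both alleles" in place of "edited".

```python
from collections import Counter

# Diploid assumption, as in the report: 0, 1 or 2 edited alleles per cell.
ZYGOSITY = {0: "WT", 1: "mono-allelic", 2: "bi-allelic"}


def classify_cell(edited_alleles):
    """Map a cell's number of edited alleles to its zygosity label."""
    if edited_alleles not in ZYGOSITY:
        raise ValueError(f"expected 0, 1 or 2 edited alleles, got {edited_alleles}")
    return ZYGOSITY[edited_alleles]


def zygosity_percentages(edited_alleles_per_cell):
    """Percentage of cells in each zygosity category."""
    cells = list(edited_alleles_per_cell)
    counts = Counter(classify_cell(n) for n in cells)
    total = len(cells)
    return {
        label: (100.0 * counts[label] / total if total else 0.0)
        for label in ZYGOSITY.values()
    }


if __name__ == "__main__":
    print(zygosity_percentages([0, 0, 1, 2]))
    # prints {'WT': 50.0, 'mono-allelic': 25.0, 'bi-allelic': 25.0}
```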

The Advanced page displays the following information for multi-sample reports:

On-target editing status: The distribution of editing zygosity (WT in red, mono-allelic edit in purple, bi-allelic edit in green) for one on-target across all samples (up to 5).
Top 5 edit sites per group: The percentage of cells with the most frequently edited target sites (up to 5) for each Group.
On-target variant distribution: The distribution of single nucleotide variants (SNVs) in the on-target activity window. (BE report only)
On-target editing co-occurrence: The percentage of cells with the most frequent combinations of on-target edits, compared across 2 samples. (KO report only)
Top 5 edit combinations per group: The percentage of cells containing the top edit combinations for each Group (up to 5), compared across all samples. (KO report only)
Top 5 Translocations: The most frequent translocations (up to 5) across up to five samples. The first two columns (Target 1 and Target 2) specify the two targets that comprise each translocation. The remaining columns denote up to five samples. (KO report only)

Definitions

The Definitions page contains a glossary of key words used in the report, an overview of the default variant filters and a description of every table and plot contained in the report.
