Mission Bio’s Tapestri Pipeline Genome Editing Software allows customers to process single-cell gene editing DNA and DNA+Protein sequencing data generated on the Tapestri Platform.
Table of Contents
- Setting up Tapestri Pipeline Account
- Single Cell Genome Editing Pipeline
- Genome Editing Inputs
- Report Types
- Starting a Genome Editing Run
- Genome Editing Output Files
- Genome Editing Report Overview
Setting up Tapestri Pipeline Account
Refer to the Tapestri Pipeline User Guide to set up an account and get access to Tapestri Pipeline.
Single Cell Genome Editing Pipeline
Mission Bio’s single-cell Genome Editing (GE) analysis pipeline is a complete software solution for detecting single-cell edits. It can help determine the rate of on- and off-target events and the rate of co-occurrence events. FASTQ files generated from Tapestri sequencing libraries are provided as input, and the pipeline generates reports of top on- and off-target edits, top edit combinations, and the zygosity of editing events. The pipeline is compatible with both DNA-only and DNA+Protein samples.
There are three types of Genome Editing analysis:
- Single sample analysis
- Multi-sample analysis
- Multiplexed sample analysis
Single-sample analysis
Single-sample Genome Editing analysis works on the FASTQ files and provides a genome editing report for the single sample. This pipeline can be run with DNA-only FASTQs, or with DNA and Protein FASTQs. The resulting report summarizes on- and off-target editing activity for the sample.
Multi-sample analysis
Multi-sample Genome Editing analysis works on the h5 files produced by single-sample analysis pipelines. The resulting report summarizes on-target editing activity across up to five samples.
Multiplexed sample analysis
With the release of sample multiplexing, customers can multiplex up to three samples on one Genome Editing Tapestri run using antibody hashing, and the Genome Editing Pipeline demultiplexes them. Antibody hashing is currently compatible with the Tapestri GE DNA assay only. Genome Editing Pipeline v1.1.1 supports antibody hashing using the GE DNA+Protein pipeline.
The details about these GE pipelines are presented in the following sections.
Genome Editing Inputs
FASTQ Files
Input FASTQ files are one pair (or multiple pairs if sequenced on multiple sequencing lanes) of forward and reverse FASTQ files (R1/R2). These files should be compressed (.gz). DNA FASTQs are always required to run this Pipeline. Protein FASTQs are only required for the GE DNA+Protein Pipeline.
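Before uploading, it can help to confirm that every forward file has a matching reverse file and that all files are compressed. The following is a minimal sketch, not part of Tapestri Pipeline. It assumes Illumina-style file names in which the read is tagged `_R1` or `_R2`; the function name `pair_fastqs` is hypothetical.

```python
import re
from collections import defaultdict

def pair_fastqs(filenames):
    """Group FASTQ file names into (R1, R2) pairs and report problems."""
    pairs = defaultdict(dict)
    problems = []
    for name in filenames:
        if not name.endswith((".fastq.gz", ".fq.gz")):
            problems.append(f"{name}: not a gzip-compressed FASTQ")
            continue
        # Assumption: the read tag appears as _R1 or _R2 followed by "_" or "."
        match = re.search(r"_(R[12])(?=[_.])", name)
        if match is None:
            problems.append(f"{name}: no R1/R2 tag found")
            continue
        # Files of one pair share the same name once the read tag is removed
        key = name[:match.start()] + name[match.end():]
        pairs[key][match.group(1)] = name
    complete = []
    for key, reads in sorted(pairs.items()):
        if "R1" in reads and "R2" in reads:
            complete.append((reads["R1"], reads["R2"]))
        else:
            missing = "R2" if "R1" in reads else "R1"
            problems.append(f"{key}: missing {missing} file")
    return complete, problems
```

Calling `pair_fastqs` on the list of file names for one run returns the complete pairs (one per sequencing lane) and a list of human-readable problems to resolve before upload.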
Panel Files
To provide panel and target information, the following file name extensions are needed to run the pipeline:
- *.amplicons
- *.bed
- *.designSummary.tab
- *.submitted
- *.target_groups.csv
These five files need to be zipped together and uploaded to Tapestri Pipeline as a ‘Genome Editing Panel’ file type. For more information about these input files, refer to this article.
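A quick local check can confirm that all five file types are present before zipping. This is a minimal sketch based only on the extensions listed above; the function name `missing_panel_files` is hypothetical, and the check does not validate file contents.

```python
# The five file name extensions listed in this article
REQUIRED_SUFFIXES = (
    ".amplicons",
    ".bed",
    ".designSummary.tab",
    ".submitted",
    ".target_groups.csv",
)

def missing_panel_files(filenames):
    """Return the required extensions that no file name in `filenames` ends with."""
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not any(name.endswith(suffix) for name in filenames)
    ]
```

An empty list means every required extension was found; otherwise the returned extensions identify the files that still need to be created.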
Uploading GE Panel Files
Create the panel files based on the details mentioned above, and then .zip the files together prior to uploading to Tapestri Pipeline. This file needs to be uploaded to Tapestri Pipeline before it can be used in a GE run. To upload the GE Panel Files, follow the instructions below:
- Click the Add Files button.
- Select the option Panel from the left panel.
- In the dropdown select Genome Editing Panel.
- Choose either Upload from Local Computer or Import from Amazon S3 based on where the Panel files are saved.
- Choose the files to add and click Upload.
- Once the upload completes, the files can be seen in the Panels tab on the Files table.
Note: this .zip cannot be created using the Finder function on Mac. This function adds an additional folder to the zip that will cause the panel file to fail uploading.
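One way to create the archive without an enclosing folder is to write each file to the top level of the .zip explicitly. The sketch below uses Python's standard `zipfile` module. The function name `zip_panel_flat` is hypothetical, and skipping hidden files (such as `.DS_Store`) is an assumption rather than a documented requirement.

```python
import zipfile
from pathlib import Path

def zip_panel_flat(panel_dir, zip_path):
    """Write every regular file in panel_dir to the top level of zip_path."""
    panel_dir = Path(panel_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(panel_dir.iterdir()):
            # Skip subfolders, hidden files, and the archive itself
            if not path.is_file() or path.name.startswith("."):
                continue
            if path.resolve() == zip_path.resolve():
                continue
            # arcname=path.name stores the file without any folder prefix
            archive.write(path, arcname=path.name)
    # Re-open the archive and return the stored names for inspection
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

The returned list should contain only the five panel file names, with no folder component in any entry.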
Uploading Protein Panel Files
This file is only necessary for DNA+Protein runs. It is a 3-column .csv file. For more information about the format of this file, please refer to this article. To upload a protein panel file, follow the instructions below:
- Click the Add Files button.
- Select the option Panel from the left panel.
- In the dropdown select Protein Panel.
- Choose the files to add from your Local Computer and click Upload.
- Once the upload completes, the files can be seen in the Panels tab on the Files table.
Reference Genome
We recommend that you use one of the Mission Bio-provided reference genomes. The reference genome used for the pipeline must match the reference genome the panel was created with. If a custom reference genome was used, please upload the .fa.zip file of the genome to your Tapestri Pipeline account, following the instructions provided here.
Report Types
Base Editing
The Base Editing (BE) Report should be selected for experiments in which a base editor was used to make single base substitutions in cells. The report shows SNVs within the on-target activity window for one or multiple on-targets and shows SNVs and indels for predicted off-target edits. The report does not display indel information in the on-target activity window(s), but this information is accessible in the H5 file.
Knockout
The Knockout (KO) Report should be selected for experiments in which a genome editor (e.g., CRISPR/Cas9) was used to make edits to cells and the intended edit type is indels (i.e., NHEJ repair pathway). The report shows indels observed at one or multiple on-target locations as well as for predicted off-target locations. The report does not display SNV information in the on-target activity window(s), but this information is accessible in the H5 file.
In cases where both Report Types are desired, please contact support (support@missionbio.com).
Starting a Genome Editing Run
The Tapestri Pipeline application allows you to start three types of Genome Editing pipelines:
- GE DNA only
- GE DNA+Protein
- GE Multi-sample
GE DNA only and GE DNA+Protein
To start a GE DNA only or GE DNA+Protein pipeline run, follow the steps below:
- Click the Start Run button.
- Add the run name, and optionally add a description about the run.
- Select the Pipeline: GE DNA only or GE DNA+Protein based on the available input files.
- Select the reference genome the panel was created with.
- Select the Report Type: KO or BE.
- Select the panel files.
- If you are doing DNA+Protein, you will need to select your protein panel files, as well as your GE panel files.
- Select the FASTQ files and assign them to correct lanes corresponding to your Tapestri experiment. See Lane assignment article for details.
- If you are doing a DNA+Protein run, you will need to select both DNA and Protein FASTQ files.
- Preview the run inputs and submit the run.
GE Multi-sample
This pipeline combines output from multiple samples so that you can compare editing activity across up to five runs (e.g., negative control and replicates). First, each replicate/run needs to be processed on the GE pipeline individually. The GE Multi-sample pipeline can then be used to compare the runs. To start a GE Multi-sample pipeline run, follow the steps below:
- Click the Start Run button.
- Add the run name, and optionally add a description about the run.
- Select the Pipeline: GE Multi-sample.
- Select the Report Type: KO or BE.
- Select the panel files.
- Select the h5 files from previous GE runs.
- Preview the run inputs and submit the run.
GE Sample Demultiplexing
The demultiplexing workflow classifies cells into individual samples based on antibody expression. To run this pipeline, follow the same steps as for a GE DNA+Protein single-sample run, but with an updated protein panel. The protein panel should provide the mapping between each sample and the hashing antibody that it expresses. This can be done by adding a new column called “Sample_ID” to the protein panel and entering the sample name for each hashing antibody. For example:
Sequence,Name,ID,Sample_ID
GTCAACTCTTTAGCG,Hashtag-1,Hashtag-1,Sample_1
TGATGGCCTATTGGG,Hashtag-2,Hashtag-2,Sample_2
TTCCGCCTCTCTTTG,Hashtag-3,Hashtag-3,Sample_3
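The hashing rows of such a panel can also be generated programmatically. The following is a minimal sketch using Python's standard `csv` module. The function name `demux_panel_csv` is hypothetical. The sketch assumes that the ID column repeats the Name column, as in the example rows above, and it covers only the hashing antibody rows.

```python
import csv
import io

MAX_SAMPLES = 3  # this article states that up to three samples can be multiplexed

def demux_panel_csv(hashtags):
    """Build panel text from (sequence, antibody_name, sample_id) tuples."""
    hashtags = list(hashtags)
    samples = {sample_id for _, _, sample_id in hashtags}
    if len(samples) > MAX_SAMPLES:
        raise ValueError(f"{len(samples)} samples given, at most {MAX_SAMPLES} supported")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Sequence", "Name", "ID", "Sample_ID"])
    for sequence, name, sample_id in hashtags:
        # Assumption: ID repeats Name, as in the example rows above
        writer.writerow([sequence, name, name, sample_id])
    return buffer.getvalue()
```

The returned text can be written to a .csv file and uploaded as a Protein Panel in the usual way.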
The main difference between a demultiplexing run and a single-sample run is that the demultiplexing run generates one genome editing report per sample, whereas the single-sample run generates a single report. Because a combined genome editing report for the multiplexed data is not relevant, the Report tab on the run details page is empty. The individual sample reports can be downloaded from the Output Files tab.
Genome Editing Output Files
The Genome Editing Pipeline outputs the following files:
- The DNA only KO and BE reports, and the DNA+Protein KO and BE reports produce the following outputs:
  - Run level Files
    - SAMPLE_NAME_raw_barcodes.txt
    - SAMPLE_NAME_cells.txt
    - SAMPLE_NAME.ge.h5 (GE-KO, GE-BE) / SAMPLE_NAME.ge_protein.h5 (GE+Protein-KO, GE+Protein-BE)
    - SAMPLE_NAME_cell_barcode_distribution.tsv
    - SAMPLE_NAME_all_barcode_distribution.tsv
    - SAMPLE_NAME_paired_reads.tsv
    - SAMPLE_NAME_primers.bam
    - SAMPLE_NAME_primers.bai
    - SAMPLE_NAME.aligned.bam
    - SAMPLE_NAME.pipeline_metrics.json
  - Reporting Files
    - SAMPLE_NAME.report.html
    - CSV Files
      - top_on_target_alleles.csv
      - top_off_target_alleles.csv
      - top_edit_sites_per_guide.csv
      - top_edit_combinations_per_guide.csv (GE-KO, GE+Protein-KO only)
      - top10_ontarget_variant_zygosity.csv (GE-KO, GE+Protein-KO only)
      - summary_of_editing.csv
      - panel_uniformity.csv
      - ontarget_edit_cooccurrence.csv (GE-KO, GE+Protein-KO only)
      - ontarget_editing_zygosity.csv
      - ontarget_allele_status.csv
      - indel_start_locations.csv (GE-KO, GE+Protein-KO only)
      - indel_length_distribution.csv
      - ontarget_mutation_distribution.csv (GE-BE, GE+Protein-BE only)
      - ontarget_edit_cooccurrence_with_protein.csv
      - translocations.csv (GE-KO, GE+Protein-KO only)
    - HTML Files
      - top_on_target_alleles.html
      - top_off_target_alleles.html
      - top_edit_sites_per_guide.html
      - top_edit_combinations_per_guide.html (GE-KO, GE+Protein-KO only)
      - top10_ontarget_variant_zygosity.html (GE-KO, GE+Protein-KO only)
      - summary_of_editing.html
      - panel_uniformity.html
      - ontarget_edit_cooccurrence.html (GE-KO, GE+Protein-KO only)
      - ontarget_editing_zygosity.html
      - ontarget_allele_status.html
      - indel_start_locations.html (GE-KO, GE+Protein-KO only)
      - indel_length_distribution.html
      - ontarget_mutation_distribution.html (GE-BE, GE+Protein-BE only)
      - translocations.html (GE-KO, GE+Protein-KO only)
  - Panel Files
    - primers_window.tab
    - primers.tab
    - amplicon_groups.tab
    - panel_file.modified_target_groups.csv
    - panel_file.modified_submitted
- The Multi-sample KO and BE reports produce the following outputs:
  - multi_sample.report.html
  - top_edit_sites.csv
  - top_edit_combinations.csv (KO only)
  - ontarget_edit_zygosity.csv
  - ontarget_edit_cooccurrence.csv
  - ontarget_mutation_distribution.csv (BE only)
  - translocations.csv (KO only)
- Demultiplexing KO and BE runs produce the following outputs:
  - Run level Files
    - RUN_NAME_raw_barcodes.txt
    - RUN_NAME_cells.txt
    - RUN_NAME.ge.h5 / RUN_NAME.ge_protein.h5
    - RUN_NAME_cell_barcode_distribution.tsv
    - RUN_NAME_all_barcode_distribution.tsv
    - RUN_NAME_primers.bam
    - RUN_NAME_primers.bai
    - RUN_NAME.aligned.bam
    - RUN_NAME.pipeline_metrics.json
  - Sample level files (one set per sample)
    - SAMPLE_NAME_primers.bam
    - SAMPLE_NAME_primers.bai
    - SAMPLE_NAME.ge.h5 / SAMPLE_NAME.ge_protein.h5
  - Reporting Files
    - SAMPLE_NAME.report.html
    - CSV Files
      - top_on_target_alleles.csv
      - top_off_target_alleles.csv
      - top_edit_sites_per_guide.csv
      - top_edit_combinations_per_guide.csv (GE-KO, GE+Protein-KO only)
      - top10_ontarget_variant_zygosity.csv (GE-KO, GE+Protein-KO only)
      - summary_of_editing.csv
      - panel_uniformity.csv
      - ontarget_edit_cooccurrence.csv (GE-KO, GE+Protein-KO only)
      - ontarget_editing_zygosity.csv
      - ontarget_allele_status.csv
      - indel_start_locations.csv (GE-KO, GE+Protein-KO only)
      - indel_length_distribution.csv
      - ontarget_mutation_distribution.csv (GE-BE, GE+Protein-BE only)
      - ontarget_edit_cooccurrence_with_protein.csv
      - translocations.csv (GE-KO, GE+Protein-KO only)
    - HTML Files
      - top_on_target_alleles.html
      - top_off_target_alleles.html
      - top_edit_sites_per_guide.html
      - top_edit_combinations_per_guide.html (GE-KO, GE+Protein-KO only)
      - top10_ontarget_variant_zygosity.html (GE-KO, GE+Protein-KO only)
      - summary_of_editing.html
      - panel_uniformity.html
      - ontarget_edit_cooccurrence.html (GE-KO, GE+Protein-KO only)
      - ontarget_editing_zygosity.html
      - ontarget_allele_status.html
      - indel_start_locations.html (GE-KO, GE+Protein-KO only)
      - indel_length_distribution.html
      - ontarget_mutation_distribution.html (GE-BE, GE+Protein-BE only)
      - translocations.html (GE-KO, GE+Protein-KO only)
  - Panel Files
    - primers_window.tab
    - primers.tab
    - amplicon_groups.tab
    - panel_file.modified_target_groups.csv
    - panel_file.modified_submitted
For more information about Genome Editing output files, refer to this article.
To download any output file, click the download icon to the left of the File Name.
Note: if the file does not download, check whether an ad or pop-up blocker is running. If so, disable it and download the file again.
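After downloading the outputs of a single-sample run, the CSV file names listed above can be used to check that nothing is missing for the selected Report Type. The following is a minimal sketch that simply encodes that list. The function name `expected_csvs` is hypothetical, and treating `ontarget_edit_cooccurrence_with_protein.csv` as a DNA+Protein-only file is an assumption based on its name.

```python
COMMON_CSVS = [
    "top_on_target_alleles.csv",
    "top_off_target_alleles.csv",
    "top_edit_sites_per_guide.csv",
    "summary_of_editing.csv",
    "panel_uniformity.csv",
    "ontarget_editing_zygosity.csv",
    "ontarget_allele_status.csv",
    "indel_length_distribution.csv",
]
KO_ONLY_CSVS = [
    "top_edit_combinations_per_guide.csv",
    "top10_ontarget_variant_zygosity.csv",
    "ontarget_edit_cooccurrence.csv",
    "indel_start_locations.csv",
    "translocations.csv",
]
BE_ONLY_CSVS = ["ontarget_mutation_distribution.csv"]
# Assumption: produced only by DNA+Protein runs (inferred from the file name)
PROTEIN_CSVS = ["ontarget_edit_cooccurrence_with_protein.csv"]

def expected_csvs(report_type, with_protein=False):
    """Return the CSV file names expected for a KO or BE single-sample run."""
    if report_type not in ("KO", "BE"):
        raise ValueError("report_type must be 'KO' or 'BE'")
    names = list(COMMON_CSVS)
    names += KO_ONLY_CSVS if report_type == "KO" else BE_ONLY_CSVS
    if with_protein:
        names += PROTEIN_CSVS
    return names
```

Comparing `set(expected_csvs("KO"))` with the names in the download folder shows which files, if any, still need to be downloaded.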
Genome Editing Report Overview
To download the Run Report, go to the Output Files tab and download the *.report.html file. Plots and tables in the report are interactive.
Summary
The Summary page displays the following information:
- # cells
- Mean reads/cell/amplicon
- Panel uniformity
- % DNA reads mapped to target
- Sample information
  - Run ID
  - Analyte
  - DNA panel name
  - DNA panel size
  - Reference genome
  - Pipeline version
  - Data analyzed
- Sequencing (DNA)
  - # total read pairs
  - Read quality (Q30)
  - % read pairs trimmed
  - % read pairs with valid barcodes
- Mapping
  - % reads mapped to genome
  - % reads mapped to target
- Cell calling
  - # cells
  - Panel uniformity
  - Mean reads/cell/amplicon
- Panel uniformity summary: a table listing information for every amplicon in the panel: Amplicon name, Median normalized counts, Mean reads, Low performers
- Summary of editing: a table listing the summary of editing across all targets: Group, Category, Total Alleles, # edited alleles, % edited alleles, Total cells, # edited cells, % edited cells
- Distribution of on-target alleles: the number and percentage of targeted alleles that are edited and unedited (wildtype).
Advanced
The Advanced page displays the following information for single sample reports:
- Top on-target variants: a table listing the top on-target variants: Target, Variant, Modification, Total alleles, # alleles, % alleles, Total cells, # cells, % cells, % heterozygous, % homozygous
- Top off-target variants: a table listing the top off-target variants: Target, Variant, Modification, Total alleles, # alleles, % alleles, Total cells, # cells, % cells, % heterozygous, % homozygous
- On-target INDEL lengths: The percentage and number of alleles containing particular insertion or deletion (indel) lengths for an on-target edit. Note: SNVs are specifically not shown in this graph.
- Zygosity of on-target edits: The number and percentage of cells with wildtype (WT), mono-allelic, or bi-allelic edits (assuming cells are diploid). (KO report only)
- On-target indel start position: The distribution of start positions of insertions or deletions (indels) within the on-target activity window. (KO report only)
- Zygosity of top 10 on-target variants: The 10 most frequent variants and the percentage of cells that contain the variant on both alleles (homozygous), one allele (heterozygous), and not at all (wildtype, WT).
- Top edit combinations per group: The most frequent combinations of co-occurring on-target(s) and/or off-target(s) within each Group. (KO report only)
- Top 5 edit sites per group: The percentage of cells containing the top edit sites for each Group (up to 5).
- Co-occurrence of on-target edits: The percentage of cells with the most frequent combinations of multiple on-target edits. (KO report only)
- On-target variant distribution: The distribution of single nucleotide variants (SNVs) in the on-target activity window. (BE report only)
- Top 10 Translocations: The percentage of cells containing the most frequent predicted translocations (up to 10). (KO report only)
- Protein (Optional): The number of cells with different combinations of edits and cell-surface protein expression. (DNA+Protein runs only)
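To make the zygosity categories above concrete, the sketch below tallies cells as wildtype, mono-allelic, or bi-allelic from a per-cell count of edited alleles at one on-target, under the same diploid assumption the report states. It is an illustration of the definitions only, not the pipeline's implementation; the function name `zygosity_summary` and the input format are hypothetical.

```python
from collections import Counter

# Number of edited alleles in a diploid cell mapped to its zygosity label
LABELS = {0: "WT", 1: "mono-allelic", 2: "bi-allelic"}

def zygosity_summary(edited_alleles_per_cell):
    """Return {label: (number of cells, percentage of cells)} for one on-target."""
    counts = Counter(LABELS[n] for n in edited_alleles_per_cell)
    total = len(edited_alleles_per_cell)
    summary = {}
    for label in LABELS.values():
        percent = round(100.0 * counts[label] / total, 1) if total else 0.0
        summary[label] = (counts[label], percent)
    return summary
```

For ten cells with edited-allele counts `[0, 0, 1, 2, 2, 2, 1, 0, 2, 2]`, the summary contains 3 WT cells (30.0%), 2 mono-allelic cells (20.0%), and 5 bi-allelic cells (50.0%).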
The Advanced page displays the following information for multi-sample reports:
- On-target editing status: The distribution of different editing zygosity: WT (red), mono-allelic edit (purple), bi-allelic edit (green) for 1 on-target across all samples (up to 5).
- Top 5 edit sites per group: The percentage of cells with the most frequently edited target sites (up to 5) for each Group.
- On-target variant distribution: The distribution of single nucleotide variants (SNVs) in the on-target activity window. (BE report only)
- On-target editing co-occurrence: The percentage of cells with the most frequent combinations of on-target edits, compared across 2 samples. (KO report only)
- Top 5 edit combinations per group: The percentage of cells containing the top edit combinations for each Group (up to 5), compared across all samples. (KO report only)
- Top 5 Translocations: The most frequent translocations (up to 5) across five samples. The first two columns (Target 1 and Target 2) specify the two targets that comprise each translocation. The remaining columns denote up to five samples. (KO report only)
Definitions
The Definitions page contains a glossary of key words used in the report, an overview of the default variant filters and a description of every table and plot contained in the report.