Clonal Insights Software Input Files


The Clonal Insights Software (CIS) requires the following input files to run:

  1. FASTQ files
  2. Panel files
    1. [required] *.bed
    2. [required] *.amplicons
    3. [optional] *.per-variant-background-error.csv (only for certain catalog DNA panels)
  3. Reference genome files
  4. [optional] Somatic Variants (whitelist/blacklist) CSV file
  5. [required only for multiplexed runs] Sample Variants (germline) CSV file

FASTQ files

Input FASTQ files are one or more pairs of forward and reverse (R1/R2) FASTQ files, which should be gzip-compressed (.gz). DNA FASTQs are always required to run this pipeline; Protein FASTQs are required only for the CIS DNA + Protein Pipeline.
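The Python sketch below shows one way to check R1/R2 pairing before upload. It groups gzipped FASTQ file names into pairs. It assumes Illumina-style names in which the two reads differ only by an `_R1`/`_R2` token (for example `sample_S1_L001_R1_001.fastq.gz`). `pair_fastqs` is a hypothetical helper and is not part of CIS or Tapestri Pipeline.

```python
import re

# Assumption: R1 and R2 names differ only by an "_R1"/"_R2" token that is
# followed by "_" or "." (e.g. sample_S1_L001_R1_001.fastq.gz).
_READ_TOKEN = re.compile(r"_R([12])(?=[_.])")


def pair_fastqs(filenames):
    """Group gzipped FASTQ file names into sorted (R1, R2) tuples."""
    groups = {}
    for name in filenames:
        if not name.endswith((".fastq.gz", ".fq.gz")):
            raise ValueError(f"not a gzip-compressed FASTQ: {name}")
        matches = list(_READ_TOKEN.finditer(name))
        if not matches:
            raise ValueError(f"no _R1/_R2 token in: {name}")
        token = matches[-1]  # use the last token in case the sample name has one too
        key = name[:token.start()] + name[token.end():]
        groups.setdefault(key, {})[token.group(1)] = name
    pairs = []
    for key, reads in sorted(groups.items()):
        if set(reads) != {"1", "2"}:
            raise ValueError(f"unpaired FASTQ for {key}: {sorted(reads.values())}")
        pairs.append((reads["1"], reads["2"]))
    return pairs
```

Any file that is not gzipped, or that lacks a partner, raises a `ValueError`. This makes it easy to catch missing or uncompressed reads before a run is started.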

 

Panel files

The Clonal Insights Pipeline can be run with two types of DNA panels: standard DNA panels and Clonal Insights panels.

  • Clonal Insights Panel - The Clonal Insights panel is available as a catalog panel with the three files listed below zipped together. The third file (the background error rate file) is available only for specific catalog panels, and cannot be used with other panels.
  • Standard DNA panel - For standard DNA panels, only the first two files listed below need to be zipped together and uploaded to Tapestri Pipeline as a ‘DNA panel’ file type.
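The sketch below shows how the two required files for a standard DNA panel could be packaged using Python's standard `zipfile` module. It assumes there is exactly one `*.bed` file and one `*.amplicons` file in the same directory. It also assumes that storing them at the root of the archive is acceptable, so confirm the expected archive layout in the Tapestri Pipeline CIS User Guide. `zip_standard_panel` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def zip_standard_panel(panel_dir, out_zip):
    """Zip the single *.bed and *.amplicons file in panel_dir into out_zip."""
    panel_dir = Path(panel_dir)
    members = []
    for pattern in ("*.bed", "*.amplicons"):
        found = sorted(panel_dir.glob(pattern))
        if len(found) != 1:
            raise ValueError(
                f"expected exactly one {pattern} in {panel_dir}, found {len(found)}"
            )
        members.append(found[0])
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in members:
            # Assumption: files are stored flat at the archive root.
            zf.write(path, arcname=path.name)
    return [path.name for path in members]
```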

For further information on how to upload these files, please refer to the Tapestri Pipeline CIS User Guide.

The first two files (*.bed and *.amplicons) are standard outputs of Tapestri Designer or the White Glove panel design pipeline and can be used without modification. The *.per-variant-background-error.csv file is unique to the Clonal Insights panel type and helps improve somatic variant calling.

  1. *.bed

Standard output for all panel files. Additional details here.

  2. *.amplicons

Standard output for all panel files. Additional details here.

  3. *.per-variant-background-error.csv (only for certain catalog DNA panels)

This is a 6-column CSV file with the following columns: index, Mean error, Error count, Mean error count, beta-binomial_a, and beta-binomial_b. This file can only be generated for DNA panels that have been wet-lab tested and verified, and it aids in the detection of rare variants.
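The sketch below illustrates how the six columns relate to each other. It parses the file with Python's `csv` module and keys each row by `index`. It assumes the header row uses exactly the column names listed above. The `beta_mean` value is the mean of a Beta(a, b) distribution, a / (a + b), and is computed only as a convenience check. This article does not describe how CIS uses these parameters internally, so treat this as an illustration, not the pipeline's method. `load_background_error` is hypothetical.

```python
import csv

# Assumption: the header row uses exactly these column names.
BACKGROUND_ERROR_COLUMNS = {
    "index", "Mean error", "Error count", "Mean error count",
    "beta-binomial_a", "beta-binomial_b",
}


def load_background_error(handle):
    """Parse a *.per-variant-background-error.csv stream into a dict keyed by index."""
    reader = csv.DictReader(handle)
    if set(reader.fieldnames or []) != BACKGROUND_ERROR_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    table = {}
    for row in reader:
        a = float(row["beta-binomial_a"])
        b = float(row["beta-binomial_b"])
        table[row["index"]] = {
            "mean_error": float(row["Mean error"]),
            "a": a,
            "b": b,
            "beta_mean": a / (a + b),  # mean of Beta(a, b), for sanity checks only
        }
    return table
```

To load a file from disk, open it with `newline=""` and pass the file handle to `load_background_error`.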


Reference genome files

The reference genome used for the pipeline must match the reference genome the panel was designed with. If a custom reference genome was used, upload the genome as a .fa.zip file to your Tapestri Pipeline account by following the instructions provided here.
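A quick, partial check that a reference genome and a panel agree is to confirm that every chromosome named in the panel's *.bed file also appears as a sequence name in the FASTA. The Python sketch below performs that check. Matching names is necessary but not sufficient. Different assemblies (for example hg19 and hg38) can share names such as `chr1`, so this check cannot confirm that the same build was used. The helper names are hypothetical, and the FASTA is assumed to be uncompressed.

```python
def fasta_contigs(fasta_path):
    """Return the set of sequence names (first word after '>') in a FASTA file."""
    names = set()
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def bed_chromosomes(bed_path):
    """Return the set of chromosome names in the first column of a BED file."""
    chroms = set()
    with open(bed_path) as fh:
        for line in fh:
            # Skip blank lines and BED header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chroms.add(line.split("\t")[0].strip())
    return chroms


def missing_chromosomes(bed_path, fasta_path):
    """List BED chromosomes that have no matching FASTA sequence name."""
    return sorted(bed_chromosomes(bed_path) - fasta_contigs(fasta_path))
```

An empty list means every panel chromosome name was found in the FASTA. A common mismatch this catches is a panel that uses `chr1` paired with a reference that uses `1`.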

 

Somatic Variants (whitelist/blacklist) CSV file

This is a 7-column CSV file with the following columns: chromosome, position, ref_allele, alt_allele, type, sample_id, and genotype. This optional file is used during the somatic clone detection step to include or exclude variants. The type column specifies whether each variant is ‘whitelist’ or ‘blacklist’. Unlike the Sample Variants file, this file does not require genotype information.

Note: For Standard (no demultiplexing) Clonal Insights DNA, DNA+Protein or Reprocess runs, the sample_id column is ignored.
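If you create the whitelist/blacklist file yourself, the Python sketch below can help. It writes the seven columns in the order listed above and rejects any `type` other than `whitelist` or `blacklist`. It assumes a header row with these exact column names. It leaves `genotype` empty when no value is supplied, and does the same for `sample_id`, which is ignored for standard runs. Check these assumptions against the Tapestri Pipeline documentation. `write_somatic_variants` is hypothetical.

```python
import csv

SOMATIC_COLUMNS = ["chromosome", "position", "ref_allele", "alt_allele",
                   "type", "sample_id", "genotype"]
REQUIRED = ("chromosome", "position", "ref_allele", "alt_allele", "type")


def write_somatic_variants(path, variants):
    """Validate whitelist/blacklist variants, then write them as a 7-column CSV."""
    rows = []
    for variant in variants:
        missing = [col for col in REQUIRED if variant.get(col) in (None, "")]
        if missing:
            raise ValueError(f"missing {missing} in {variant}")
        if variant["type"] not in ("whitelist", "blacklist"):
            raise ValueError(f"type must be 'whitelist' or 'blacklist': {variant}")
        # Optional columns (sample_id, genotype) default to empty strings.
        rows.append({col: variant.get(col, "") for col in SOMATIC_COLUMNS})
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=SOMATIC_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Because every row is validated before the file is opened, an invalid entry does not leave a partially written CSV behind.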


Sample Variants (germline) CSV file

This is a 7-column CSV file with the following columns: chromosome, position, ref_allele, alt_allele, genotype, type, and sample_id. It is used to demultiplex the samples and is needed only for multiplexed runs. The type column specifies that each variant is germline. This file can be created by the user or taken from the output of the Merge Bulk Runs pipeline. The file format is explained in depth here.
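For multiplexed runs, the Python sketch below reads a Sample Variants file and groups its germline variants by `sample_id`. This is a convenient way to confirm that each sample has variants listed before you submit a run. It assumes a header row with the seven column names above and assumes the `type` value is the literal string `germline`. Genotype values are kept as-is because their encoding is covered in the separate format description. `load_sample_variants` is hypothetical.

```python
import csv
from collections import defaultdict

GERMLINE_COLUMNS = {"chromosome", "position", "ref_allele", "alt_allele",
                    "genotype", "type", "sample_id"}


def load_sample_variants(handle):
    """Read a Sample Variants CSV stream and group germline variants by sample_id."""
    reader = csv.DictReader(handle)
    if set(reader.fieldnames or []) != GERMLINE_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    by_sample = defaultdict(list)
    for row in reader:
        if row["type"] != "germline":  # assumption: literal value "germline"
            raise ValueError(f"expected type 'germline', got {row['type']!r}")
        if not row["genotype"]:
            raise ValueError(f"missing genotype for {row}")
        key = (row["chromosome"], int(row["position"]),
               row["ref_allele"], row["alt_allele"])
        by_sample[row["sample_id"]].append((key, row["genotype"]))
    return dict(by_sample)
```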

 


 
