Step 1: Preflight checks (QC)

The Tapestri DNA Pipeline starts by running a quality check (QC) module that validates the input data. This module ensures that the FASTQ files are valid and contain no issues that might cause the DNA Pipeline to fail. It also detects the chemistry and sets it as the run chemistry for the downstream analysis. If all files pass QC, the DNA Pipeline moves on to the next steps.

For demultiplexing runs, the QC module also checks the validity of the sample variants file and the demultiplexing protein panel to prevent failures at a later stage.

FASTQ files generated from the Illumina sequencer are checked for quality using the fastp module. The file format, as well as sequence quality, is verified before running the pipeline.

Failure of QC

Under certain conditions, QC may fail; in each case, the run status displays a corresponding message. The following table summarizes each issue along with the run status and the reason for the error.

Issue

Run Status

Reason

QC failed due to FASTQ file corruption

QC Failed

  • FASTQ file is not a valid .gz archive.
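
As a rough illustration of this kind of check, the sketch below tests whether a file decompresses cleanly as gzip. It is not the pipeline's own implementation; the function name `is_valid_gzip` and the chunk size are assumptions made for this example.

```python
import gzip
import zlib


def is_valid_gzip(path, chunk_size=1 << 20):
    """Return True if the whole file decompresses as gzip without error.

    Hypothetical helper for illustration; not part of the Tapestri pipeline.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so that truncation and CRC errors surface.
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile is a subclass of OSError; EOFError signals truncation.
        return False
```

Running a check like this on each FASTQ file before upload can catch a truncated or incomplete transfer early.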

A sample is oversequenced

Oversequenced Sample

  • v2 and v3 check - The expected coverage for the FASTQ file containing the reads is more than 320x, where
    expected_coverage = (read_count * tube_count * lane_count) / (expected_num_of_cells * amplicon_count), with expected_num_of_cells = 20000
  • v3.4 check - The expected coverage for the FASTQ file containing the read pairs is more than 160x, where
    expected_coverage = (readpair_count * lane_count) / (expected_num_of_cells * amplicon_count), with expected_num_of_cells = 30000
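
The two coverage formulas can be worked through with a short sketch. The function names are hypothetical, and the example assumes the counts have already been obtained; how the pipeline derives read_count, tube_count, and lane_count is not described here.

```python
def expected_coverage_v2_v3(read_count, tube_count, lane_count,
                            amplicon_count, expected_num_of_cells=20000):
    """Expected coverage for v2/v3 runs, following the formula in the table."""
    return (read_count * tube_count * lane_count) / (
        expected_num_of_cells * amplicon_count
    )


def expected_coverage_v3_4(readpair_count, lane_count,
                           amplicon_count, expected_num_of_cells=30000):
    """Expected coverage for v3.4 runs, following the formula in the table."""
    return (readpair_count * lane_count) / (
        expected_num_of_cells * amplicon_count
    )


def is_oversequenced(coverage, threshold):
    """A sample is flagged only when coverage is strictly above the threshold."""
    return coverage > threshold


# Made-up inputs: 500 million reads, 1 tube, 1 lane, 100 amplicons.
coverage = expected_coverage_v2_v3(500_000_000, 1, 1, 100)
print(coverage, is_oversequenced(coverage, threshold=320))  # 250.0 False
```

With the same made-up panel of 100 amplicons, 800 million reads would give 400x and exceed the 320x limit for v2/v3.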

QC failed due to high percentage of Ns at a position

QC Failed (v2)

QC Warning (v3)

  • There are more than 20% Ns at a particular read position across all reads.
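
To make the 20% rule concrete, here is a minimal sketch that computes the fraction of Ns at each read position. The pipeline itself relies on fastp for quality checks; this standalone version is only an illustration, its function names are hypothetical, and it assumes the denominator at each position is the number of reads long enough to cover that position.

```python
from collections import Counter


def n_fraction_by_position(reads):
    """Map each 0-based read position to the fraction of reads with N there."""
    n_counts = Counter()
    totals = Counter()
    for seq in reads:
        for pos, base in enumerate(seq):
            totals[pos] += 1
            if base in "Nn":
                n_counts[pos] += 1
    return {pos: n_counts[pos] / totals[pos] for pos in totals}


def positions_over_threshold(reads, threshold=0.20):
    """Return the sorted positions whose N fraction is strictly above threshold."""
    fractions = n_fraction_by_position(reads)
    return sorted(pos for pos, frac in fractions.items() if frac > threshold)


reads = ["ACGT", "ANGT", "ANGT", "ACGT", "ACGN"]
print(positions_over_threshold(reads))  # [1]
```

In this toy input, position 1 has Ns in 2 of 5 reads (40%) and is reported, while position 3 sits at exactly 20% and is not.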

Barcode Version cannot be determined (Only v2/v3)

QC Failed

  • Fixed part counts for v1 and v2/v3 barcode versions are insufficient to reliably determine the version of the sample (v2 and v3 chemistry use the same barcode structure). 

Panel zip file is not correct

QC Failed

  • The panel file does not have the correct structure. See more details about the zip folder structure.

R1/R2 read mismatch (Only v3)

QC Failed

  • The R1 and R2 files contain different numbers of reads.
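
A quick way to check for this condition before submitting a run is to count the records in each file. The sketch below is an illustration only: the helper names are hypothetical, and it assumes standard four-line FASTQ records in gzip-compressed files.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4:
        raise ValueError(f"{path}: line count {line_count} is not a multiple of 4")
    return line_count // 4


def read_counts_match(r1_path, r2_path):
    """Return True when the R1 and R2 files hold the same number of reads."""
    return count_fastq_reads(r1_path) == count_fastq_reads(r2_path)
```

A mismatch usually points to an incomplete transfer or to files from different runs being paired by mistake.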

Amplicon Overlap (Only v3)

QC Failed

  • Not all chromosomes in the panel are available in the genome FASTA file.

Protein panel issues

QC Failed

  • Mandatory columns - Name, ID, Sequence - are missing
  • Sequence contains non-ATGC characters
  • Duplicate barcode sequence
  • Invalid column separator
  • Non-UTF-8 file encoding
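
The protein panel checks listed above can be approximated locally before uploading a panel. The following is a sketch under stated assumptions, not the pipeline's validator: the function names and messages are hypothetical, the file is assumed to be comma-separated, and only uppercase A, T, G, and C are accepted.

```python
import csv
import io

REQUIRED_COLUMNS = ("Name", "ID", "Sequence")


def check_protein_panel(csv_text):
    """Return a list of problems found in protein panel CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        # A wrong separator (e.g. ";") also lands here, because the whole
        # header is then parsed as a single column name.
        return ["missing columns: " + ", ".join(missing)]
    seen = set()
    for row in reader:
        seq = (row["Sequence"] or "").strip()
        if not seq or set(seq) - set("ATGC"):
            problems.append(f"non-ATGC sequence for {row['Name']}")
        elif seq in seen:
            problems.append(f"duplicate barcode sequence: {seq}")
        seen.add(seq)
    return problems


def check_protein_panel_bytes(data):
    """Decode raw file bytes as UTF-8 first, then run the content checks."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not UTF-8 encoded"]
    return check_protein_panel(text)
```

An empty list means none of these particular problems were detected; it does not guarantee that the pipeline's own QC will pass.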

Sample Variants File issues

QC Failed

  • Mandatory columns - chromosome, position, ref_allele, alt_allele, genotype, type, sample_id - are missing
  • Using an invalid separator (; or tab instead of ,)
  • ref_allele or alt_allele contains non-ATGC characters
  • Fewer than 5 differentiating variants between two samples
  • Fewer than 5 variants provided
  • One or more variants do not overlap with the panel
  • Non-UTF-8 file encoding
  • Fewer than two sample_id values
  • Invalid sample name. A valid sample name:
    • Has at least 3 characters
    • Is alphanumeric
    • Does not contain any special characters other than "-" or "_"
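
Several of the sample variants file rules can be pre-checked with a short script. The sketch below is illustrative only and makes assumptions that the table does not spell out: the sample name rule is read as "letters, digits, hyphen, and underscore, at least 3 characters", and the five-variant minimum is applied to the file as a whole. The checks for differentiating variants and panel overlap are not covered, and all function names are hypothetical.

```python
import csv
import io
import re

REQUIRED = ("chromosome", "position", "ref_allele", "alt_allele",
            "genotype", "type", "sample_id")
# Assumed reading of the naming rule: 3+ characters from A-Z, a-z, 0-9, "-", "_".
SAMPLE_NAME = re.compile(r"[A-Za-z0-9_-]{3,}")


def is_valid_sample_name(name):
    """Return True if the name matches the assumed sample naming rule."""
    return SAMPLE_NAME.fullmatch(name) is not None


def check_sample_variants(csv_text, min_variants=5, min_samples=2):
    """Return a list of problems found in sample variants CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        # Also triggered by a ";" or tab separator, which collapses the header.
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    rows = list(reader)
    if len(rows) < min_variants:
        problems.append(f"fewer than {min_variants} variants provided")
    sample_ids = sorted({row["sample_id"] for row in rows if row["sample_id"]})
    if len(sample_ids) < min_samples:
        problems.append(f"fewer than {min_samples} sample_id values")
    for sample_id in sample_ids:
        if not is_valid_sample_name(sample_id):
            problems.append(f"invalid sample name: {sample_id}")
    for line_no, row in enumerate(rows, start=2):
        for column in ("ref_allele", "alt_allele"):
            if set(row[column] or "") - set("ATGC"):
                problems.append(f"line {line_no}: non-ATGC characters in {column}")
    return problems
```

Line numbers in the messages count the header as line 1, so the first variant is reported as line 2.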

Demultiplexing protein panel issue

QC Failed

  • Mandatory columns - Name, ID, Sequence, Sample_ID - are missing
  • Using an invalid separator (; or tab instead of ,)
  • Header and lines contain different numbers of elements
  • Sequence contains non-ATGC characters
  • Duplicate barcode sequence
  • Non-UTF-8 file encoding
  • Fewer than two sample IDs are present
  • Invalid sample name. A valid sample name:
    • Has at least 3 characters
    • Is alphanumeric
    • Does not contain any special characters other than "-"

Use the reasons above to investigate the cause of the run failure, or contact support@missionbio.com for additional help.
