Pipeline output files overview

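The Pipeline creates the following output files. Which files are produced depends on the Pipeline type (DNA or DNA + Protein) and the Pipeline version (V2 or V3).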

DNA and DNA + Protein

  • all.barcode.distribution.tsv.zip (V3) / all.barcode.distribution.merged.tsv (V2) (DNA, DNA + Protein)
    This file reports the number of forward reads assigned to each amplicon for each barcode found in the run. The GE output file SAMPLE_NAME_all_barcode_distribution.tsv is similar to the V2 pipeline's all barcode distribution file.
  • allele.drop.out.report.txt (DNA, DNA + Protein)
    This text file lists the variants used to calculate the allele dropout (ADO) rate. It shows the summary for the ADO calculation, such as the sample name, median ADO for the sample, number of variants, and total number of cells.

    Refer to this article to learn more about ADO calculation.

  • barcode.cell.coverage.tsv (DNA, DNA + Protein) - Only V2
    This file lists each amplicon and its mean number of reads per cell. For each amplicon, it also indicates whether the amplicon passed the threshold. The threshold is 0.2 times the mean of the per-amplicon mean reads, i.e., the mean of column B * 0.2. If the mean reads value for an amplicon is above the threshold, the amplicon passes and has a value of TRUE. If it is below the threshold, the value is FALSE.

  • cell.barcode.distribution.tsv.zip (V3) / barcode.cell.distribution.merged.tsv (V2) (DNA, DNA + Protein)
    This file reports the number of forward reads assigned to each amplicon for each cell found in the run. For more information, see this file.

  • cells.bam (DNA, DNA + Protein)
    In this file, each read carries a read group (RG) tag, which can be used to count the reads for each barcode.

    A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV.

  • cells.bam.csi (V3) / cells.bam.bai (V2) (DNA, DNA + Protein)
    This is the .bam index file used by the IGV tool to view the alignments.

  • cells.loom
    This file is an input file for Tapestri Insights v2. LOOM is an efficient file format for very large omics datasets, consisting of a main matrix, optional layers, and a variable number of row and column annotations. For more information, see this file.

  • cells.vcf.gz (DNA, DNA + Protein)
    This compressed annotated .vcf file conforms to the standard GATK format. It contains all the variants for all the barcodes called as cells. 

  • dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only)
    This is a multi-omics file format that contains the data for all the analytes in a single file. For more details, read this article.

  • fastp.html (DNA, DNA + Protein)
    This .html report file is generated by the FASTP tool, which is used to check the input sequence quality. It reports various metrics and graphs, such as base quality (Q20 and Q30 base counts) and the base content of the reads.

  • fastp.json (DNA, DNA + Protein)
    This file provides the FASTQ sequence quality information generated by the FASTP tool in the analysis pipeline, such as Q20 and Q30 base count and base content of the reads.

  • flagstat.txt - Only V2
    This text file shows the mapping statistics from the reference genome alignment step, listing the number of QC-passed reads and QC-failed reads for items like read1, read2, properly paired, and mapped. For more information on the flagstat format, see this description. In the V3 Pipeline, these metrics are added to the metadata in the .h5 file.

  • DNA qc.json (DNA, DNA + Protein)
    This file provides the results of the QC step in the DNA Pipeline run, including errors, warnings, and passing results.

  • mapped.bam (DNA, DNA + Protein)
    This file is generated after mapping to the reference genome and selecting high-quality primary alignments with a mapping quality score greater than 30. It is generated after barcode correction and alignment but before cell calling, so it includes all the barcodes in the run.

    A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV.

  • metrics.json (DNA only)
    This file contains the .html run report content in the .json format. It is a machine-readable format and provides an easy way to do additional analysis on the run report metrics.

  • missionbio_gatk.err (DNA, DNA + Protein) - Only V2
    This file provides the detailed log information for the GATK step of the analysis pipeline.

  • missionbio_gatk.out (DNA, DNA + Protein) - Only V2
    This file provides the log information for the GATK step of the analysis pipeline.

  • missionbio_part1_tube1.err (DNA, DNA + Protein) - Only V2
    This file provides the detailed log information for part one of the analysis pipeline, such as barcode extraction and cell finding.

  • missionbio_part1_tube1.out (DNA, DNA + Protein) - Only V2
    This file provides the log information for part one of the analysis pipeline, such as barcode extraction and cell finding.

  • missionbio_part2.err (DNA, DNA + Protein) - Only V2
    This file provides the detailed log information for part two of the analysis pipeline, such as merging the genomic-VCFs (gVCFs) for all cells, ADO calculation, and .loom/.h5 file creation.

  • missionbio_part2.out (DNA, DNA + Protein) - Only V2
    This file provides the log information for part two of the analysis pipeline, such as the ADO calculation.

  • missionbio_qc_lane1_tube1.err (DNA, DNA + Protein)
    This file provides all the detailed information for the FASTQ quality analysis step of the pipeline run.

  • missionbio_qc_lane1_tube1.out (DNA, DNA + Protein)
    This file provides all the information for the FASTQ quality analysis step of the pipeline run.

  • part1.tube1.progress.csv (DNA, DNA + Protein) - Only V2
    This file shows the progress through part one of the run and lists all the steps that were successfully executed, such as barcode extraction.

  • part2.progress.csv (DNA, DNA + Protein) - Only V2
    This file shows the progress through part two of the run and lists all the steps that were successfully executed, such as .h5 file generation.

  • protein.log (DNA + Protein)
    This file provides all the information for the QC step in the Protein Pipeline run to help debug failures.

  • Protein qc.json (DNA + Protein)
    This file contains the results of the QC step in the Protein Pipeline run, including errors, warnings, and passing results.

  • report.html (DNA, DNA + Protein)
    This new .html report is generated for each run. It summarizes the run details in the form of various metrics and plots that can be used to understand the performance. For more details, see this section of the Pipeline User Guide.

  • report.json (DNA + Protein only)
    This file contains the .html run report content in the .json format. It is a machine-readable format and provides an easy way to do additional analysis on the run report contents.

  • tapestri_run_output.txt - Only V3
    This file provides the detailed log information for all the steps of the analysis pipeline. It should be used as a starting point to investigate any run failure.
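
The threshold rule described for barcode.cell.coverage.tsv (the mean of column B * 0.2) can be illustrated with a minimal Python sketch. This is not the Pipeline's own code: the function name, the amplicon names, and the read values below are hypothetical, and treating a value exactly equal to the threshold as FALSE is an assumption, since the description only covers the "above" and "below" cases.

```python
from statistics import mean


def amplicon_pass_flags(rows, fraction=0.2):
    """Return (threshold, {amplicon: passed}) for per-amplicon mean reads.

    rows: iterable of (amplicon_name, mean_reads_per_cell) pairs,
          i.e. the amplicon column and column B of the file.
    fraction: multiplier applied to the mean of column B (0.2 in the article).
    """
    rows = list(rows)
    # Threshold = 0.2 * mean of the per-amplicon mean reads (column B).
    threshold = fraction * mean(reads for _, reads in rows)
    # Assumption: only values strictly above the threshold count as TRUE.
    flags = {amplicon: reads > threshold for amplicon, reads in rows}
    return threshold, flags


# Hypothetical values, for illustration only.
example_rows = [("AMP1", 50.0), ("AMP2", 5.0), ("AMP3", 35.0)]
threshold, flags = amplicon_pass_flags(example_rows)
for amplicon, passed in flags.items():
    print(amplicon, "TRUE" if passed else "FALSE")
```

With these made-up values, the mean of column B is 30, so the threshold is about 6 reads per cell; only the amplicon with 5 mean reads falls below it.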

Merge Samples

The Merge Samples Pipeline generates a single merged .h5 file that contains different assays, depending on the analytes present in the input runs. 

  1. Merge 2 or more DNA-only runs – Merged .h5 file with only the DNA-related assays – dna_variants and dna_read_counts.
  2. Merge 2 or more DNA + Protein runs – Merged .h5 file with DNA and protein assays – dna_variants, dna_read_counts, and protein_read_counts.
  3. Merge DNA and DNA + Protein runs – Merged .h5 file with only the DNA-related assays – dna_variants and dna_read_counts.
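
The three cases above reduce to one rule: the merged .h5 file contains the protein assay only when every input run is a DNA + Protein run. The following Python sketch expresses that rule under stated assumptions. The function name and the run-type labels "dna" and "dna+protein" are hypothetical and are not part of the Pipeline; only the assay names come from the list above.

```python
def merged_assays(run_types):
    """Return the assays expected in the merged .h5 file.

    run_types: list of hypothetical labels, one per input run,
               each either "dna" or "dna+protein".
    """
    allowed = {"dna", "dna+protein"}
    if len(run_types) < 2:
        raise ValueError("Merge Samples needs 2 or more runs")
    unknown = set(run_types) - allowed
    if unknown:
        raise ValueError(f"unknown run type(s): {sorted(unknown)}")

    # The DNA assays are present in every merged file.
    assays = ["dna_variants", "dna_read_counts"]
    # The protein assay is kept only if all input runs are DNA + Protein.
    if all(run_type == "dna+protein" for run_type in run_types):
        assays.append("protein_read_counts")
    return assays


print(merged_assays(["dna", "dna"]))
print(merged_assays(["dna+protein", "dna+protein"]))
print(merged_assays(["dna", "dna+protein"]))
```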

 
