Pipeline output files overview

The Pipeline creates the following output files. The pipeline type that produces each file (DNA, DNA + Protein) is shown in parentheses after the file name.

DNA and DNA + Protein

  • all.barcode.distribution.merged.tsv (DNA, DNA + Protein)
    This file reports the number of forward reads assigned to each amplicon for each cell found in the run. For more information, see this file.

  • allele.drop.out.report.txt (DNA, DNA + Protein)
    This text file lists the variants used to calculate the allele dropout (ADO) rate. It shows the summary for the ADO calculation, such as the sample name, median ADO for the sample, number of variants, and total number of cells.

    Refer to this article to learn more about ADO calculation.

  • barcode.cell.coverage.tsv (DNA, DNA + Protein)
    This file lists each amplicon and its mean number of reads per cell. For each amplicon, it also lists whether the amplicon passed the threshold. The threshold is 0.2 times the mean of the per-amplicon mean reads, e.g., the mean of column B * 0.2. If the mean reads for an amplicon are above this number, the amplicon passed the threshold and has a value of TRUE. If they are below this number, the value is FALSE.
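The threshold rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Pipeline's own code: the function name, the amplicon names, and the read values are hypothetical, and the handling of a value exactly equal to the threshold (treated as FALSE here) is an assumption because the description does not cover that case.

```python
from statistics import mean


def amplicon_pass_flags(mean_reads_per_cell, fraction=0.2):
    """Return (threshold, {amplicon: passed}) for per-amplicon mean reads.

    threshold = fraction * mean of all per-amplicon mean reads (column B).
    An amplicon passes only if its mean reads are strictly above the
    threshold (assumption: equality counts as not passed).
    """
    threshold = fraction * mean(mean_reads_per_cell.values())
    flags = {
        amplicon: reads > threshold
        for amplicon, reads in mean_reads_per_cell.items()
    }
    return threshold, flags


# Hypothetical values standing in for column B of barcode.cell.coverage.tsv.
example = {"AMPL_A": 50.0, "AMPL_B": 4.0, "AMPL_C": 36.0}
threshold, flags = amplicon_pass_flags(example)
print(threshold, flags)
```

With these made-up values the mean of column B is 30, so the threshold is 6 and only AMPL_B falls below it.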

  • barcode.cell.distribution.merged.tsv (DNA, DNA + Protein)
    This file reports the number of forward reads assigned to each amplicon for each cell found in the run. For more information, see this file.

  • cells.bam (DNA, DNA + Protein)
    This file lists a read group (RG) tag for each read, which can be used to find the number of reads for the barcode. If your run is from Tapestri v1 using v1 chemistry, you will have multiple cells.bam files because v1 was index-specific.

    A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV.
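As a rough sketch of how the RG tag can be used to count reads per barcode, the function below tallies RG values from SAM-formatted text lines, such as those printed by `samtools view cells.bam`. It relies only on the SAM convention that optional tags start at the twelfth tab-separated column. The function name and the example reads and barcode strings are hypothetical.

```python
from collections import Counter


def count_reads_per_read_group(sam_lines):
    """Count alignment records per RG tag value in SAM-formatted text."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields follow.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                counts[tag[len("RG:Z:"):]] += 1
                break
    return counts


# Hypothetical records; real barcodes and read names will differ.
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "\t".join(["read1", "0", "chr1", "100", "60", "10M", "*", "0", "0",
               "ACGTACGTAC", "FFFFFFFFFF", "RG:Z:AACCGGTT-1"]),
    "\t".join(["read2", "0", "chr1", "120", "60", "10M", "*", "0", "0",
               "ACGTACGTAC", "FFFFFFFFFF", "NM:i:0", "RG:Z:AACCGGTT-1"]),
    "\t".join(["read3", "0", "chr2", "300", "60", "10M", "*", "0", "0",
               "ACGTACGTAC", "FFFFFFFFFF", "RG:Z:TTGGCCAA-1"]),
]
counts = count_reads_per_read_group(example_sam)
print(counts)
```

Working on text lines keeps the sketch free of third-party dependencies; a BAM-reading library could be substituted if one is available.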

  • cells.bam.bai (DNA, DNA + Protein)
    This is the .bam index file used by the IGV tool to view the alignments.

  • cells.loom
    This file is an input file for Tapestri Insights v2. LOOM is an efficient file format for very large omics datasets, consisting of a main matrix, optional layers, and a variable number of row and column annotations. For more information, see this file.

  • cells.vcf.gz (DNA, DNA + Protein)
    This compressed annotated .vcf file conforms to the standard GATK format. It contains all the variants for all the barcodes called as cells. 
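For a quick sanity check of the compressed file, a minimal sketch like the following counts the variant records using only the Python standard library. It assumes nothing beyond the general VCF convention that header lines start with `#`; the function name is hypothetical.

```python
import gzip


def count_vcf_records(path):
    """Count variant records (non-header, non-blank lines) in a .vcf.gz file."""
    n_records = 0
    # "rt" opens the gzip stream in text mode so lines are str, not bytes.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                n_records += 1
    return n_records


# Example call (the path is wherever the run output was downloaded):
# print(count_vcf_records("cells.vcf.gz"))
```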

  • dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only)
    This is a new multi-omics file format that contains the data for all the analytes in a single file. For more details, read this article.

  • fastp.html (DNA, DNA + Protein)
    This .html report file is generated by the FASTP tool, which is used to check the input sequence quality. It reports various metrics and graphs, such as base quality (Q20 and Q30 base counts) and the base content of the reads.

  • fastp.json (DNA, DNA + Protein)
    This file provides the FASTQ sequence quality information generated by the FASTP tool in the analysis pipeline, such as Q20 and Q30 base count and base content of the reads.

  • flagstat.txt
    This text file shows the mapping statistics from the reference genome alignment step, listing the number of QC-passed reads and QC-failed reads for items like read1, read2, properly paired, and mapped. If your run is from Tapestri v1 using v1 chemistry, you will have multiple flagstat.txt files because v1 was index-specific. For more information on the flagstat format, see this description.
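The QC-passed and QC-failed counts can be pulled out of the text with a small parser. The sketch below assumes the usual `passed + failed label` line layout of flagstat output; the function name and the example numbers are made up for illustration.

```python
import re

# "<QC-passed> + <QC-failed> <label...>"
_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")
# Trailing "(QC-passed reads + QC-failed reads)" or "(x% : y%)" annotations.
_TRAILER = re.compile(r"\s+\((?:QC-passed[^()]*|[^()]*:[^()]*)\)$")


def parse_flagstat(text):
    """Return {label: (qc_passed, qc_failed)} from flagstat-style text."""
    stats = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if not match:
            continue
        label = _TRAILER.sub("", match.group(3))
        stats[label] = (int(match.group(1)), int(match.group(2)))
    return stats


# Made-up numbers in the flagstat layout.
example_text = """\
2000 + 10 in total (QC-passed reads + QC-failed reads)
1900 + 5 mapped (95.00% : 50.00%)
1000 + 5 read1
1000 + 5 read2
1800 + 0 properly paired (90.00% : 0.00%)
12 + 0 with mate mapped to a different chr (mapQ>=5)
"""
stats = parse_flagstat(example_text)
print(stats["mapped"], stats["properly paired"])
```

Only the percentage and QC annotations are stripped from labels, so a qualifier such as `(mapQ>=5)` stays part of its label and does not collide with the unqualified line.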

  • lane1-qc.json (DNA, DNA + Protein)
    This file provides the results of the QC step in the DNA Pipeline run, including errors, warnings, and passing results.

  • mapped.bam (DNA, DNA + Protein)
    This file is generated after mapping to the reference genome and selecting high-quality primary alignments with a mapping quality score greater than 30. It is generated after barcode correction and alignment but before cell calling, so it contains the reads for all the barcodes in the run. If your run is from Tapestri v1 using v1 chemistry, you will have multiple mapped.bam files because v1 was index-specific.

    A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV.

  • metrics.json (DNA only)
    This file contains the .html run report content in .json format. Because it is machine-readable, it provides an easy way to do additional analysis on the run report metrics.

  • missionbio_gatk.err (DNA, DNA + Protein)
    This file provides the detailed log information for the GATK step of the analysis pipeline.

  • missionbio_gatk.out (DNA, DNA + Protein)
    This file provides the log information for the GATK step of the analysis pipeline.

  • missionbio_part1_tube1.err (DNA, DNA + Protein)
    This file provides the detailed log information for part one of the analysis pipeline, such as barcode extraction and cell finding.

  • missionbio_part1_tube1.out (DNA, DNA + Protein)
    This file provides the log information for part one of the analysis pipeline, such as barcode extraction and cell finding.

  • missionbio_part2.err (DNA, DNA + Protein)
    This file provides the detailed log information for part two of the analysis pipeline, such as merging the genomic-VCFs (gVCFs) for all cells, ADO calculation, and .loom/.h5 file creation.

  • missionbio_part2.out (DNA, DNA + Protein)
    This file provides the log information for part two of the analysis pipeline, such as the ADO calculation.

  • missionbio_qc_lane1_tube1.err (DNA, DNA + Protein)
    This file provides all the detailed information for the FASTQ quality analysis step of the pipeline run.

  • missionbio_qc_lane1_tube1.out (DNA, DNA + Protein)
    This file provides all the information for the FASTQ quality analysis step of the pipeline run.

  • part1.tube1.progress.csv (DNA, DNA + Protein)
    This file shows the progress through part one of the run and lists all the steps that were successfully executed, such as barcode extraction.

  • part2.progress.csv (DNA, DNA + Protein)
    This file shows the progress through part two of the run and lists all the steps that were successfully executed, such as .h5 file generation.

  • protein.log (DNA + Protein)
    This file provides all the information for the QC step in the Protein Pipeline run to help debug failures.

  • qc.json (DNA + Protein)
    This file contains the results of the QC step in the Protein Pipeline run, including errors, warnings, and passing results.

  • report.html (DNA, DNA + Protein)
    This new .html report is generated for each run. It summarizes the run details and all the information that was part of the other output files in Tapestri Pipeline v1. For more details, see this section of the Pipeline User Guide.

  • report.json (DNA + Protein only)
    This file contains the .html run report content in .json format. Because it is machine-readable, it provides an easy way to do additional analysis on the run report contents.
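One way to start additional analysis on metrics.json or report.json is to flatten the nested JSON into a single table of dotted keys. The sketch below is generic: it does not assume any particular key names, and the keys in the example document are invented rather than taken from a real run report.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {"a.b.0": value} form."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing "." added by the parent level.
        flat[prefix[:-1] if prefix else ""] = obj
    return flat


# Invented keys for illustration only; a real file would be loaded with
# json.load(open("metrics.json")) instead of json.loads on a string.
example_doc = json.loads('{"run": {"cells": 1200, "panel": "demo"}, "lanes": [{"reads": 5}]}')
flat_metrics = flatten(example_doc)
print(flat_metrics)
```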

Merge Samples

The Merge Samples Pipeline generates a single merged .h5 file that contains different assays, depending on the analytes present in the input runs. 

  1. Merge 2 or more DNA-only runs: merged .h5 file with only the DNA-related assays (dna_variants and dna_read_counts).
  2. Merge 2 or more DNA + Protein runs: merged .h5 file with DNA and protein assays (dna_variants, dna_read_counts, and protein_read_counts).
  3. Merge DNA and DNA + Protein runs: merged .h5 file with only the DNA-related assays (dna_variants and dna_read_counts).
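The three cases above reduce to a simple rule: protein assays are kept only when every input run is DNA + Protein. A minimal sketch of that rule follows; the function name and the run-type strings `"dna"` and `"dna+protein"` are hypothetical labels, not Pipeline parameters.

```python
DNA_ASSAYS = ["dna_variants", "dna_read_counts"]
PROTEIN_ASSAYS = ["protein_read_counts"]


def merged_assays(run_types):
    """Return the assays expected in the merged .h5 file.

    run_types: list of "dna" or "dna+protein", one entry per input run.
    """
    if len(run_types) < 2:
        raise ValueError("merging needs at least 2 runs")
    unknown = set(run_types) - {"dna", "dna+protein"}
    if unknown:
        raise ValueError(f"unknown run type(s): {sorted(unknown)}")
    # Protein assays survive only if all inputs carry protein data.
    if all(run_type == "dna+protein" for run_type in run_types):
        return DNA_ASSAYS + PROTEIN_ASSAYS
    return list(DNA_ASSAYS)


print(merged_assays(["dna", "dna"]))
print(merged_assays(["dna+protein", "dna+protein"]))
print(merged_assays(["dna", "dna+protein"]))
```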

 
