The output files that are created depend on the pipeline that was run.
DNA and DNA + Protein pipelines
-
all.barcode.distribution.tsv.zip/all.barcode.distribution.merged.tsv (v2) (DNA, DNA + Protein)
This file reports the number of forward reads assigned to each amplicon for each barcode found in the run. The GE output file SAMPLE_NAME_all_barcode_distribution.tsv is similar to the v2 pipeline's all-barcode distribution file.
-
allele.drop.out.report.txt (DNA, DNA + Protein)
This text file lists the variants used to calculate the allele dropout (ADO) rate. It shows the summary for the ADO calculation, such as the sample name, median ADO for the sample, number of variants, and total number of cells.
Refer to this article to learn more about ADO calculation.
-
barcode.cell.coverage.tsv (DNA, DNA + Protein) - Only v2 pipeline
This file lists each amplicon and its mean number of reads per cell. For each amplicon, it also lists whether the amplicon passed the completeness cell finder threshold. The threshold is 0.2 times the mean of the per-amplicon mean reads (mean of column B * 0.2). If the mean reads for an amplicon is above this number, the amplicon passed the threshold and has a value of TRUE. If it is below this number, the value is FALSE.
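The threshold rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `completeness_flags`, the amplicon names, and the read values are made up, and the treatment of an exact tie is a guess because the text only describes "above" and "below".

```python
from statistics import mean


def completeness_flags(mean_reads_by_amplicon, factor=0.2):
    """Return (threshold, {amplicon: passed}) from per-amplicon mean reads per cell."""
    # Threshold = factor x the mean of the per-amplicon means (column B).
    threshold = factor * mean(mean_reads_by_amplicon.values())
    # Assumption: strictly above the threshold passes; the text does not
    # say how an exact tie is treated.
    flags = {amp: reads > threshold for amp, reads in mean_reads_by_amplicon.items()}
    return threshold, flags


# Made-up amplicon names and values, for illustration only.
example = {"AMPL_1": 50.0, "AMPL_2": 30.0, "AMPL_3": 4.0, "AMPL_4": 16.0}
threshold, flags = completeness_flags(example)
print(threshold, flags)
```

Here the mean of column B is 25, so the threshold is 5 and only AMPL_3 falls below it.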
-
cell.barcode.distribution.tsv.zip/barcode.cell.distribution.merged.tsv (v2) (DNA, DNA + Protein)
This file reports the number of forward reads assigned to each amplicon for each cell found in the run. For more information, see this article.
-
cells.bam (DNA, DNA + Protein)
A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV. This file lists a read group (RG) tag for each read, which can be used to find the number of reads for the barcode.
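As a rough sketch of how the RG tag can be used to count reads per barcode, the Python below tallies RG values from SAM text lines. It assumes the alignments were first exported as text (for example with `samtools view -h cells.bam`) and that the RG value identifies the barcode; the function name, read names, and barcode strings are hypothetical.

```python
from collections import Counter


def reads_per_read_group(sam_lines):
    """Count alignment records per RG:Z: tag value in SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Optional tags start after the 11 mandatory SAM columns.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                counts[tag[len("RG:Z:"):]] += 1
                break
    return counts


# Made-up records; the barcode strings are illustrative only.
sam_lines = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF\tRG:Z:AACAACCTA-1",
    "r2\t0\tchr1\t120\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF\tRG:Z:AACAACCTA-1",
    "r3\t0\tchr2\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF\tRG:Z:TTGGTTGGA-1",
]
counts = reads_per_read_group(sam_lines)
print(counts)
```

For large files, a BAM library that reads the binary format directly would be a better fit than text parsing.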
-
cells.bam.csi/cells.bam.bai (v2) (DNA, DNA + Protein)
This is the .bam index file used by the IGV tool to view the alignments.
-
cells.loom
This file is an input file for Tapestri Insights v2. LOOM is an efficient file format for very large omics datasets, consisting of a main matrix, optional layers, and a variable number of row and column annotations. For more information, see this file.
-
cells.vcf.gz (DNA, DNA + Protein)
This compressed annotated .vcf file conforms to the standard GATK format. It contains all the variants for all the barcodes called as cells.
-
dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only)
This is a multi-omics file format that contains the data for all the analytes in a single file. For more details, read this article. These files use the HDF5 format.
-
fastp.html (DNA, DNA + Protein)
This .html report file is generated by the FASTP tool, which is used to check the input sequence quality. It reports various metrics and graphs, such as base quality (Q20 and Q30 base counts) and the base content of the reads. This report can often be used to identify levels of primer dimers in a run.
-
fastp.json (DNA, DNA + Protein)
This file provides the FASTQ sequence quality information generated by the FASTP tool in the analysis pipeline, such as Q20 and Q30 base count and base content of the reads.
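A minimal sketch of pulling the Q20 and Q30 figures out of this file with Python is shown below. The key names (`summary`, `before_filtering`, `after_filtering`, `total_reads`, `total_bases`, `q20_bases`, `q30_bases`) follow the layout commonly seen in fastp reports and should be treated as an assumption to check against your own file; the numbers are made up.

```python
import json


def fastp_quality_summary(report):
    """Return Q20/Q30 base fractions for each filtering stage in a fastp report."""
    summary = {}
    for stage in ("before_filtering", "after_filtering"):
        stats = report.get("summary", {}).get(stage)
        if not stats:
            continue  # stage missing from this report
        total = stats["total_bases"]
        summary[stage] = {
            "total_reads": stats["total_reads"],
            "q20_fraction": stats["q20_bases"] / total if total else 0.0,
            "q30_fraction": stats["q30_bases"] / total if total else 0.0,
        }
    return summary


# Inline stand-in for the contents of fastp.json (values are made up).
raw = """{"summary": {"before_filtering":
  {"total_reads": 10, "total_bases": 1000, "q20_bases": 950, "q30_bases": 900}}}"""
quality = fastp_quality_summary(json.loads(raw))
print(quality)
```

With a real run, `json.load(open("fastp.json"))` would replace the inline string.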
-
flagstat.txt - only v2
This text file shows the mapping statistics from the reference genome alignment step, listing the number of QC-passed reads and QC-failed reads for items like read1, read2, properly paired, and mapped. For more information on the flagstat format, see this description. In the v3 Pipeline, these metrics are added to the metadata in the .h5 file.
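The QC-passed and QC-failed counts can be read programmatically. The sketch below assumes the usual flagstat text layout of `<passed> + <failed> <description>` per line; the function name and the sample numbers are illustrative, not taken from a real run.

```python
import re

# One flagstat line: "<QC-passed> + <QC-failed> <description>"
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Map each flagstat category to a (qc_passed, qc_failed) tuple."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore blank or unexpected lines
        passed, failed, description = match.groups()
        # Drop a trailing parenthetical such as "(98.50% : N/A)".
        label = description.split(" (")[0]
        if label in stats:
            label = description  # keep the full text if the short label collides
        stats[label] = (int(passed), int(failed))
    return stats


# Made-up flagstat content for illustration.
sample = """2000 + 10 in total (QC-passed reads + QC-failed reads)
1970 + 5 mapped (98.50% : 50.00%)
1000 + 5 read1
1000 + 5 read2
1900 + 0 properly paired (95.00% : 0.00%)"""
stats = parse_flagstat(sample)
print(stats["mapped"])
```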
-
DNA qc.json (DNA, DNA + Protein) - v2 and v3
This file provides the results of the QC step in the DNA Pipeline run, including errors, warnings, and passing results.
-
mapped.bam (DNA, DNA + Protein)
This file is generated after mapping to the reference genome and selecting high-quality primary alignments with a mapping quality score greater than 30. It is generated after barcode correction and alignment but before cell calling. It includes all the barcodes in the run and contains both on-target and off-target reads.
A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV.
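To make the selection rule for mapped.bam concrete, the sketch below applies a "primary alignment with mapping quality greater than 30" test to SAM text lines. It is not the pipeline's code: the exact flags the pipeline checks are not described here, so excluding unmapped, secondary, and supplementary records is an assumption based on the standard SAM flag bits.

```python
# Standard SAM flag bits.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_high_quality_primary(sam_line, min_mapq=30):
    """True for a mapped primary alignment whose MAPQ is strictly above min_mapq."""
    fields = sam_line.split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    return mapq > min_mapq


# Made-up records: r1 passes, r2 has MAPQ 30 (not above 30), r3 is secondary.
records = [
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "r2\t0\tchr1\t200\t30\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "r3\t256\tchr1\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
]
kept = [line.split("\t")[0] for line in records if is_high_quality_primary(line)]
print(kept)
```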
-
metrics.json (DNA and DNA+Protein v3.4)
This file contains the .html run report content in .json format. Because it is machine-readable, it provides an easy way to do additional analysis on the run report metrics.
-
missionbio_gatk.err (DNA, DNA + Protein) - Only v2
This file provides the detailed log information for the GATK step of the analysis pipeline.
-
missionbio_gatk.out (DNA, DNA + Protein) - Only v2
This file provides the log information for the GATK step of the analysis pipeline.
-
missionbio_part1_tube1.err (DNA, DNA + Protein) - Only v2
This file provides the detailed log information for part one of the analysis pipeline, such as barcode extraction and cell finding.
-
missionbio_part1_tube1.out (DNA, DNA + Protein) - Only v2
This file provides the log information for part one of the analysis pipeline, such as barcode extraction and cell finding.
-
missionbio_part2.err (DNA, DNA + Protein) - Only v2
This file provides the detailed log information for part two of the analysis pipeline, such as merging the genomic-VCFs (gVCFs) for all cells, ADO calculation, and .loom/.h5 file creation.
-
missionbio_part2.out (DNA, DNA + Protein) - Only v2
This file provides the log information for part two of the analysis pipeline, such as the ADO calculation.
-
missionbio_qc_lane1_tube1.err (DNA, DNA + Protein) - v2 and v3
This file provides all the detailed information for the FASTQ quality analysis step of the pipeline run.
-
missionbio_qc_lane1_tube1.out (DNA, DNA + Protein) - v2 and v3
This file provides all the information for the FASTQ quality analysis step of the pipeline run.
-
part1.tube1.progress.csv (DNA, DNA + Protein) - Only v2
This file shows the progress through part one of the run and lists all the steps that were successfully executed, such as barcode extraction.
-
part2.progress.csv (DNA, DNA + Protein) - Only v2
This file shows the progress through part two of the run and lists all the steps that were successfully executed, such as .h5 file generation.
-
protein.log (DNA + Protein) - v2 and v3
This file provides all the information for the QC step in the Protein Pipeline run to help debug failures.
-
Protein qc.json (DNA + Protein) - v2 and v3
This file contains the results of the QC step in the Protein Pipeline run, including errors, warnings, and passing results.
-
report.html (DNA, DNA + Protein)
This new .html report is generated for each run. It summarizes the run details in the form of various metrics and plots that can be used to understand the performance. For more details, see this section of the Pipeline User Guide.
-
report.json (DNA + Protein v2 and v3)
This file contains the .html run report content in .json format. Because it is machine-readable, it provides an easy way to do additional analysis on the run report contents.
-
tapestri_run_output.txt - v3 and v3.4
This file provides the detailed log information for all the steps of the analysis pipeline. It should be used as a starting point to investigate any run failure.
-
qc.json - Only v3.4
This file contains the results of the QC step from both the DNA and Protein Pipeline runs, including errors, warnings, and passing results.
Demultiplexed DNA and DNA+Protein
This pipeline produces the same files as the regular DNA and DNA+Protein pipelines. In addition, it generates the following files for each demultiplexed sample:
-
<sample>.h5
This is a multi-omics file format that contains the data for a single sample. It contains the same analytes as the parent run: DNA demultiplexed samples contain only DNA data, and DNA+Protein samples contain both DNA and protein data.
-
<sample>.report.html
HTML report for an individual sample, in which some metrics and plots are recalculated for that sample.
-
<sample>metrics.json
JSON file containing the metrics from the HTML report.
-
<sample>.bam
Filtered cells.bam file with the reads for the cells from the sample.
-
<sample>.bam.csi
Index file for the bam file.
Merge Bulk Runs
The Merge Bulk Runs Pipeline generates 2 outputs:
-
germline_truth.csv
This is a CSV file containing a list of the germline variants that differentiate the samples, along with the genotype information. This file is the Sample Variants input to the Genotype Demultiplexing runs.
-
candidate_variants.html
This is an HTML report for the pipeline and displays the information on the germline variants from each sample. It can be used to identify if the samples are genotypically different and if they should be multiplexed together in a single Tapestri run.
Merge Samples
The Merge Samples Pipeline generates a single merged .h5 file that contains different assays, depending on the analytes present in the input runs.
- Merge 2 or more DNA-only runs – Merged .h5 file with only the DNA-related assays – dna_variants and dna_read_counts.
- Merge 2 or more DNA + Protein runs – Merged .h5 file with DNA and protein assays – dna_variants, dna_read_counts, and protein_read_counts.
- Merge DNA and DNA + Protein runs – Merged .h5 file with only the DNA-related assays – dna_variants and dna_read_counts.
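The three cases above reduce to one rule: the protein assay is kept only when every input run is DNA + Protein. The Python below is a sketch of that rule for planning purposes; the function name and the run-type labels are hypothetical, and it does not read or write .h5 files.

```python
DNA_ASSAYS = ["dna_variants", "dna_read_counts"]
PROTEIN_ASSAYS = ["protein_read_counts"]
KNOWN_RUN_TYPES = {"DNA", "DNA + Protein"}


def merged_assays(run_types):
    """Return the assays expected in the merged .h5 file for the given input runs."""
    if len(run_types) < 2:
        raise ValueError("Merging needs at least two runs")
    unknown = set(run_types) - KNOWN_RUN_TYPES
    if unknown:
        raise ValueError(f"Unknown run type(s): {sorted(unknown)}")
    # Protein data survives only if every input run has it.
    if all(run_type == "DNA + Protein" for run_type in run_types):
        return DNA_ASSAYS + PROTEIN_ASSAYS
    return list(DNA_ASSAYS)


dna_only = merged_assays(["DNA", "DNA"])
both = merged_assays(["DNA + Protein", "DNA + Protein"])
mixed = merged_assays(["DNA", "DNA + Protein"])
print(dna_only, both, mixed)
```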