The following output files are created based on the Genome Integrity (GI) Pipeline.
DNA/DNA+Protein Pipeline Outputs
-
<prefix>.report.html
This .html report is generated for each run. It summarizes the run details in the form of various metrics and plots that can be used to understand the performance. For more details, see this section of the Pipeline User Guide.
-
<prefix>.metrics.json
This file contains the content of the .html run report in .json format. Because it is machine-readable, it provides an easy way to do additional analysis on the run report metrics.
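As a minimal sketch of such additional analysis, the Python snippet below loads the file and flattens any nested objects into dotted key paths so that the available metrics can be listed. It assumes nothing about the metric names themselves; `load_metrics` and `flatten` are illustrative helper names, not part of the pipeline.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a {dotted.path: leaf_value} mapping."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated path.
        flat[prefix.rstrip(".")] = obj
    return flat


def load_metrics(path):
    """Read a metrics JSON file and return its flattened contents."""
    with open(path) as handle:
        return flatten(json.load(handle))
```

Calling `load_metrics("<prefix>.metrics.json")` and printing the sorted keys is a quick way to see which metrics a given run reports before building anything on top of them.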
-
<prefix>.mapped.bam
This file is generated after barcode correction and alignment to the reference genome, but before cell calling. It retains only high-quality primary alignments with a mapping quality score > 30. It contains reads for all the barcodes in the run, including both on-target and off-target reads.
A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV.
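To illustrate what "high-quality primary alignment" means at the record level, the sketch below applies a comparable test to a single SAM text line (for example, one line of `samtools view` output). It is not the pipeline's own filter: treating unmapped, secondary (0x100), and supplementary (0x800) records as non-primary is an assumption, and `is_high_quality_primary` is a hypothetical helper name.

```python
# Standard SAM FLAG bits (from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_high_quality_primary(sam_line, min_mapq=30):
    """Return True for a mapped primary alignment with MAPQ > min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    return mapq > min_mapq
```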
-
<prefix>.cells.bam
A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV. Each read in this file carries a read group (RG) tag, which can be used to count the number of reads for each barcode.
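The RG-based read count can be sketched in a few lines of Python. The function below works on SAM text lines, such as those printed by `samtools view`, and tallies reads per `RG:Z:` value. It assumes each RG value corresponds to one barcode, as described above; `count_reads_per_read_group` is an illustrative name.

```python
from collections import Counter


def count_reads_per_read_group(sam_lines):
    """Count alignment records per RG tag value in an iterable of SAM lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at column 12 (index 11).
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                counts[tag[len("RG:Z:"):]] += 1
                break
    return counts
```

One way to use it is to pipe `samtools view <prefix>.cells.bam` into a script that passes `sys.stdin` to this function and prints the most common read groups.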
-
<prefix>.cells.bam.csi
This is the .bam index file used by the IGV tool to view the alignments.
-
<prefix>.dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only)
This is a multi-omics file format that contains the data for all the analytes in a single file and should be used for the GI Reprocess Pipeline. For more details, read this article. These files use the HDF5 format.
-
<prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
This file reports the number of forward reads assigned to each amplicon for each barcode found in the run. GE output file SAMPLE_NAME_all_barcode_distribution.tsv is similar to the v2 pipeline's all barcode distribution file.
-
<prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
This file reports the number of forward reads assigned to each amplicon for each cell found in the run. For more information, see this article.
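Both barcode distribution files can be read without unzipping them first. The sketch below assumes a layout that should be checked against a real file: the archive holds one tab-separated file with a header row, the first column is the barcode, and each remaining column is an amplicon holding an integer read count. The function names are illustrative.

```python
import csv
import io
import zipfile


def read_barcode_distribution(zip_path):
    """Return {barcode: {amplicon: read_count}} from a zipped TSV.

    Assumed layout: header row, barcode in column 1, one column per amplicon.
    """
    table = {}
    with zipfile.ZipFile(zip_path) as archive:
        # Pick the first .tsv member in the archive.
        tsv_name = next(n for n in archive.namelist() if n.endswith(".tsv"))
        with archive.open(tsv_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text, delimiter="\t")
            amplicons = next(reader)[1:]
            for row in reader:
                if not row:
                    continue
                table[row[0]] = dict(zip(amplicons, map(int, row[1:])))
    return table


def total_reads_per_barcode(table):
    """Sum the per-amplicon counts for each barcode."""
    return {barcode: sum(counts.values()) for barcode, counts in table.items()}
```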
-
tapestri_run_output.txt
This file provides the detailed log information for all the steps of the analysis pipeline. It should be used as a starting point to investigate any run failure.
-
<prefix>.qc.json
This file provides the results of the QC step for both the DNA and Protein FASTQ files, including errors, warnings, and passing results.
-
<prefix>-<analyte>-fastp.html
This .html report file is generated by the FASTP tool, which is used to check the input sequence quality. It reports various metrics and graphs, such as base quality (Q20 and Q30 base counts) and the base content of the reads. This report can often be used to identify levels of primer dimers in a run.
-
<prefix>-<analyte>-fastp.json
This file provides the FASTQ sequence quality information generated by the FASTP tool in the analysis pipeline, such as Q20 and Q30 base count and base content of the reads.
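As a rough sketch, the Q20 and Q30 figures can be pulled out of this file with Python's standard library. The key names used here (`summary`, `before_filtering`, `after_filtering`, `total_reads`, `total_bases`, `q20_bases`, `q30_bases`) follow the layout FASTP normally writes, but they should be confirmed against an actual report. `fastp_quality_summary` is a hypothetical helper name.

```python
import json


def fastp_quality_summary(path):
    """Return Q20/Q30 base fractions before and after filtering."""
    with open(path) as handle:
        report = json.load(handle)

    result = {}
    for stage in ("before_filtering", "after_filtering"):
        stats = report.get("summary", {}).get(stage)
        if stats is None:
            continue  # stage missing from this report
        total = stats.get("total_bases", 0)
        result[stage] = {
            "total_reads": stats.get("total_reads"),
            # Fractions are None when no bases were reported.
            "q20_fraction": stats.get("q20_bases", 0) / total if total else None,
            "q30_fraction": stats.get("q30_bases", 0) / total if total else None,
        }
    return result
```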
-
<prefix>.cells.vcf.gz
This compressed annotated .vcf file conforms to the standard GATK format. It contains all the variants for all the barcodes called as cells.
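For a quick sanity check of this file, the sketch below counts the variant records and reads the sample column names from the `#CHROM` header line using only the standard library. That each sample column corresponds to one cell barcode is an assumption here, and `summarize_vcf` is an illustrative name.

```python
import gzip


def summarize_vcf(path):
    """Return (number_of_variant_records, sample_column_names) for a .vcf.gz."""
    samples = []
    n_variants = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Columns after the 9 fixed VCF columns are sample names.
                samples = line.rstrip("\n").split("\t")[9:]
                continue
            if line.strip():
                n_variants += 1
    return n_variants, samples
```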
-
<prefix>.allele.drop.out.report.txt (Single Sample Runs)
This text file lists the variants used to calculate the allele dropout (ADO) rate. It shows the summary for the ADO calculation, such as the sample name, median ADO for the sample, number of variants, and total number of cells. Refer to this article to learn more about ADO calculation.
GI Specific Outputs
-
tertiary/reports/{patient_name}.html
GI HTML report file with details on clonal architecture, CNV events, somatic mutations, protein differential expression and other details. For multiplexed runs, one file is produced for each sample.
-
tertiary/h5/{patient_name}.h5
This is the final per-sample h5 file and contains filtered variants, clone assignments, normalized CNV and protein data in addition to the contents of a regular h5 file as described in this article.
Demultiplexed DNA and DNA+Protein
One set of the following files is generated for each multiplexed sample, as well as one set for the spike-in CNV sample.
-
<sample>.h5
This is a multi-omics file format that contains the data for a single sample and should be used for GI Reprocess runs. It contains the same analytes as the parent run: demultiplexed DNA samples will contain only DNA data, and DNA+Protein samples will contain both.
-
<sample>.report.html
HTML report for individual samples where some metrics and plots are recalculated for each sample.
-
<sample>.metrics.json
JSON file with the HTML report file metrics.
-
<sample>.bam
Filtered cells.bam file with the reads for the cells from the sample.
-
<sample>.bam.csi
Index file for the bam file.