The Clonal Insights Software (CIS) creates the following output files.
Single Sample Pipeline
-
<prefix>.report.html
This .html report is generated for each run. It summarizes the run with metrics and plots that help assess run performance. For more details, see this section of the Pipeline User Guide.
-
<prefix>.metrics.json
This file contains the content of the .html run report in .json format. Because it is machine-readable, it provides an easy way to run additional analysis on the run report metrics.
-
<prefix>.mapped.bam
This file is generated after mapping to the reference genome and selecting high-quality primary alignments with a mapping quality score greater than 30. It is generated after barcode correction and alignment but before cell calling. It covers all the barcodes in the run and contains both on-target and off-target reads.
A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use a tool such as IGV.
-
<prefix>.cells.bam
This file is generated after cell calling and contains only mapped reads that have been assigned to cell barcodes. Each read carries a read group (RG) tag, which can be used to count the reads for each cell.
-
<prefix>.cells.bam.csi
This is the .bam index file used by the IGV tool to view the alignments.
-
<prefix>.dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only)
This is a multi-omics file format that contains the data for all the analytes in a single file and should be used for the CIS Reprocess Pipeline. For more details, read this article. These files use the HDF5 format.
-
<prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
This file reports the number of forward reads assigned to each amplicon for each barcode found in the run. GE output file SAMPLE_NAME_all_barcode_distribution.tsv is similar to the v2 pipeline's all barcode distribution file.
-
<prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
This file reports the number of forward reads assigned to each amplicon for each cell found in the run. For more information, see this article.
-
tapestri_run_output.txt
This file provides detailed log information for every step of the analysis pipeline. Use it as the starting point when investigating a run failure.
-
<prefix>.qc.json
This file provides the results of the QC step for both the DNA and Protein FASTQ files, including errors, warnings, and passing results.
-
<prefix>-<analyte>-fastp.html
This .html report file is generated by the FASTP tool, which is used to check the input sequence quality. It reports various metrics and graphs, such as base quality (Q20 and Q30 base counts) and the base content of the reads. This report can often be used to identify levels of primer dimers in a run.
-
<prefix>-<analyte>-fastp.json
This file provides the FASTQ sequence quality information generated by the FASTP tool in the analysis pipeline, such as Q20 and Q30 base count and base content of the reads.
-
<prefix>.cells.vcf.gz
This compressed annotated .vcf file conforms to the standard GATK format. It contains all the variants for all the barcodes called as cells.
-
<prefix>.allele.drop.out.report.txt (Standard Runs)
This text file lists the variants used to calculate the allele dropout (ADO) rate. It shows the summary for the ADO calculation, such as the sample name, median ADO for the sample, number of variants, and total number of cells.
Refer to this article to learn more about ADO calculation.
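Two of the files above lend themselves to quick checks outside the pipeline. The following is a minimal sketch, not part of the pipeline itself. The first function counts reads per read group (RG) tag from SAM-formatted text, for example the output of `samtools view <prefix>.cells.bam`. The second sums the per-amplicon counts for each row of a zipped barcode distribution file. The TSV layout used here (a header row, the barcode in the first column, one integer column per amplicon) is an assumption, and both function names are hypothetical.

```python
import csv
import io
import zipfile
from collections import Counter


def count_reads_per_read_group(sam_lines):
    """Count alignment records per RG tag value in SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at the 12th column and look like TAG:TYPE:VALUE.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                counts[tag[5:]] += 1
                break
    return counts


def total_reads_per_barcode(zip_path):
    """Sum the read counts in each row of a zipped barcode distribution TSV.

    Assumption (not confirmed by this article): the archive holds one TSV
    with a header row, the barcode in the first column, and one integer
    read-count column per amplicon.
    """
    totals = {}
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]  # assume a single TSV in the archive
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8")
            reader = csv.reader(text, delimiter="\t")
            next(reader, None)  # skip the header row
            for row in reader:
                if row:
                    totals[row[0]] = sum(int(value) for value in row[1:])
    return totals
```

One possible use is to pipe `samtools view <prefix>.cells.bam` into a script that calls `count_reads_per_read_group(sys.stdin)`. If the real TSV layout differs from the assumption above, adjust the column handling in `total_reads_per_barcode` accordingly.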
Demultiplexed DNA and DNA+Protein
The following files are generated for each multiplexed sample:
-
<sample>.h5
This is a multi-omics file format that contains the data for a single sample and should be used for CIS Time Course runs. It contains the same analytes as the parent run. DNA demultiplexed samples will contain only DNA data and DNA+Protein samples will contain both.
-
<sample>.report.html
HTML report for an individual sample, with some metrics and plots recalculated for that sample.
-
<sample>.metrics.json
JSON file containing the metrics from the HTML report.
-
<sample>.bam
Filtered cells.bam file containing the reads for only the cells assigned to each sample.
-
<sample>.bam.csi
Index file for the .bam file.
CIS-Specific Outputs
-
tertiary/reports/<prefix>.{sample_name}.html
CIS HTML report file with details on somatic variants, clonal architecture, protein differential expression, and other results. For multiplexed runs, one file is produced for each sample.
-
tertiary/h5/<prefix>.{sample_name}.h5
This is the final per-sample h5 file and contains filtered variants, clone assignments, and protein data in addition to the contents of a regular h5 file.
Time course analysis reports
-
<prefix>.html
Time course analysis HTML report file with details on somatic variants, clonal architecture, protein differential expression, and other results stratified by time point. It also shows changes in mutational profiles and protein expression across time points.
-
<prefix>.h5
Merged h5 file with all input samples together in a single file.
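To take a quick inventory of a finished run, the files in an output folder can be grouped by the file-name endings listed in this article. The sketch below is only an illustration: the function name `classify_outputs` and the `"other"` bucket are hypothetical, and the suffix list is copied from the names above rather than from any official specification.

```python
from pathlib import Path

# File-name endings taken from the outputs listed in this article.
# Sorted longest first so that ".cells.bam.csi" wins over ".bam.csi", etc.
KNOWN_SUFFIXES = sorted(
    [
        ".report.html", "metrics.json", ".mapped.bam", ".cells.bam",
        ".cells.bam.csi", ".bam", ".bam.csi", ".h5",
        ".barcode.distribution.tsv.zip", ".qc.json", "-fastp.html",
        "-fastp.json", ".cells.vcf.gz", ".allele.drop.out.report.txt",
        ".html", "tapestri_run_output.txt",
    ],
    key=len,
    reverse=True,
)


def classify_outputs(directory):
    """Group the files under `directory` by the known ending they match."""
    groups = {}
    # rglob also walks subfolders such as tertiary/reports and tertiary/h5.
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        label = next(
            (suffix for suffix in KNOWN_SUFFIXES if path.name.endswith(suffix)),
            "other",  # anything that matches none of the listed endings
        )
        groups.setdefault(label, []).append(path.name)
    return groups
```

Calling `classify_outputs("path/to/run_output")` returns a dictionary keyed by file-name ending, which makes it easy to spot a missing report or index file before starting downstream analysis.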