Multiple Myeloma Output Files

The Multiple Myeloma (MM) Pipeline creates the following output files.

MM DNA/DNA+Protein Pipeline Outputs

<prefix>.report.html
This .html report is generated for each run. It summarizes the run details in the form of various metrics and plots that can be used to understand the performance. For more details, see this section of the Pipeline User Guide.
<prefix>.metrics.json
This file contains the content of the .html run report in .json format. The format is machine-readable, which makes it easy to run additional analysis on the run report metrics.
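
As an illustration of that kind of additional analysis, the sketch below loads a metrics file and flattens it into dotted key paths so the available metrics can be listed or compared across runs. The internal layout of the file is not described in this article, so the code makes no assumption about specific key names; `flatten` and `load_metrics` are hypothetical helper names.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {dotted_path: scalar} pairs."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}{index}."))
    else:
        # Scalar leaf: drop the trailing dot from the accumulated path.
        items[prefix.rstrip(".")] = obj
    return items


def load_metrics(path):
    """Read a metrics .json file and return its flattened contents."""
    with open(path, encoding="utf-8") as handle:
        return flatten(json.load(handle))
```

Calling `load_metrics("<prefix>.metrics.json")` with the real file name returns a flat dictionary that can be printed or written to a table.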
<prefix>.mapped.bam
This file is generated after mapping to the reference genome and selecting high-quality primary alignments with a mapping quality score greater than 30. It is generated after barcode correction and alignment but before cell calling. It covers all the barcodes in the run and contains both on-target and off-target reads.
A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV.

<prefix>.cells.bam
A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV. This file lists a read group (RG) tag for each read, which can be used to find the number of reads for the barcode.
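
The read-group lookup described above can be sketched as follows. This is a minimal example that assumes the alignments have first been converted to SAM text (for instance with `samtools view`), and that each read carries a standard `RG:Z:` optional field; `count_reads_per_read_group` is a hypothetical helper name.

```python
from collections import Counter


def count_reads_per_read_group(sam_lines):
    """Count alignment records per RG tag value in SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        # The first 11 SAM columns are mandatory; optional tags follow.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                counts[tag[len("RG:Z:"):]] += 1
                break
    return counts
```

The function accepts any iterable of lines, so it can be fed an open SAM file or text piped from `samtools view`.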

<prefix>.cells.bam.csi
This is the .bam index file used by the IGV tool to view the alignments.

<prefix>.dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only)
This is a multi-omics file format that contains the data for all the analytes in a single file and should be used for the GI Reprocess Pipeline. For more details, read this article. These files use the HDF5 format.

<prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
This file reports the number of forward reads assigned to each amplicon for each barcode found in the run.
<prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
This file reports the number of forward reads assigned to each amplicon for each cell found in a run. For more information, see this article.
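
A minimal sketch of reading one of these zipped tables with the Python standard library is shown below. The column layout is an assumption, not something this article specifies: the code expects a tab-separated table with a header row, the barcode in the first column, and one read-count column per amplicon. Check the actual file before relying on it. `reads_per_barcode` is a hypothetical helper name.

```python
import csv
import io
import zipfile


def reads_per_barcode(zip_source):
    """Sum the per-amplicon read counts for every barcode in a zipped TSV.

    Assumed layout (verify against your file): header row, barcode in the
    first column, integer read counts in all remaining columns.
    """
    totals = {}
    with zipfile.ZipFile(zip_source) as archive:
        member = archive.namelist()[0]  # assumes a single .tsv in the archive
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text, delimiter="\t")
            next(reader, None)  # skip the header row
            for row in reader:
                if not row:
                    continue
                totals[row[0]] = sum(int(value) for value in row[1:])
    return totals
```

`zip_source` can be a file path or an open binary file object.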

tapestri_run_output.txt
This file provides the detailed log information for all the steps of the analysis pipeline. It should be used as a starting point to investigate any run failure.

<prefix>.qc.json
This file provides the results of the QC step for both the DNA and Protein FASTQ files, including errors, warnings, and passing results.
<prefix>-<analyte>-fastp.html
This .html report file is generated by the FASTP tool, which is used to check the input sequence quality. It reports various metrics and graphs, such as base quality (Q20 and Q30 base counts) and the base content of the reads. This report can often be used to identify levels of primer dimers in a run.

<prefix>-<analyte>-fastp.json
This file provides the FASTQ sequence quality information generated by the FASTP tool in the analysis pipeline, such as Q20 and Q30 base count and base content of the reads.
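
As a sketch of how the Q20 and Q30 counts can be turned into fractions, the example below reads the `summary` section that fastp writes to its JSON report. The key names (`summary`, `before_filtering`, `after_filtering`, `total_bases`, `q20_bases`, `q30_bases`) follow the fastp report format as commonly documented; confirm them against your own file, since the pipeline's fastp version is not stated here. `load_report` and `quality_fractions` are hypothetical helper names.

```python
import json


def load_report(path):
    """Load a fastp .json report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def quality_fractions(report, stage="before_filtering"):
    """Return the fraction of bases at or above Q20 and Q30 for one stage."""
    summary = report["summary"][stage]
    total = summary["total_bases"]
    if total == 0:
        return {"q20": 0.0, "q30": 0.0}
    return {
        "q20": summary["q20_bases"] / total,
        "q30": summary["q30_bases"] / total,
    }
```

Passing `stage="after_filtering"` reports the same fractions for the reads that remain after fastp filtering.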

<prefix>.cells.vcf.gz
This compressed annotated .vcf file conforms to the standard GATK format. It contains all the variants for all the barcodes called as cells.
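
For a quick look at the file without a dedicated VCF library, the sketch below counts the variant records and lists the sample columns from the `#CHROM` header line, using only the standard VCF text layout. Whether each sample column corresponds to one cell barcode is an assumption to verify against your file; `summarize_vcf` is a hypothetical helper name.

```python
import gzip


def summarize_vcf(source):
    """Return (number of variant records, sample column names) for a .vcf.gz.

    `source` is a file path or an open binary file object.
    """
    samples = []
    record_count = 0
    with gzip.open(source, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information line
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed by the VCF format;
                # sample names follow.
                samples = line.rstrip("\n").split("\t")[9:]
                continue
            if line.strip():
                record_count += 1
    return record_count, samples
```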

<prefix>.allele.drop.out.report.txt (Single Sample Runs)
This text file lists the variants used to calculate the allele dropout (ADO) rate. It shows the summary for the ADO calculation, such as the sample name, median ADO for the sample, number of variants, and total number of cells. Refer to this article to learn more about ADO calculation.

tertiary/reports/{patient_name}.html
MM HTML report file with details on somatic variants, clonal architecture, CNV changes, VDJ clonotypes, protein differential expression and other details. For multiplexed runs, one file is produced for each sample.

tertiary/h5/{patient_name}.h5
This is the final per-sample h5 file and contains filtered variants, clone assignments, normalized CNV and protein data in addition to the contents of a regular h5 file as described in this article.

vdj/<prefix>_<bcr_type>-<gene>_summary.tsv
This file reports the number of reads assigned to each gene for each cell found in a run. It summarizes the VDJ reads per barcode from the report.tsv file.

vdj/<prefix>_<bcr_type>-<gene>_summary_filtered.tsv
This file reports the number of reads assigned to each gene for each cell found in a run after applying a cut-off of 10 VDJ reads. It summarizes the VDJ reads per barcode from the report_filtered.tsv file.

vdj/<prefix>_report.tsv
This file provides the per-cell VDJ combination and the CDR3 information. It contains columns for read count, read frequency, CDR3 nucleotide and CDR3 amino acid sequence, the chain information for the V, D, J and C gene, and the bcr type.

vdj/<prefix>_report_filtered.tsv
This file contains the same information as the report.tsv file but only for the cells with more than 10 VDJ reads.
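
The read-count filter can be sketched as below for anyone who wants to apply a different threshold to report.tsv. Everything about the column layout is an assumption: the names `barcode` and `read_count` are hypothetical placeholders, and the code assumes the threshold applies to the total VDJ reads per cell. The pipeline's own filtering logic may differ.

```python
import csv
from collections import defaultdict


def read_tsv(path):
    """Load a tab-separated file with a header row into a list of dicts."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def filter_cells_by_vdj_reads(rows, threshold=10,
                              barcode_col="barcode", count_col="read_count"):
    """Keep rows whose cell has more than `threshold` VDJ reads in total.

    Column names are hypothetical; pass the real ones from your file.
    """
    rows = list(rows)
    totals = defaultdict(int)
    for row in rows:
        totals[row[barcode_col]] += int(row[count_col])
    return [row for row in rows if totals[row[barcode_col]] > threshold]
```

For example, `filter_cells_by_vdj_reads(read_tsv("vdj/<prefix>_report.tsv"), threshold=20)` would keep only cells with more than 20 VDJ reads, once the column names are adjusted to match the file.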

vdj/<prefix>_metrics.json
This file provides the read statistics for the full run as well as for the individual V, D and J genes. It summarizes all the output TSV files and can be used to review the sequencing depth, reads per gene, reads per clonotype, etc.

vdj/logs/progress.log
This log file lists all the steps executed as part of the VDJ pipeline.

One set of the following files is generated for each multiplexed sample, as well as one set for the spike-in CNV sample.

<sample>.h5
This is a multi-omics file format that contains the data for a single sample and should be used for GI Reprocess runs. It contains the same analytes as the parent run: samples demultiplexed from DNA runs contain only DNA data, and samples from DNA+Protein runs contain both DNA and Protein data.
<sample>.report.html
HTML report for individual samples where some metrics and plots are recalculated for each sample.
<sample>.metrics.json
JSON file with the HTML report file metrics.
<sample>.bam
Filtered cells.bam file with the reads for the cells from the sample.
<sample>.bam.csi
Index file for the bam file.

<prefix>.html
Time course analysis HTML report file with details on somatic variants, clonal architecture, protein differential expression, and other details stratified by time point. Additionally displays changes in mutational profiles and protein expression across time points.

<prefix>.h5
Merged h5 file with all input samples together in a single file.
