Multiple Myeloma Output files

The following output files are created by the Multiple Myeloma (MM) Pipeline.

MM DNA/DNA+Protein Pipeline Outputs

Secondary Pipeline Outputs

  • <prefix>.report.html
    This .html report is generated for each run. It summarizes the run details in the form of various metrics and plots that can be used to understand the performance. For more details, see this section of the Pipeline User Guide.

  • <prefix>.metrics.json
    This file contains the content of the .html run report in .json format. It is machine-readable and provides an easy way to run additional analysis on the run report metrics.

  • <prefix>.mapped.bam
    This file is generated after mapping to the reference genome and selecting high-quality primary alignments with a mapping quality score > 30. It is generated after barcode correction and alignment but before cell calling. It includes all the barcodes in the run and contains both on-target and off-target reads.
    A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV.
  • <prefix>.cells.bam
    A .bam file is a binary version of a sequence alignment map (SAM). For more information on these formats, refer to this PDF. To visualize these files, use tools like IGV. This file lists a read group (RG) tag for each read, which can be used to find the number of reads for each barcode.
  • <prefix>.cells.bam.csi
    This is the .bam index file used by the IGV tool to view the alignments.
  • <prefix>.dna.h5 (DNA only) / dna+protein.h5 (DNA + Protein only)
    This is a multi-omics file format that contains the data for all the analytes in a single file and should be used for the GI Reprocess Pipeline. For more details, read this article. These files use the HDF5 format. 
  • <prefix>.all.barcode.distribution.tsv.zip (DNA, DNA + Protein)
    This file reports the number of forward reads assigned to each amplicon for each barcode found in the run. The GE output file SAMPLE_NAME_all_barcode_distribution.tsv is similar to the v2 pipeline's all barcode distribution file.

  • <prefix>.cell.barcode.distribution.tsv.zip (DNA, DNA + Protein)
    This file reports the number of forward reads assigned to each amplicon for each cell found in the run. For more information, see this article.
  • tapestri_run_output.txt
    This file provides the detailed log information for all the steps of the analysis pipeline. It should be used as a starting point to investigate any run failure.
  • <prefix>.qc.json
    This file provides the results of the QC step for both the DNA and Protein FASTQ files, including errors, warnings, and passing results.
  • <prefix>-<analyte>-fastp.html
    This .html report file is generated by the FASTP tool, which is used to check the input sequence quality. It reports various metrics and graphs, such as base quality (Q20 and Q30 base counts) and base content of the reads. This report can often be used to identify levels of primer dimers in a run.
  • <prefix>-<analyte>-fastp.json
    This file provides the FASTQ sequence quality information generated by the FASTP tool in the analysis pipeline, such as Q20 and Q30 base count and base content of the reads.
  • <prefix>.cells.vcf.gz
    This compressed annotated .vcf file conforms to the standard GATK format. It contains all the variants for all the barcodes called as cells. 
  • <prefix>.allele.drop.out.report.txt (Single Sample Runs)
    This text file lists the variants used to calculate the allele dropout (ADO) rate. It shows the summary for the ADO calculation, such as the sample name, median ADO for the sample, number of variants, and total number of cells. Refer to this article to learn more about ADO calculation.
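
The zipped barcode distribution files listed above can be inspected with a few lines of Python. The following is a minimal sketch, not part of the pipeline. It assumes the archive holds a single .tsv member with one header row, the barcode in the first column, and one read-count column per amplicon. This article does not confirm that layout, and the function name reads_per_barcode is hypothetical.

```python
import csv
import io
import zipfile


def reads_per_barcode(zip_path):
    """Return {barcode: total forward reads} from a zipped distribution TSV.

    Assumed layout (not confirmed by this article): one header row, the
    barcode in the first column, one read-count column per amplicon.
    """
    totals = {}
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive contains exactly one .tsv member.
        tsv_name = next(n for n in archive.namelist() if n.endswith(".tsv"))
        with archive.open(tsv_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8")
            reader = csv.reader(text, delimiter="\t")
            next(reader, None)  # skip the header row of amplicon names
            for row in reader:
                if not row:
                    continue  # ignore blank lines
                totals[row[0]] = sum(int(value) for value in row[1:])
    return totals
```

If the real file uses a different orientation or extra metadata columns, adjust the column slicing accordingly.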

 

MM Specific Outputs

  • tertiary/reports/{patient_name}.html
    MM HTML report file covering somatic variants, clonal architecture, CNV changes, VDJ clonotypes, protein differential expression, and other details. For multiplexed runs, one file is produced for each sample.
  • tertiary/h5/{patient_name}.h5
    This is the final per-sample h5 file. It contains filtered variants, clone assignments, and normalized CNV and protein data, in addition to the contents of a regular h5 file as described in this article.
  • vdj/<prefix>_<bcr_type>-<gene>_summary.tsv 
    This file reports the number of reads assigned to each gene for each cell found in a run. It summarizes the VDJ reads per barcode from the report.tsv file.
  • vdj/<prefix>_<bcr_type>-<gene>_summary_filtered.tsv
    This file reports the number of reads assigned to each gene for each cell found in a run after applying a cut-off of 10 VDJ reads. It summarizes the VDJ reads per barcode from the report_filtered.tsv file.
  • vdj/<prefix>_report.tsv 
    This file provides the per-cell VDJ combination and the CDR3 information. It contains columns for read count, read frequency, CDR3 nucleotide and amino acid sequences, the chain information for the V, D, J, and C genes, and the BCR type.
  • vdj/<prefix>_report_filtered.tsv
    This file contains the same information as the report.tsv file but only for the cells with more than 10 VDJ reads.
  • vdj/<prefix>_metrics.json 
    This file provides the read statistics for the full run as well as for the individual V, D, and J genes. It summarizes all the output TSV files and can be used to review the sequencing depth, reads per gene, reads per clonotype, etc.
  • vdj/logs/progress.log 
    This is a log file which lists all the steps executed as part of the VDJ pipeline.
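
The relationship between report.tsv and report_filtered.tsv can be approximated as follows. This is a sketch under stated assumptions, not the pipeline's actual filter. The column names barcode and read_count are hypothetical placeholders, and summing read counts per barcode before applying the cut-off is an assumption about how the per-cell total is derived.

```python
import csv
from collections import defaultdict


def filter_vdj_rows(tsv_lines, min_reads=10,
                    barcode_col="barcode", count_col="read_count"):
    """Keep rows for cells whose summed VDJ read count exceeds min_reads.

    barcode_col and count_col are hypothetical column names; check the
    header of the real report.tsv before using this.
    """
    rows = list(csv.DictReader(tsv_lines, delimiter="\t"))
    totals = defaultdict(int)
    for row in rows:
        # Assumption: a cell's total is the sum over all of its rows.
        totals[row[barcode_col]] += int(row[count_col])
    return [row for row in rows if totals[row[barcode_col]] > min_reads]
```

Pass an open file handle, for example `filter_vdj_rows(open(path, newline=""))`, and compare the result with the pipeline's own filtered file to confirm the assumptions hold for your data.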

Demultiplexed DNA and DNA+Protein

One set of files is produced for each multiplexed sample, as well as one set for the spike-in CNV sample.

  • <sample>.h5
    This is a multi-omics file format that contains the data for a single sample and should be used for GI Reprocess runs. It contains the same analytes as the parent run: DNA demultiplexed samples will contain only DNA data, and DNA+Protein samples will contain both.

  • <sample>.report.html
    HTML report for individual samples where some metrics and plots are recalculated for each sample.

  • <sample>.metrics.json
    JSON file with the HTML report file metrics.

  • <sample>.bam
    Filtered cells.bam file with the reads for the cells from the sample.

  • <sample>.bam.csi
    Index file for the bam file.
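
To confirm that a demultiplexed sample has its full set of files, a small check like the one below may help. It is a sketch that assumes all per-sample files sit together in one flat directory, which this article does not state. The function name missing_sample_files is hypothetical, and the suffixes are taken from the list above.

```python
from pathlib import Path

# Suffixes of the per-sample files listed above.
SAMPLE_SUFFIXES = (".h5", ".report.html", ".metrics.json", ".bam", ".bam.csi")


def missing_sample_files(output_dir, sample):
    """Return the expected per-sample file names absent from output_dir.

    Assumption: every file for a sample lives directly in output_dir.
    """
    folder = Path(output_dir)
    expected = [f"{sample}{suffix}" for suffix in SAMPLE_SUFFIXES]
    return [name for name in expected if not (folder / name).is_file()]
```

An empty list means every expected file was found for that sample.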

 

Time course analysis reports

  • <prefix>.html
    Time course analysis HTML report file with details on somatic variants, clonal architecture, protein differential expression, and other details stratified by time point. It also displays changes in mutational profiles and protein expression across time points.
  • <prefix>.h5
    Merged h5 file with all input samples together in a single file.