The post-processing steps after variant calling include variant decomposition, filtering, and output file generation.
Multiallelic variants are decomposed into biallelic variants and then normalized so that each VCF entry is left-aligned and parsimonious (Tan, Abecasis, et al. 2015). Blacklisted loci (suspected false variants) are filtered out, and all loci with a QUAL score above the threshold of 1000 are tagged for downstream processing. The positions that pass our filtering criteria are called variants.
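The decomposition and filtering steps described above can be illustrated with a minimal Python sketch. This is not the Tapestri Pipeline implementation: the `Variant` class, the function names, the `qual_gt_1000` tag, and the blacklist format (a set of chromosome/position pairs) are all hypothetical. The sketch only trims shared bases to make each entry parsimonious; true left-alignment also requires the reference sequence and is omitted here.

```python
from dataclasses import dataclass, field

QUAL_THRESHOLD = 1000  # threshold quoted in the text above


@dataclass
class Variant:
    chrom: str
    pos: int    # 1-based position, as in VCF
    ref: str
    alt: str
    qual: float
    tags: set = field(default_factory=set)


def trim(pos, ref, alt):
    """Make one REF/ALT pair parsimonious by trimming shared bases."""
    # Drop shared trailing bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, shifting the position to match.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


def decompose(chrom, pos, ref, alts, qual):
    """Split a multiallelic record into one trimmed biallelic Variant per ALT."""
    return [Variant(chrom, *trim(pos, ref, alt), qual) for alt in alts]


def post_process(variants, blacklist):
    """Drop blacklisted loci, then tag the rest that exceed the QUAL threshold."""
    kept = []
    for variant in variants:
        if (variant.chrom, variant.pos) in blacklist:
            continue  # suspected false variant: filtered out
        if variant.qual > QUAL_THRESHOLD:
            variant.tags.add("qual_gt_1000")  # hypothetical tag name
        kept.append(variant)
    return kept
```

For example, `decompose("chr1", 100, "TCA", ["TCG", "T"], 1500.0)` yields two biallelic entries: a SNV at position 102 (`A` to `G`) and a deletion at position 100 (`TCA` to `T`). Passing them to `post_process` with a blacklist containing `("chr1", 100)` would keep only the SNV and tag it.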
Tapestri Pipeline generates an .h5 output file, a multi-omics format built on the structured HDF5 file format. This file can contain data for one or more runs. Each run can contain data for one or more "assays," where each assay holds the data for a different analyte. The three main assays are dna_read_counts, dna_variants, and protein_read_counts. For more details on H5, refer to this article.
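Because the .h5 output is a regular HDF5 file, its contents can be inspected with any HDF5 reader. The sketch below assumes the third-party `h5py` package is installed and makes no assumption about the internal group names; it simply walks whatever hierarchy the file contains. The helper name `summarize_h5` is hypothetical.

```python
import h5py


def summarize_h5(path):
    """Return (hdf5_path, kind, shape) for every group and dataset in the file."""
    entries = []

    def visit(name, obj):
        # visititems calls this once per object below the root group.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape))
        else:
            entries.append((name, "group", None))

    with h5py.File(path, "r") as handle:
        handle.visititems(visit)
    return sorted(entries, key=lambda entry: entry[0])
```

Calling `summarize_h5("sample.h5")` (a placeholder file name) and printing the result gives a quick overview of which assays and datasets a given output file actually contains.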
The DNA variants assay is also converted into the open-source .loom format (Zeisel, Hochgerner, et al. 2018), which serves as the input for our tertiary analysis tool, Tapestri Insights.