Sample Variants CSV File for Genotype Demultiplexing
The sample variants file is an input file for the genotype-based demultiplexing pipeline. It should be a comma-separated file with one variant listed per line. The header of the file should contain the following columns:
Column name | Description | Example |
chromosome | The chromosome the variant is on. Must include the “chr” prefix. | chr1 |
position | The position of the variant in 1-based coordinates. | 6123151 |
ref_allele | The reference allele of the variant. Special notes: | A |
alt_allele | The alternate allele of the variant. Special notes: | T |
genotype | The expected genotype of the variant. Acceptable values: | 1 |
type | The type of variant. Acceptable values (see Table 2): | germline |
sample_id | The name of your sample. | SampleA |
* All columns are required.
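The column requirements above can be checked before a file is submitted. The following is a minimal sketch, not part of the pipeline: the function name `validate_variants_csv` is hypothetical, and the accepted genotype values (0, 1, 2) and type values (germline, whitelist, blacklist) are assumptions inferred from the examples in this article rather than an official list.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "chromosome", "position", "ref_allele", "alt_allele",
    "genotype", "type", "sample_id",
]
# Assumptions drawn from the examples in this article, not an official list.
ASSUMED_GENOTYPES = {"0", "1", "2"}
ASSUMED_TYPES = {"germline", "whitelist", "blacklist"}


def validate_variants_csv(text):
    """Return a list of problems found in the CSV text; empty means none found."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, raw in enumerate(reader, start=2):
        row = {c: (raw.get(c) or "").strip() for c in REQUIRED_COLUMNS}
        empty = [c for c in REQUIRED_COLUMNS if not row[c]]
        if empty:
            problems.append(f"line {line_no}: empty {', '.join(empty)}")
            continue
        if not row["chromosome"].startswith("chr"):
            problems.append(f"line {line_no}: chromosome lacks 'chr' prefix")
        # Positions are 1-based, so 0 and non-numeric values are rejected.
        if not (row["position"].isdecimal() and int(row["position"]) >= 1):
            problems.append(f"line {line_no}: position is not a 1-based integer")
        if row["genotype"] not in ASSUMED_GENOTYPES:
            problems.append(f"line {line_no}: unexpected genotype {row['genotype']!r}")
        if row["type"] not in ASSUMED_TYPES:
            problems.append(f"line {line_no}: unexpected type {row['type']!r}")
    return problems
```

Calling `validate_variants_csv(open("variants.csv").read())` on a file (the file name here is only an illustration) returns an empty list when no problems are found.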
A user can create this file from existing germline variant information, or it can be generated using the Merge Bulk Runs pipeline. This pipeline merges the H5 files from the bulk NGS data and generates the germline_truth.csv file, which lists the variants that differentiate the samples.
Additional requirements for successful genotype-based demultiplexing:
- At least 5 variants must overlap between the variants file and the panel, as well as between samples, for demultiplexing to work.
- Samples from close relatives can present a challenge, so at least 5 differentiating variants between the samples must be provided. A differentiating variant is one whose median genotype differs between one sample and the other: if it is HOM in one sample, it should be HET or WT in the other; if it is HET in one, it should be WT or HOM in the other; if it is WT in one, it should be HET or HOM in the other. In general, customers should avoid multiplexing related samples together and should identify samples from relatives (and their relationship) before data processing.
- Samples with donor background may also present a challenge. Customers should identify samples from patients who have received a bone marrow transplant, and the relationship of the donor, before data processing. More time for data processing may be needed. Ideally, germline mutations for the donor, if available, should be submitted for analysis.
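The differentiating-variant requirement can be estimated from the file itself. The sketch below is an illustration only: the function names are hypothetical, rows are assumed to be dictionaries keyed by the column names above, and the genotype column is compared directly as a stand-in for the median genotype mentioned in the text.

```python
from collections import defaultdict
from itertools import combinations

MIN_DIFFERENTIATING = 5  # threshold stated in the requirements above


def differentiating_counts(rows):
    """Count, per pair of samples, the germline variants whose genotypes differ."""
    by_variant = defaultdict(dict)
    for r in rows:
        if r["type"] != "germline":
            continue
        key = (r["chromosome"], r["position"], r["ref_allele"], r["alt_allele"])
        by_variant[key][r["sample_id"]] = r["genotype"]

    samples = sorted({s for genotypes in by_variant.values() for s in genotypes})
    counts = {pair: 0 for pair in combinations(samples, 2)}
    for genotypes in by_variant.values():
        for a, b in counts:
            # Only variants listed for both samples can be compared.
            if a in genotypes and b in genotypes and genotypes[a] != genotypes[b]:
                counts[(a, b)] += 1
    return counts


def pairs_below_threshold(rows, minimum=MIN_DIFFERENTIATING):
    """Return the sample pairs with fewer than `minimum` differentiating variants."""
    return [pair for pair, n in differentiating_counts(rows).items() if n < minimum]
```

Any pair returned by `pairs_below_threshold` would need more differentiating variants in the file before the requirement of 5 is met.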
Example:
chromosome | position | ref_allele | alt_allele | genotype | type | sample_id |
chr1 | 115256669 | G | A | 1 | germline | SampleA |
chr1 | 115256669 | G | A | 0 | germline | SampleB |
chr1 | 115256669 | G | A | 2 | germline | SampleC |
chr4 | 55599436 | T | C | 0 | germline | SampleA |
chr4 | 55599436 | T | C | 2 | germline | SampleB |
chr4 | 55599436 | T | C | 1 | germline | SampleC |
To download an example sample variants file click here.
Sample Variants CSV File for MRD
The same file can also be used in the scMRD pipeline for demultiplexing and to define the whitelist and blacklist variants. More details on each variant type are given below:
Variant types
type | Use Case | Required or Optional | What happens if it is missing? |
germline | For demultiplexing. | Required (if multiplexed) | Demux cannot be done. |
germline | Remove germline SNPs from somatic variant calls. | Optional | scMRD pipeline will still work, but in rare cases some germline variants may falsely appear as somatic variants. |
whitelist | Call “known” or previously detected mutations. | Optional | scMRD pipeline will still work, but in rare cases some somatic variants may be filtered out based on their annotation. |
blacklist | Remove variants that are biologically uninteresting or are false positive variants. | Optional | Report will contain irrelevant or artifact mutations. |
The scMRD pipeline v1.0.2 supports this CSV file format in addition to the VCF format.
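To see which roles a given file can serve, the rows can be grouped by the type column and compared against the variant types table. This is a minimal sketch with hypothetical function names; the notes it returns paraphrase the "What happens if it is missing?" column and do not reproduce actual pipeline messages.

```python
KNOWN_TYPES = ("germline", "whitelist", "blacklist")  # from the variant types table


def split_by_type(rows):
    """Group row dictionaries by their 'type' value."""
    groups = {t: [] for t in KNOWN_TYPES}
    for row in rows:
        variant_type = row["type"]
        if variant_type not in groups:
            raise ValueError(f"unexpected type: {variant_type!r}")
        groups[variant_type].append(row)
    return groups


def missing_type_notes(groups, multiplexed):
    """Summarize the consequences of each absent variant type."""
    notes = []
    if not groups["germline"]:
        if multiplexed:
            notes.append("germline: demultiplexing cannot be done")
        notes.append("germline: some germline variants may appear as somatic")
    if not groups["whitelist"]:
        notes.append("whitelist: some somatic variants may be filtered out")
    if not groups["blacklist"]:
        notes.append("blacklist: report may contain irrelevant or artifact mutations")
    return notes
```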
Example file. The bottom two rows show an insertion and deletion, respectively.
chromosome | position | ref_allele | alt_allele | genotype | type | sample_id |
chr1 | 115256669 | G | A | 1 | germline | SampleA |
chr2 | 25458546 | C | T | 2 | blacklist | SampleA |
chr1 | 115256669 | G | A | 0 | germline | SampleB |
chr2 | 25469913 | C | T | 2 | whitelist | SampleB |
chr1 | 115256669 | G | A | 2 | germline | SampleC |
chrX | 6125123 | A | ATACT | 1 | whitelist | SampleC |
chr5 | 51241235 | GTAT | G | 1 | whitelist | SampleC |
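The insertion and deletion rows above follow the convention of keeping a shared leading base in both alleles. As a rough illustration, a hypothetical helper can tell the variant classes apart from allele lengths alone; this is an assumption about how the rows can be read, not a description of how the pipeline classifies them.

```python
def classify_variant(ref_allele, alt_allele):
    """Classify a variant as snv, insertion, deletion, or complex from its alleles."""
    if len(ref_allele) == 1 and len(alt_allele) == 1:
        return "snv"
    # Insertion: the alternate allele extends the reference allele.
    if len(alt_allele) > len(ref_allele) and alt_allele.startswith(ref_allele):
        return "insertion"
    # Deletion: the reference allele extends the alternate allele.
    if len(ref_allele) > len(alt_allele) and ref_allele.startswith(alt_allele):
        return "deletion"
    return "complex"
```

For the example rows, `classify_variant("A", "ATACT")` gives "insertion" and `classify_variant("GTAT", "G")` gives "deletion".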
- Raji-KG1.csv (2 KB)