The reference sequence file must be in FASTA format; only .fa files are accepted. A genome can have at most 100 contigs and at most 4 billion bases.
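As a rough pre-upload check, the two limits above can be counted with a few lines of Python. This is a minimal sketch, not the product's own validation: the function names are hypothetical, "4 billion" is assumed to mean 4,000,000,000, and the base limit is assumed to apply to the whole file rather than to each contig.

```python
MAX_CONTIGS = 100
# Assumption: "4 billion bases" is read as 4,000,000,000 across the whole file.
MAX_TOTAL_BASES = 4_000_000_000


def summarize_fasta(lines):
    """Return (contig_count, total_bases) for an iterable of FASTA text lines."""
    contigs = 0
    bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            contigs += 1  # every header line starts a new contig
        else:
            bases += len(line)
    return contigs, bases


def within_limits(lines):
    """True if the file stays within the contig and base limits."""
    contigs, bases = summarize_fasta(lines)
    return contigs <= MAX_CONTIGS and bases <= MAX_TOTAL_BASES
```

An open file can be passed directly, for example `within_limits(open("reference.fa"))`, since the function reads one line at a time and does not hold the genome in memory.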
FASTA format
A sequence record in FASTA format consists of two parts, a header line and the sequence:
Header
The header line should start with a “greater-than” symbol (>) followed by the contig name. Allowed characters are “A” to “Z”, “a” to “z”, “0” to “9”, “_”, “-”, and “.”; spaces may appear between them. Since the header is used to identify the sequence, it must be unique for each sequence in the reference.
If the header contains spaces, then the characters before the first space will be used as the contig name. For example, if the header is “>K03455.1 HIV sequence”, then the sequence name in the Designer reference will be “K03455.1”.
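The header rules can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the tool's actual parser: the function name `contig_name` is hypothetical, and a header whose first character after “>” is a space is assumed to be invalid because it would yield an empty contig name.

```python
import re

# Characters listed as allowed in a header, plus the space separator.
_HEADER_RE = re.compile(r"[A-Za-z0-9_.\- ]+")


def contig_name(header):
    """Return the contig name for a header line, or raise ValueError."""
    header = header.rstrip("\r\n")
    if not header.startswith(">"):
        raise ValueError("header must start with '>'")
    body = header[1:]
    if not _HEADER_RE.fullmatch(body):
        raise ValueError("header is empty or contains a disallowed character")
    # Everything before the first space is the contig name.
    name = body.split(" ")[0]
    if not name:
        # Assumption: a space directly after '>' leaves no usable name.
        raise ValueError("contig name is empty")
    return name


print(contig_name(">K03455.1 HIV sequence"))  # K03455.1
```

Because uniqueness is judged on the contig name, two headers such as “>K03455.1 HIV sequence” and “>K03455.1 copy” would presumably collide; collecting the returned names in a set is one way to spot this.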
Sequence
The only characters accepted for representing a sequence are “A”, “C”, “G”, “T”, and “N” (lowercase versions are also allowed, for representing low-complexity regions). We support both single-line and multi-line sequences. If a sequence spans multiple lines, each line must be the same length, except for the last line, which can be shorter or longer than the previous lines. It is customary to use lines of 60 or 70 characters for readability. If the sequence is written on a single line, that line should not exceed 65,535 bases.
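The sequence rules above can be checked per record with a small Python function. This is a sketch under assumptions: the name `check_sequence_lines` is hypothetical, the input is the list of sequence lines of one record with line endings already removed, and the 65,535-base cap is applied only when the record has exactly one sequence line, which is how the text is read here.

```python
import re

MAX_SINGLE_LINE = 65_535
_BASES_RE = re.compile(r"[ACGTNacgtn]+")


def check_sequence_lines(lines):
    """Return a list of problems found in one record's sequence lines."""
    problems = []
    for i, line in enumerate(lines, start=1):
        if not line:
            problems.append(f"line {i}: empty")
        elif not _BASES_RE.fullmatch(line):
            problems.append(f"line {i}: characters other than A, C, G, T, N")
    if len(lines) == 1:
        if len(lines[0]) > MAX_SINGLE_LINE:
            problems.append("single-line sequence exceeds 65,535 bases")
    elif len(lines) > 1:
        # All lines except the last must share one length.
        widths = {len(line) for line in lines[:-1]}
        if len(widths) > 1:
            problems.append("lines before the last differ in length")
    return problems
```

An empty list means no problem was found; for instance `["ACGT", "ACGT", "AC"]` passes, while `["ACGT", "ACG", "AC"]` is reported because the first two lines differ in length.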
Sequence Size
The minimum length of a sequence is 300 bp. The sequence for a record cannot be empty.
Ensure that the sequence of each record is unique and does not overlap with the sequence of any other record in the FASTA file. A redundant sequence will interfere with primer specificity and may lead to missed regions in the panel. Overlapping sequences must be combined into a single FASTA record.
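The size and uniqueness requirements can be partly screened with the Python sketch below. It is deliberately limited and makes several assumptions: the function name `check_records` is hypothetical, records are supplied as a dictionary of contig name to full sequence, comparison ignores case, and only exact duplicates and full containment are detected. Partial overlaps between record ends and reverse-complement matches are not covered and would need an alignment tool.

```python
MIN_LENGTH = 300


def check_records(records):
    """records: dict mapping contig name -> full sequence string.

    Returns a list of problems (too-short records, duplicated or
    fully contained sequences).
    """
    problems = []
    for name, seq in records.items():
        if len(seq) < MIN_LENGTH:
            problems.append(
                f"{name}: {len(seq)} bp is below the {MIN_LENGTH} bp minimum"
            )
    names = list(records)
    upper = {name: records[name].upper() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if not upper[a] or not upper[b]:
                continue  # empty records are already reported above
            if upper[a] in upper[b] or upper[b] in upper[a]:
                problems.append(
                    f"{a} and {b}: one sequence is contained in the other"
                )
    return problems
```

The pairwise substring search is quadratic in the number of records and slow on chromosome-sized sequences, so this approach is only practical for small additions such as transgenes or vector backbones.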
Reference Genomes
The following pipeline reference genomes can be used as a starting point for the hg19, hg38, and mm10 genomes for the purpose of adding in transgenes, vector backbones, etc.