What should the reference FASTA file format be?


The reference sequence file must be a FASTA-format file with the .fa extension. A genome can contain at most 100 contigs, with a combined maximum of 4 billion bases.
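To check these limits before uploading, you can parse the file and count contigs and bases. The sketch below assumes "4 billion" means 4,000,000,000 bases. The helper names `read_fasta` and `check_genome_limits` are hypothetical and not part of any Designer tool.

```python
MAX_CONTIGS = 100                  # limit stated above
MAX_TOTAL_BASES = 4_000_000_000    # assumption: "4 billion" read as 4 x 10^9

def read_fasta(text):
    """Return a list of (header, sequence) tuples parsed from FASTA text.

    Blank lines are skipped; sequence lines before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def check_genome_limits(records):
    """Return a list of problems with the contig count or total base count."""
    problems = []
    if len(records) > MAX_CONTIGS:
        problems.append(f"{len(records)} contigs (maximum {MAX_CONTIGS})")
    total = sum(len(seq) for _, seq in records)
    if total > MAX_TOTAL_BASES:
        problems.append(f"{total} bases (maximum {MAX_TOTAL_BASES})")
    return problems
```

For example, `check_genome_limits(read_fasta(open("reference.fa").read()))` returns an empty list when both limits are met. Reading the whole file into memory is fine for small references; very large genomes would need a streaming reader.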

FASTA format

A sequence record in FASTA format consists of a header line followed by the sequence, written on one or more lines:


The header line must start with a “greater-than” symbol (>) followed by the contig name. Allowed characters are “A” to “Z”, “a” to “z”, “0” to “9”, “_”, “-”, “.”, “,”, “;”, and “|”; spaces may also appear between them. Because the header identifies the sequence, it must be unique within the reference.
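The header rules can be expressed as a regular expression plus a uniqueness check. This is a minimal sketch, assuming the first character after ">" must be a non-space allowed character; the names `HEADER_RE` and `check_headers` are hypothetical.

```python
import re

# Allowed characters: letters, digits, "_", ".", ",", ";", "|", "-", and spaces.
# Assumption: the contig name starts immediately after ">" (no leading space).
HEADER_RE = re.compile(r">[A-Za-z0-9_.,;|-][A-Za-z0-9_.,;| -]*")

def check_headers(header_lines):
    """Return problems for headers with invalid characters or duplicates."""
    problems = []
    seen = set()
    for line in header_lines:
        if not HEADER_RE.fullmatch(line):
            problems.append(f"invalid header: {line!r}")
        if line in seen:
            problems.append(f"duplicate header: {line!r}")
        seen.add(line)
    return problems
```

Headers such as “>K03455.1 HIV sequence” pass, while a header containing a colon is reported as invalid.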

If the header contains spaces, only the characters before the first space are used as the contig name. For example, if the header is “>K03455.1 HIV sequence”, the sequence name in the Designer reference will be “K03455.1”.
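The naming rule amounts to taking the header text after ">" up to the first space. A minimal sketch, where `contig_name` is a hypothetical helper:

```python
def contig_name(header_line):
    """Return the contig name: text after '>' up to the first space."""
    if not header_line.startswith(">"):
        raise ValueError(f"not a FASTA header: {header_line!r}")
    return header_line[1:].split(" ", 1)[0]
```

Because only this prefix is kept, two different headers such as “>chr1 a” and “>chr1 b” would likely produce the same contig name, so comparing names as well as full headers may be a useful extra check.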


The only characters accepted in a sequence are “A”, “C”, “G”, “T”, and “N” (lowercase versions are also allowed and can be used to mark low-complexity regions). A sequence may be written on a single line or across multiple lines. If it spans multiple lines, every line must have the same length, except the last line, which can be shorter or longer than the others. Lines of 60 or 70 characters are customary for readability. If a sequence is written on a single line, that line must not exceed 65,535 bases.
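The sequence-line rules can be checked per record as sketched below. This assumes the 65,535-base limit applies only when the whole sequence is on one line, as stated above; `check_sequence_lines` is a hypothetical helper that receives the record's sequence lines without the header.

```python
import re

SEQ_RE = re.compile(r"[ACGTNacgtn]+")   # allowed bases, upper or lower case
MAX_SINGLE_LINE = 65_535                # limit for a one-line sequence

def check_sequence_lines(lines):
    """Validate the sequence lines of one record (header excluded)."""
    problems = []
    for i, line in enumerate(lines, start=1):
        if not SEQ_RE.fullmatch(line):
            problems.append(f"line {i}: unsupported characters")
    if len(lines) == 1 and len(lines[0]) > MAX_SINGLE_LINE:
        problems.append(
            f"single line of {len(lines[0])} bases exceeds {MAX_SINGLE_LINE}")
    if len(lines) > 1:
        # Every line except the last must match the width of the first line.
        width = len(lines[0])
        for i, line in enumerate(lines[:-1], start=1):
            if len(line) != width:
                problems.append(f"line {i}: length {len(line)} differs from {width}")
    return problems
```

The last line is deliberately exempt from the width check, matching the rule that it may differ in length.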

Sequence Size

The minimum length of a sequence is 300 bp. The sequence for a record cannot be empty.
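A length check for each record might look like the following sketch; `check_length` is a hypothetical helper that takes the contig name and its full, joined sequence.

```python
MIN_LENGTH = 300  # minimum sequence length in bp, as stated above

def check_length(name, sequence):
    """Return a problem message, or None if the sequence length is acceptable."""
    if not sequence:
        return f"{name}: sequence is empty"
    if len(sequence) < MIN_LENGTH:
        return f"{name}: {len(sequence)} bp is shorter than {MIN_LENGTH} bp"
    return None
```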

Ensure that the sequence of each record is unique and does not overlap with sequences in any other record of the FASTA file. A redundant sequence will interfere with the primer specificity and may lead to missed regions in the panel. Overlapping sequences must be combined into a single FASTA record.
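A partial check for redundancy is to flag records whose sequences are identical or fully contained in another record. This sketch compares sequences case-insensitively; it does not detect partial end-to-end overlaps or reverse-complement copies, which would need an alignment tool. `find_redundant` is a hypothetical helper.

```python
from itertools import combinations

def find_redundant(records):
    """Flag (name, sequence) records that are identical or nested in another.

    Case-insensitive. Partial overlaps and reverse complements are NOT checked.
    """
    problems = []
    upper = [(name, seq.upper()) for name, seq in records]
    for (name_a, a), (name_b, b) in combinations(upper, 2):
        if a == b:
            problems.append(f"{name_a} and {name_b} are identical")
        elif a in b:
            problems.append(f"{name_a} is contained in {name_b}")
        elif b in a:
            problems.append(f"{name_b} is contained in {name_a}")
    return problems
```

The pairwise substring search is only practical for small references; flagged records can then be merged into a single FASTA record.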
