Tapestri barcodes are attached to beads and are drawn from 1536 × 1536 combinations of 9 bp barcode sequences. The barcodes are extracted from the mapped reads. Because sequencing and other errors can corrupt them, the Tapestri DNA Pipeline first identifies barcodes that are likely to be uncorrupted and then error-corrects the remaining barcodes to increase yield. In the first step, a whitelist-based approach selects all barcodes that exactly match a valid barcode.
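The exact-match step amounts to a lookup against the whitelist. The sketch below assumes the whitelist is available as a Python set of valid sequences. The function name `split_exact_matches` and the toy sequences are hypothetical and only illustrate how observed barcodes are partitioned into exact matches and barcodes passed on for correction.

```python
from __future__ import annotations

from typing import Iterable


def split_exact_matches(observed: Iterable[str], whitelist: set[str]) -> tuple[list[str], list[str]]:
    """Partition observed barcodes into exact whitelist matches and the rest (hypothetical helper)."""
    exact: list[str] = []
    discarded: list[str] = []
    for bc in observed:
        # Set membership gives a constant-time exact-match lookup per barcode.
        if bc in whitelist:
            exact.append(bc)
        else:
            discarded.append(bc)  # candidate for error correction
    return exact, discarded


whitelist = {"ACGTACGTA", "TTTTCCCCG"}  # toy sequences, not real Tapestri barcodes
exact, discarded = split_exact_matches(["ACGTACGTA", "ACGTACGTT", "TTTTCCCCG"], whitelist)
print(exact)      # ['ACGTACGTA', 'TTTTCCCCG']
print(discarded)  # ['ACGTACGTT']
```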
Barcodes that do not exactly match the whitelist are then corrected using either Hamming distance or Levenshtein distance. Barcodes of the full 9 bp length are corrected if they lie within a Hamming distance of 2 (at most two substitutions) of a valid barcode. For all other barcodes, which are partial matches of a different length, the Levenshtein distance threshold is set dynamically so that only barcodes with a single insertion or deletion are corrected. In addition, a barcode is corrected only if exactly one valid barcode lies at the minimum number of edits (substitutions, insertions, or deletions). If there are multiple candidates, we discard the barcode.
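The correction rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the function names are hypothetical, the whitelist is assumed to contain only 9 bp sequences, and the dynamic Levenshtein threshold is approximated by a fixed limit of one edit. For a barcode whose length differs from 9 bp, that limit admits exactly one insertion or deletion. The uniqueness rule is applied by discarding any barcode whose minimum distance is shared by more than one valid barcode.

```python
from __future__ import annotations

from typing import Optional

BARCODE_LEN = 9   # full-length barcode, from the text
MAX_HAMMING = 2   # max substitutions for 9 bp barcodes, from the text
MAX_INDEL = 1     # assumption: fixed stand-in for the dynamic Levenshtein threshold


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def correct_barcode(bc: str, whitelist: list[str]) -> Optional[str]:
    """Return the unique closest valid barcode, or None if bc is discarded (hypothetical helper)."""
    if not whitelist:
        return None
    if len(bc) == BARCODE_LEN:
        dist, limit = hamming, MAX_HAMMING
    else:
        # Length differs from 9 bp: allow a single insertion or deletion only.
        dist, limit = levenshtein, MAX_INDEL
    scored = [(dist(bc, w), w) for w in whitelist]
    best = min(d for d, _ in scored)
    if best > limit:
        return None  # too many edits to correct
    candidates = [w for d, w in scored if d == best]
    # Ambiguous if several valid barcodes tie at the minimum distance.
    return candidates[0] if len(candidates) == 1 else None


whitelist = ["ACGTACGTA", "TTTTCCCCG", "GGGGAAAAC"]  # toy sequences, not real Tapestri barcodes
print(correct_barcode("ACGTACGTT", whitelist))  # one substitution: ACGTACGTA
print(correct_barcode("ACGTACGT", whitelist))   # one deletion: ACGTACGTA
print(correct_barcode("ACGTTTTTT", whitelist))  # four substitutions: None
```

The whitelist is scanned exhaustively here for clarity; a real implementation may index the whitelist instead, but the acceptance rules are the same.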