Description
I understand that splitcode has many uses, and some of them involve substituting one sequence with another (for example, --assign or sub). In a manner of speaking, such substitutions make the corresponding quality scores irrelevant, since those sequences do not come out of the sequencer.
However, splitcode seems to change the quality score of every output base to K, even where no substitution takes place. For example, with the following config file:
@extract <barcode{{a14}}>,<barcode{{bead1}}>,<barcode{{bead2}}>,<barcode{{bead3}}>
tags ids groups locations distances previous next minFindsG maxFindsG exclude
a14_plain.txt$ a14 a14 0:0:8 0 - {{bead1}} 1 1 0
struct/newBeads/bc1.txt$ bead1 bead1 0:8:18 0 {{a14}} {{bead2}} 1 1 0
struct/newBeads/bc2.txt$ bead2 bead2 0:18:28 0 {{bead1}} {{bead3}} 1 1 0
struct/newBeads/bc3.txt$ bead3 bead3 0:28:38 0 {{bead2}} - 1 1 0
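To show what "just extraction" would mean for the quality line, here is a minimal sketch (not splitcode's implementation) that applies the four locations above to one record and slices the quality string with the same coordinates as the sequence. It assumes `0:start:end` means read 0 with an end-exclusive range, which matches the 38-base output shown below. It also skips the barcode-list matching against the bc*.txt files entirely. The helper name `extract_barcode` and the `LOCATIONS` table are hypothetical.

```python
# Hypothetical sketch: extract the a14/bead1/bead2/bead3 segments from read 0
# and carry the ORIGINAL quality characters along with each segment.
# Assumption: "0:start:end" = read 0, half-open range [start, end).
# Barcode whitelist matching (bc1.txt etc.) is deliberately not modeled.

LOCATIONS = {
    "a14": (0, 8),
    "bead1": (8, 18),
    "bead2": (18, 28),
    "bead3": (28, 38),
}


def extract_barcode(seq: str, qual: str, locations=LOCATIONS) -> tuple[str, str]:
    """Concatenate the extracted segments, keeping the input quality scores."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Sort by start so the output follows a14 -> bead1 -> bead2 -> bead3.
    spans = sorted(locations.values())
    out_seq = "".join(seq[s:e] for s, e in spans)
    out_qual = "".join(qual[s:e] for s, e in spans)
    return out_seq, out_qual


if __name__ == "__main__":
    seq = "AACAGACAGACGCGATATGAGAGTTCCTAATGTGAGCAATACATGA"
    qual = "?" * len(seq)
    bc_seq, bc_qual = extract_barcode(seq, qual)
    print(bc_seq)   # AACAGACAGACGCGATATGAGAGTTCCTAATGTGAGCA
    print(bc_qual)  # 38 '?' characters, unchanged from the input
```

Because the sequence and quality strings are cut with identical coordinates, the two output lines always stay the same length and each base keeps its own score.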
There's no substitution involved, just extraction. Yet splitcode recodes every Phred score in the output to K, changing the record from
@A00563:449:HW2C7DMXY:1:1101:17345:1000:ATACATGA
AACAGACAGACGCGATATGAGAGTTCCTAATGTGAGCAATACATGA
+
??????????????????????????????????????????????
to
@A00563:449:HW2C7DMXY:1:1101:17345:1000:ATACATGA
AACAGACAGACGCGATATGAGAGTTCCTAATGTGAGCA
+
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
This makes it hard for us to calculate QC metrics such as the percentage of barcode bases at Q30 or above (Q30%). Is it possible to keep the original file's Phred scores when no substitution is involved?
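For context on why the rewrite matters, here is a minimal sketch of a barcode Q30% calculation, assuming the standard Phred+33 encoding. Under that encoding '?' (ASCII 63) is Q30 and 'K' (ASCII 75) is Q42, so every rewritten base reports Q42 no matter what the sequencer actually measured. The function names are illustrative only.

```python
# Sketch: Q30% from a FASTQ quality string, assuming Phred+33 encoding.


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode each quality character to an integer Phred score."""
    return [ord(c) - offset for c in qual]


def fraction_q30(qual: str, threshold: int = 30) -> float:
    """Fraction of bases whose Phred score is >= threshold."""
    scores = phred_scores(qual)
    if not scores:
        return 0.0
    return sum(q >= threshold for q in scores) / len(scores)


if __name__ == "__main__":
    print(phred_scores("?K"))    # [30, 42]
    print(fraction_q30("?5K#"))  # 0.5  ('?'=30 and 'K'=42 pass; '5'=20 and '#'=2 fail)
```

In the record above, the input (all '?') and the output (all 'K') both give 100%, but only by coincidence. Any base below Q30 in the original would still be counted as passing once it has been rewritten to K.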