Closed
Description
Initially, I thought that this problem only appeared when using partial5 or partial3 specifications - see the previous issue #47. But this issue seems to affect splitcode even with simple distance error tolerance specifications not at the 5' or 3' ends of sequences.
Expected behavior:
Matching sequences are prioritized by length, with the longest sequence getting the highest priority (see the splitcode docs FAQ).
- This is not only the documented behavior, but the desired behavior in many applications.
Current behavior (splitcode version 0.31.4): a shorter match without errors is prioritized over a longer match within the error tolerance.
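To make the expected rule concrete, here is a minimal Python sketch of the selection I have in mind. It is not splitcode's implementation: `hamming`, `expected_match`, and `TAGS` are hypothetical names, only substitution mismatches are counted, and it assumes that start positions are scanned left to right and that the longest tag matching at the first matching position wins. Coordinates are 0-based with an exclusive end, like the LX tag in the output.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def expected_match(read, tags):
    """Return (start, end) of the expected match (end exclusive), or None.

    Assumption: start positions are scanned left to right, and at the first
    position where any tag matches within its tolerance, the longest
    matching tag wins.
    """
    for start in range(len(read)):
        best_len = 0
        for seq, max_mismatches in tags:
            window = read[start:start + len(seq)]
            # Skip tags that would run past the end of the read.
            if len(window) == len(seq) and hamming(window, seq) <= max_mismatches:
                best_len = max(best_len, len(seq))
        if best_len:
            return start, start + best_len
    return None


# Same tags as the example config: poly-T of length 3 to 15,
# with one mismatch allowed for tags of 10 bp or longer.
TAGS = [("T" * n, 0 if n < 10 else 1) for n in range(3, 16)]

# The read where splitcode reports 3-11 instead of the longer match.
print(expected_match("AAATTTTTTTTCTTAAA", TAGS))  # (3, 14)
```

Under these assumptions the sketch returns the expected positions for all five example reads, including the one that splitcode currently gets wrong.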
Example
config.tsv
ids tags distance
tagT TTT 0
tagT TTTT 0
tagT TTTTT 0
tagT TTTTTT 0
tagT TTTTTTT 0
tagT TTTTTTTT 0
tagT TTTTTTTTT 0
tagT TTTTTTTTTT 1
tagT TTTTTTTTTTT 1
tagT TTTTTTTTTTTT 1
tagT TTTTTTTTTTTTT 1
tagT TTTTTTTTTTTTTT 1
tagT TTTTTTTTTTTTTTT 1
input.fastq
@11_bp_no_mismatch
AAATTTTTTTTTTTAAA
+
;;;;;;;;;;;;;;;;;
@11_bp_1_internal_mismatch
AAATTTTTTTTTCTAAA
+
;;;;;;;;;;;;;;;;;
@11_bp_1_internal_mismatch2
AAATTTTTTTTCTTAAA
+
;;;;;;;;;;;;;;;;;
@11_bp_1_5prime_mismatch
AAACTTTTTTTTTTAAA
+
;;;;;;;;;;;;;;;;;
@11_bp_1_3prime_mismatch
AAATTTTTTTTTTCAAA
+
;;;;;;;;;;;;;;;;;
command
splitcode -c config.tsv --loc-names --out-fasta --nFastqs 1 --pipe input.fastq
manually annotated output
- carets '^' mark the positions of splitcode's match
- asterisks '*' mark the positions of the desired/expected match
- the positions of the desired/expected match are also included in the read name
>11_bp_no_mismatch LX:Z:tagT:0,2-14 Expected:same
AAATTTTTTTTTTTAAA
  ^^^^^^^^^^^^
  ************
>11_bp_1_internal_mismatch LX:Z:tagT:0,2-12 Expected:same
AAATTTTTTTTTCTAAA
  ^^^^^^^^^^
  **********
>11_bp_1_internal_mismatch2 LX:Z:tagT:0,3-11 Expected:0,3-14
AAATTTTTTTTCTTAAA
   ^^^^^^^^
   ***********
>11_bp_1_5prime_mismatch LX:Z:tagT:0,3-14 Expected:same
AAACTTTTTTTTTTAAA
   ^^^^^^^^^^^
   ***********
>11_bp_1_3prime_mismatch LX:Z:tagT:0,2-13 Expected:same
AAATTTTTTTTTTCAAA
  ^^^^^^^^^^^
  ***********
This same example can be explored interactively in this Google Colab notebook: https://colab.research.google.com/drive/1uRaVpy57iKtk70xcb_8syfD0BrCmLfpX