Hi!
I've been using splitcode for weeks, but I have recently run into what looks like a problem with the hash map when processing a FASTQ file of ~300M reads. To check that I was doing everything correctly, I ran the same command and config file on a different FASTQ file of ~25M reads, and it worked fine.
The full log with the common error on the big FASTQ file is the following:
* Using a list of 3073 tags (vector size: 3073; map size: 58,345; num elements in map: 115,177)
* will process sample 1: my_input.fastq
65M reads processed (79.1% assigned)terminate called after throwing an instance of 'std::overflow_error'
what(): robin_hood::map overflow
Aborted (core dumped)
My config file includes a barcode, a linker and another barcode, with allowed Hamming distances of 1, 2 and 1, respectively.
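For context on why the Hamming distance affects the map size, here is a minimal sketch that counts how many sequences lie within a given Hamming distance of a single tag. It assumes a 4-letter alphabet (A, C, G, T) and uses a hypothetical tag length of 8; it only illustrates the combinatorics and is not a description of how splitcode actually builds its map.

```python
from math import comb


def hamming_neighborhood_size(length, max_dist, alphabet_size=4):
    """Count sequences within `max_dist` mismatches of one tag of `length` bases.

    For each k <= max_dist, choose k positions to mutate and give each one of
    the (alphabet_size - 1) other letters. k = 0 counts the tag itself.
    """
    return sum(
        comb(length, k) * (alphabet_size - 1) ** k
        for k in range(max_dist + 1)
    )


# Hypothetical 8-base tag (the real tag lengths are not given here).
one_mismatch = hamming_neighborhood_size(8, 1)   # 1 + 8*3
two_mismatches = hamming_neighborhood_size(8, 2)  # 1 + 8*3 + 28*9
print(one_mismatch, two_mismatches)
```

The count grows quickly with the allowed distance, so the distance-2 linker would be expected to contribute far more variants per tag than the distance-1 barcodes.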
My command is the following: splitcode -c config.txt --nFastqs=1 --assign --outb=02_splitcode/final_barcodes.fastq --mapping=02_splitcode/mapping.txt -o 02_splitcode/output_R1.fastq -t 64 my_input.fastq.
I also tried reducing the Hamming distances to make the map smaller, but got the same error. It normally happens once 25-50% of the reads have been processed, regardless of whether I parallelize with -t. Finally, I cloned the repo and changed line 1210 of splitcode/src/ProcessReads.cpp from bufadd += l[i] + nl[i]; // includes name and qual to bufadd += l[i] + nl[i] + (comments && seq[i]->comment.l != 0 ? 1 : 0); // includes name, qual, and the extra space. This change was suggested by an LLM, but I still got the same error.
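One way to check whether the overflow depends on the total number of reads or on specific reads would be to split the big file into smaller chunks and run the same command on each. Below is a minimal sketch of such a splitter. It assumes an uncompressed FASTQ with exactly 4 lines per record (no wrapped sequence lines); the function name, chunk naming scheme and chunk size are hypothetical choices, not part of splitcode.

```python
def split_fastq(path, reads_per_chunk, out_prefix):
    """Stream `path` into consecutive chunk files of `reads_per_chunk` reads.

    Assumes 4 lines per FASTQ record. Returns the chunk file paths in order.
    """
    lines_per_chunk = 4 * reads_per_chunk
    chunk_paths = []
    out = None
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            # Start a new chunk file every `lines_per_chunk` lines.
            if line_no % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                chunk_path = f"{out_prefix}.chunk{len(chunk_paths):04d}.fastq"
                out = open(chunk_path, "w")
                chunk_paths.append(chunk_path)
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths
```

For example, split_fastq("my_input.fastq", 25_000_000, "my_input") would produce about 12 chunks of 25M reads each for the ~300M-read file. If every chunk runs cleanly on its own, that would suggest the error is related to the cumulative number of reads rather than to particular records.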
In case it helps, the computer I'm working on has 252G of RAM, although splitcode seems to use very little of it.
Thanks in advance for your time!!!
Pedro