
Difference between falco and fastqc in reported polyA/polyG content #69

@wm75

The output of falco (1.2.3) and fastqc (0.12.1) differs markedly for the polyA statistics (and, I assume, for polyG as well, though I haven't tested that).
I haven't looked into the code that produces them, but compared to fastqc, falco's values seem to increase much faster along the read. Here's an example of the corresponding section of the two reports:

First, fastqc:

#Position	Illumina Universal Adapter	Illumina Small RNA 3' Adapter	Illumina Small RNA 5' Adapter	Nextera Transposase Sequence	PolyA	PolyG
1	0.0	0.0	0.0	0.0	0.020387359836901122	0.0
2	0.0	0.0	0.0	0.0	0.06116207951070336	0.0
3	0.0	0.0	0.0	0.0	0.06116207951070336	0.0
4	0.0	0.0	0.0	0.0	0.06116207951070336	0.0
5	0.0	0.0	0.0	0.0	0.12232415902140673	0.0
6	0.0	0.0	0.0	0.0	0.12232415902140673	0.0
7	0.0	0.0	0.0	0.0	0.12232415902140673	0.0
8	0.0	0.0	0.0	0.0	0.14271151885830785	0.0
9	0.0	0.0	0.0	0.0	0.14271151885830785	0.0
10-11	0.0	0.0	0.0	0.0	0.14271151885830785	0.0
12-13	0.0	0.0	0.0	0.0	0.1529051987767584	0.0
14-15	0.0	0.0	0.0	0.0	0.17329255861365955	0.0
16-17	0.0	0.0	0.0	0.0	0.21406727828746175	0.0
18-19	0.0	0.0	0.0	0.0	0.22426095820591233	0.0
20-21	0.12232415902140673	0.0	0.0	0.0	0.24464831804281345	0.0
22-23	0.12232415902140673	0.0	0.0	0.0	0.24464831804281345	0.0
24-25	0.12232415902140673	0.0	0.0	0.0	0.24464831804281345	0.0
26-27	0.1325178389398573	0.0	0.0	0.0	0.24464831804281345	0.0
28-29	0.2038735983690112	0.0	0.0	0.0	0.24464831804281345	0.0
30-31	0.22426095820591233	0.0	0.0	0.0	0.24464831804281345	0.0
32-33	0.22426095820591233	0.0	0.0	0.0	0.2650356778797146	0.0
34-35	0.24464831804281345	0.0	0.0	0.0	0.29561671763506625	0.0
36-37	0.3058103975535168	0.0	0.0	0.0	0.31600407747196735	0.0
38-39	0.4383282364933741	0.0	0.0	0.0	0.32619775739041795	0.0
40-41	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
42-43	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
44-45	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
46-47	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
48-49	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
50-51	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
52-53	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
54-55	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
56-57	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
58-59	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
60-61	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
62-63	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
64-65	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
66-67	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
68-69	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
70-71	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
72-73	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
74-75	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
76-77	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
78-79	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
80-81	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
82-83	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
84-85	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
86-87	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
88-89	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
90-91	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
92-93	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
94-95	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0
96-97	0.4689092762487258	0.0	0.0	0.0	0.32619775739041795	0.0

Next, falco:

#Position	Illumina Universal Adapter	Illumina Small RNA 3' Adapter	Illumina Small RNA 5' Adapter	Nextera Transposase Sequence	PolyA	PolyG
1	0	0	0	0	0.0203874	0
2	0	0	0	0	0.0815494	0
3	0	0	0	0	0.142712	0
4	0	0	0	0	0.183486	0
5	0	0	0	0	0.285423	0
6	0	0	0	0	0.38736	0
7	0	0	0	0	0.489297	0
8	0	0	0	0	0.591233	0
9	0	0	0	0	0.672783	0
10	0	0	0	0	0.754332	0
11	0	0	0	0	0.835882	0
12	0	0	0	0	0.917431	0
13	0	0	0	0	1.01937	0
14	0	0	0	0	1.1213	0
15	0	0	0	0	1.24363	0
16	0	0	0	0	1.34557	0
17	0	0	0	0	1.46789	0
18	0	0	0	0	1.59021	0
19	0	0	0	0	1.67176	0
20	0.122324	0	0	0	1.75331	0
21	0.122324	0	0	0	1.83486	0
22	0.122324	0	0	0	1.89602	0
23	0.122324	0	0	0	1.95719	0
24	0.122324	0	0	0	2.01835	0
25	0.122324	0	0	0	2.07951	0
26	0.122324	0	0	0	2.14067	0
27	0.142712	0	0	0	2.20183	0
28	0.183486	0	0	0	2.263	0
29	0.224261	0	0	0	2.32416	0
30	0.224261	0	0	0	2.38532	0
31	0.224261	0	0	0	2.44648	0
32	0.224261	0	0	0	2.50765	0
33	0.224261	0	0	0	2.60958	0
34	0.224261	0	0	0	2.71152	0
35	0.265036	0	0	0	2.81346	0
36	0.285423	0	0	0	2.89501	0
37	0.326198	0	0	0	2.99694	0
38	0.407747	0	0	0	3.09888	0
39	0.468909	0	0	0	3.20082	0
40	0.468909	0	0	0	3.30275	0
41	0.468909	0	0	0	3.40469	0
42	0.468909	0	0	0	3.48624	0
43	0.468909	0	0	0	3.56779	0
44	0.468909	0	0	0	3.60856	0
45	0.468909	0	0	0	3.62895	0
46	0.468909	0	0	0	3.64934	0
47	0.468909	0	0	0	3.66972	0
48	0.468909	0	0	0	3.69011	0
49	0.468909	0	0	0	3.7105	0
50	0.468909	0	0	0	3.7105	0
51	0.468909	0	0	0	3.7105	0
52	0.468909	0	0	0	3.7105	0
53	0.468909	0	0	0	3.7105	0
54	0.468909	0	0	0	3.7105	0
55	0.468909	0	0	0	3.7105	0
56	0.468909	0	0	0	3.7105	0
57	0.468909	0	0	0	3.7105	0
58	0.468909	0	0	0	3.7105	0
59	0.468909	0	0	0	3.73089	0
60	0.468909	0	0	0	3.75127	0
61	0.468909	0	0	0	3.77166	0
62	0.468909	0	0	0	3.81244	0
63	0.468909	0	0	0	3.85321	0
64	0.468909	0	0	0	3.89399	0
65	0.468909	0	0	0	3.93476	0
66	0.468909	0	0	0	3.97554	0
67	0.468909	0	0	0	4.01631	0
68	0.468909	0	0	0	4.05708	0
69	0.468909	0	0	0	4.09786	0
70	0.468909	0	0	0	4.13863	0
71	0.468909	0	0	0	4.17941	0
72	0.468909	0	0	0	4.22018	0
73	0.468909	0	0	0	4.26096	0
74	0.489297	0	0	0	4.32212	0
75	0.489297	0	0	0	4.38328	0
76	0.489297	0	0	0	4.42406	0
77	0.489297	0	0	0	4.46483	0
78	0.489297	0	0	0	4.50561	0
79	0.489297	0	0	0	4.54638	0
80	0.489297	0	0	0	4.58716	0
81	0.489297	0	0	0	4.62793	0
82	0.489297	0	0	0	4.66871	0
83	0.509684	0	0	0	4.70948	0
84	0.509684	0	0	0	4.75025	0
85	0.509684	0	0	0	4.79103	0
86	0.509684	0	0	0	4.8318	0
87	0.509684	0	0	0	4.91335	0
88	0.509684	0	0	0	4.9949	0
89	0.509684	0	0	0	5.05607	0
90	0.509684	0	0	0	5.09684	0
91	0.509684	0	0	0	5.158	0
92	0.570846	0	0	0	5.21916	0
93	0.632008	0	0	0	5.28033	0
94	0.632008	0	0	0	5.34149	0
95	0.632008	0	0	0	5.40265	0
96	0.632008	0	0	0	5.46381	0
97	0.632008	0	0	0	5.52497	0
98	0.632008	0	0	0	5.52497	0
99	0.632008	0	0	0	5.52497	0
100	0.632008	0	0	0	5.52497	0
101	0.632008	0	0	0	5.52497	0
102	0.632008	0	0	0	5.52497	0
103	0.632008	0	0	0	5.52497	0
104	0.632008	0	0	0	5.52497	0
105	0.632008	0	0	0	5.52497	0
106	0.632008	0	0	0	5.52497	0
107	0.632008	0	0	0	5.52497	0
108	0.632008	0	0	0	5.52497	0

My first guess would be that falco counts longer polyA runs multiple times (once per matching position within the run, rather than once per read)?
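To make that hypothesis concrete, here is a minimal Python sketch contrasting two ways of building a cumulative per-position percentage. It is not taken from either code base: it assumes a read is flagged when it contains a 12-mer of A's, and that one approach counts each read once from its first match onward while the other counts every overlapping match start. The names `POLY_A`, `cumulative_first_hit` and `cumulative_every_hit` are hypothetical.

```python
from typing import List

# Assumption: polyA is detected by searching for a run of 12 A's; the k-mer
# length actually used by fastqc or falco may differ.
POLY_A = "A" * 12


def _to_percent(counts: List[int], n_reads: int) -> List[float]:
    """Convert per-position hit counts into percentages of all reads."""
    return [100.0 * c / n_reads for c in counts]


def cumulative_first_hit(reads: List[str], kmer: str = POLY_A) -> List[float]:
    """Count each read once, from the first position where kmer starts to the end."""
    counts = [0] * max(len(r) for r in reads)
    for read in reads:
        start = read.find(kmer)  # -1 if the read contains no match
        if start >= 0:
            for pos in range(start, len(counts)):
                counts[pos] += 1
    return _to_percent(counts, len(reads))


def cumulative_every_hit(reads: List[str], kmer: str = POLY_A) -> List[float]:
    """Count every overlapping match start, so one long run contributes many times."""
    counts = [0] * max(len(r) for r in reads)
    for read in reads:
        for start in range(len(read) - len(kmer) + 1):
            if read.startswith(kmer, start):
                for pos in range(start, len(counts)):
                    counts[pos] += 1
    return _to_percent(counts, len(reads))


if __name__ == "__main__":
    # Toy data: one read ending in a 20-A run, one read with no polyA at all.
    reads = ["CCCC" + "A" * 20, "C" * 24]
    first = cumulative_first_hit(reads)
    every = cumulative_every_hit(reads)
    print(first[-1], every[-1])  # 50.0 450.0
```

With a single 20-A run, the once-per-read approach plateaus at 50% (one of two reads), while counting every overlapping match climbs to 450%, adding one more hit for each extra A in the run. That is qualitatively the steadily rising pattern in the falco column, though whether falco actually does this would need to be checked in its source.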

The input data was https://raw.githubusercontent.com/galaxyproject/tools-iuc/6b50408a1ff7902575be37b2fa21aa80fe684e5c/tools/falco/test-data/1000trimmed.fastq, and both tools were run with default settings.
