
ReadTool allocates entire file contents for bounded operations (binary check, text read, base64) #13669

@SeanThomasWilliams

Description

ReadTool reads entire files into memory in three places even though only a fraction of the content is used. With concurrent agents, this causes RSS bloat through mimalloc arena fragmentation.

1. file.text() loads the full file even though output is capped at 50KB (read.ts:147)

file.text().then(text => text.split("\n")) allocates the entire file as a string plus an array of lines. A 500MB file therefore triggers a 500MB string allocation (plus the line array) just to return 50KB. It should stream line by line and stop once the cap is reached.
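A minimal sketch of the streaming approach, assuming Node's `fs.createReadStream` and `readline` (Bun implements both). ReadTool appears to use Bun's file API instead. The `readLinesBounded` name and its `offset`/`maxLines`/`maxBytes` parameters are hypothetical and do not reflect read.ts's actual signature:

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

export interface BoundedRead {
  lines: string[];
  truncated: boolean; // true if a cap stopped the read before EOF
}

// Hypothetical helper: returns up to `maxLines` lines starting at line `offset`,
// stopping once the UTF-8 size of the collected output would exceed `maxBytes`.
// Memory stays proportional to the stream's chunk size plus the collected
// output, not to the size of the file.
export async function readLinesBounded(
  path: string,
  offset: number,
  maxLines: number,
  maxBytes: number,
): Promise<BoundedRead> {
  const stream = createReadStream(path, { encoding: "utf8" });
  // crlfDelay: Infinity treats "\r\n" as a single line break.
  const rl = createInterface({ input: stream, crlfDelay: Infinity });
  const lines: string[] = [];
  let bytes = 0;
  let index = 0;
  let truncated = false;
  try {
    for await (const line of rl) {
      if (index++ < offset) continue; // skip lines before the requested window
      const size = Buffer.byteLength(line, "utf8") + 1; // +1 for the newline
      if (lines.length >= maxLines || bytes + size > maxBytes) {
        truncated = true;
        break; // leaving the loop closes the readline interface
      }
      lines.push(line);
      bytes += size;
    }
  } finally {
    // Stop reading the rest of the file and release the file descriptor.
    stream.destroy();
  }
  return { lines, truncated };
}
```

readline still buffers each line in full, so a file with one enormous line would need a separate per-line cap. Unlike `text.split("\n")`, readline also does not emit a trailing empty string after a final newline.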

2. isBinaryFile() reads the full file to inspect only the first 4KB (read.ts:247-250)

file.arrayBuffer() loads the entire file, but only the first 4096 bytes are inspected. It should read just that prefix with file.slice(0, 4096).arrayBuffer().
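A sketch of the same bounded read using Node's `FileHandle.read` with an explicit length and position, the Node counterpart to Bun's `file.slice(0, 4096).arrayBuffer()`. The `readPrefix`, `looksBinary`, and `isBinaryFileBounded` names are hypothetical. The binary heuristic (any NUL byte, or more than 30% control characters) is an assumption and not necessarily what read.ts checks:

```typescript
import { open } from "node:fs/promises";

const SAMPLE_BYTES = 4096;

// Read at most `maxBytes` from the start of the file; the rest is never loaded.
export async function readPrefix(path: string, maxBytes: number = SAMPLE_BYTES): Promise<Buffer> {
  const handle = await open(path, "r");
  try {
    const buf = Buffer.alloc(maxBytes);
    const { bytesRead } = await handle.read(buf, 0, maxBytes, 0);
    return buf.subarray(0, bytesRead); // shorter files yield a shorter slice
  } finally {
    await handle.close();
  }
}

// Hypothetical heuristic: a NUL byte, or more than 30% control characters
// other than \t \n \v \f \r, marks the sample as binary.
export function looksBinary(sample: Uint8Array): boolean {
  if (sample.length === 0) return false;
  let suspicious = 0;
  for (let i = 0; i < sample.length; i++) {
    const byte = sample[i];
    if (byte === 0) return true;
    if (byte < 9 || (byte > 13 && byte < 32)) suspicious++;
  }
  return suspicious / sample.length > 0.3;
}

export async function isBinaryFileBounded(path: string): Promise<boolean> {
  return looksBinary(await readPrefix(path));
}
```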

3. No size guard on image/PDF base64 encoding (read.ts:117-138)

Buffer.from(await file.bytes()).toString("base64") has no size cap. A large image or PDF produces a full-size byte array plus a base64 string about 4/3 (~1.33x) as large. It should reject files above a reasonable threshold before reading them.
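A minimal sketch of a size guard that checks `stat()` before reading, so an oversized file is rejected without allocating its bytes. The 10 MiB threshold and the `readBase64Capped`/`base64Length` names are hypothetical; the issue does not name a limit. `base64Length` shows where the ~1.33x figure comes from: base64 emits 4 characters per 3 input bytes, with padding.

```typescript
import { readFile, stat } from "node:fs/promises";

// Hypothetical threshold; the issue does not specify a value.
export const MAX_ENCODE_BYTES = 10 * 1024 * 1024;

// Padded base64 output length: 4 characters for every (started) 3-byte group.
export function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Reject oversized files using metadata only, then read and encode.
export async function readBase64Capped(
  path: string,
  maxBytes: number = MAX_ENCODE_BYTES,
): Promise<string> {
  const { size } = await stat(path);
  if (size > maxBytes) {
    throw new RangeError(
      `${path} is ${size} bytes; refusing to base64-encode more than ${maxBytes} bytes`,
    );
  }
  const bytes = await readFile(path);
  return bytes.toString("base64");
}
```

With a 10 MiB cap, the worst case is about 10 MiB of bytes plus about 13.3 MiB of base64 text. The file could still grow between `stat()` and `readFile()`, so checking `bytes.length` after the read is a cheap second guard.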

All three are in a single file and straightforward to fix.

Metadata

Assignees

Labels

perf (Indicates a performance issue or need for optimization)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
