Skip to content

fix: support repeated --file flag for multi-file scoping#498

Merged
carlos-alm merged 10 commits intomainfrom
fix/483-repeated-file-flag
Mar 17, 2026
Merged

fix: support repeated --file flag for multi-file scoping#498
carlos-alm merged 10 commits intomainfrom
fix/483-repeated-file-flag

Conversation

@carlos-alm
Copy link
Contributor

Summary

Closes #483

  • --file now collects all values into an array when repeated (e.g. --file a.js --file b.js)
  • Added collectFile accumulator, normalizeFileFilter, and buildFileConditionSQL helpers in src/db/query-builder.js
  • Updated all 16 CLI commands that accept --file to use the accumulator
  • Updated 14 query-layer consumers to build AND (file LIKE ? OR file LIKE ?) clauses for multi-file filters
  • Backward compatible: single --file value still works as before

Test plan

  • codegraph complexity --file src/shared/paginate.js --file src/shared/errors.js returns results from both files
  • Single --file still works as before
  • Unit tests for normalizeFileFilter, buildFileConditionSQL, collectFile
  • NodeQuery.fileFilter() works with arrays

Make --file variadic via Commander accumulator so passing --file a.js
--file b.js collects both values into an array.

- Add normalizeFileFilter(), buildFileConditionSQL(), and collectFile()
  helpers to query-builder.js
- Update NodeQuery.fileFilter() to accept string | string[]
- Update all 16 CLI command definitions to use collectFile accumulator
- Update all downstream SQL consumers (complexity, cfg, dataflow,
  manifesto, ast, roles, generators, nodes) to use buildFileConditionSQL
- Update in-memory repository to handle array file filters
- Update search layer (keyword, prepare, filters) for array filePattern
- Add tests for normalizeFileFilter, buildFileConditionSQL, collectFile,
  and NodeQuery.fileFilter with arrays
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR adds repeatable --file flag support across all 16 CLI commands, backed by three new helpers in src/db/query-builder.js: collectFile (Commander accumulator), normalizeFileFilter (string/array normalisation), and buildFileConditionSQL (parameterised OR LIKE clauses with proper escapeLike + ESCAPE '\\'). All 14 query-layer consumers were updated to use these helpers or NodeQuery.fileFilter(), and the issues raised in the previous review round (column injection in buildFileConditionSQL, mixed glob/non-glob post-filtering in filters.js, missing escapeLike in keyword.js and prepare.js) have all been addressed.

One bug remains:

  • normalizeFileFilter does not strip empty strings from arrays (src/db/query-builder.js line 82–83). normalizeFileFilter(['', 'src/foo.js']) returns ['', 'src/foo.js'] unchanged. This propagates in two ways: (1) buildFileConditionSQL emits a LIKE '%%' ESCAPE '\\' clause that matches every row, silently discarding the file filter; (2) in src/features/owners.js the JavaScript-side f.includes('') is always true, so any empty-string entry in the array causes all other file patterns to be ignored. A one-line fix (arr.filter((s) => s.length > 0)) closes both paths.

The unit test suite is solid, though the multi-value buildFileConditionSQL test is missing the ESCAPE clause assertion that the single-value test already has.

Confidence Score: 3/5

  • Safe to merge after fixing the empty-string array bug in normalizeFileFilter; all other concerns are minor.
  • The feature is well-structured and the previous round's security and correctness issues were all addressed. However, the empty-string passthrough in normalizeFileFilter is a real correctness bug: it silently neutralises file filters in both the SQL path (LIKE '%%' matches everything) and the owners.js JavaScript path (String.includes('') is always true). The bug is reachable via CLI today with --file "" --file src/foo.js. This warrants a score below 4.
  • src/db/query-builder.js (normalizeFileFilter empty-string handling) and tests/unit/query-builder.test.js (missing ESCAPE assertion in multi-value test)

Important Files Changed

Filename Overview
src/db/query-builder.js Adds collectFile, normalizeFileFilter, and buildFileConditionSQL helpers. Core logic is sound with validateColumn and escapeLike, but normalizeFileFilter silently passes empty strings through arrays, causing downstream LIKE '%%' (match-all) conditions.
tests/unit/query-builder.test.js Good coverage of new helpers including edge cases, wildcard escaping, and array inputs. Missing ESCAPE clause assertion in the multi-value buildFileConditionSQL test and no test for normalizeFileFilter with arrays containing empty strings.
src/domain/search/search/filters.js applyFilters correctly evaluates each pattern individually (per-pattern glob vs substring check), resolving the previous mixed-pattern issue. Clean and well-structured.
src/domain/search/search/keyword.js Delegates non-glob file pre-filtering to buildFileConditionSQL for correct LIKE escaping. Glob patterns continue to rely on post-query applyFilters, which now handles per-pattern dispatch correctly.
src/domain/search/search/prepare.js Inline LIKE filter now correctly uses escapeLike and ESCAPE '\\' clauses for both single and multi-value cases. Functionally correct, though it duplicates logic already in buildFileConditionSQL.
src/db/repository/in-memory-repository.js buildFileFilterFn correctly uses normalizeFileFilter + escapeLike + likeToRegex to mirror SQL LIKE semantics for array file filters. Behaviour is consistent with the SQL path.
src/db/repository/nodes.js All query functions updated to use buildFileConditionSQL or NodeQuery.fileFilter(), providing consistent multi-file LIKE filtering across all node lookups.
src/features/owners.js Uses normalizeFileFilter for JavaScript-side array filtering via String.includes. Inherits the empty-string-in-array bug from normalizeFileFilter: f.includes('') is always true, so an empty string entry neutralises all other file filters.
src/shared/generators.js Updated iterRoles to use buildFileConditionSQL with correct leading-AND stripping. iterListFunctions passes opts.file directly through NodeQuery.fileFilter(). Clean.
src/features/shared/find-nodes.js Correctly uses buildFileConditionSQL with bare file column (no table alias, appropriate for single-table query). Multi-file OR filter appended safely.
src/features/complexity.js Both complexityData and iterComplexity updated to pass opts.file through buildFileConditionSQL('n.file'). Multi-file support is consistent and well-integrated.
src/domain/analysis/roles.js Correctly strips the leading AND from buildFileConditionSQL output before appending to the conditions array. Pattern is consistent with generators.js.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    CLI["CLI --file a.js --file b.js\n(collectFile accumulator)"]
    CLI --> NFF["normalizeFileFilter(file)\nstring | string[] → string[]"]
    NFF -->|"empty / falsy"| EMPTY["return []  (no-op)"]
    NFF -->|"string"| WRAP["return [file]"]
    NFF -->|"array"| PASS["return file as-is\n⚠ empty strings not filtered"]

    WRAP --> BFC
    PASS --> BFC

    BFC["buildFileConditionSQL(files, column)\n+ validateColumn\n+ escapeLike"]
    BFC -->|"0 files"| NOSQL["{ sql: '', params: [] }"]
    BFC -->|"1 file"| SINGLE["AND col LIKE ? ESCAPE '\\'"]
    BFC -->|"N files"| MULTI["AND (col LIKE ? ESCAPE '\\'\nOR col LIKE ? ESCAPE '\\')"]

    SINGLE --> CONSUMERS
    MULTI --> CONSUMERS

    CONSUMERS["Query Consumers\n(complexity, roles, nodes,\nfind-nodes, generators…)"]
    CONSUMERS --> SQL["Parameterised SQL\nwith ESCAPE clause"]

    NFF2["normalizeFileFilter (owners.js)\n+ JS Array.filter"]
    NFF2 -->|"each entry"| INCL["f.includes(filter)\n⚠ ''.includes always true"]

    CLI --> NFF2

    GLOB["FTS / Embed search\n(keyword.js / prepare.js)"]
    CLI --> GLOB
    GLOB -->|"!isGlob"| BFC2["buildFileConditionSQL → SQL pre-filter"]
    GLOB -->|"isGlob"| POSTFILT["applyFilters (per-pattern:\nglobMatch or includes)"]
Loading

Comments Outside Diff (1)

  1. src/db/query-builder.js, line 81-84 (link)

    Empty strings in arrays bypass the filter entirely

    normalizeFileFilter passes empty strings through when they appear inside an array. normalizeFileFilter(['', 'src/foo.js']) returns ['', 'src/foo.js'] instead of ['src/foo.js']. This causes two concrete downstream problems:

    1. SQL pathbuildFileConditionSQL(['', 'src/foo.js']) generates AND (file LIKE ? ESCAPE '\' OR file LIKE ? ESCAPE '\') with params ['%%', '%src/foo.js%']. Because %% matches every row, the whole OR condition is always true and the file filter is silently discarded.

    2. owners.js path (lines 196–199) — the JavaScript-side filter fileFilters.some((filter) => f.includes(filter)) short-circuits on f.includes(''), which is always true in JavaScript, so every file passes the filter regardless of the other entries.

    This can be triggered today by codegraph owners --file "" --file src/foo.js (empty string first argument via Commander).

Last reviewed commit: 23db20b

Comment on lines +39 to +44
const isGlob =
opts.isGlob !== undefined ? opts.isGlob : opts.filePattern && /[*?[\]]/.test(opts.filePattern);
opts.isGlob !== undefined
? opts.isGlob
: fpArr.length > 0 && fpArr.some((p) => /[*?[\]]/.test(p));
if (isGlob) {
filtered = filtered.filter((row) => globMatch(row.file, opts.filePattern));
filtered = filtered.filter((row) => fpArr.some((p) => globMatch(row.file, p)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Mixed glob + non-glob patterns silently break non-glob substring matching

When a user mixes glob and non-glob patterns (e.g., --file src/shared --file "*.test.js"), isGlob becomes true because at least one pattern contains a glob character. As a result, ALL patterns are routed through globMatch, which performs anchored regex matching (regex is wrapped with ^...$).

This means the non-glob pattern src/shared becomes the regex ^src/shared$, which will not match src/shared/paginate.js. The user expects a substring/LIKE match for src/shared, but gets zero results for that pattern.

The same issue exists symmetrically in keyword.js and prepare.js — when isGlob is true, SQL pre-filtering is skipped for all patterns, and then applyFilters uses globMatch for everything.

A fix would be to apply LIKE filtering for non-glob entries independently of glob filtering:

if (isGlob) {
  const globPatterns = fpArr.filter((p) => /[*?[\]]/.test(p));
  const likePatterns = fpArr.filter((p) => !/[*?[\]]/.test(p));
  filtered = filtered.filter((row) =>
    globPatterns.some((p) => globMatch(row.file, p)) ||
    likePatterns.some((p) => row.file.includes(p))
  );
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in aff787c — Changed applyFilters to check isGlob per-pattern instead of for the whole array. Each pattern now individually decides whether to use globMatch (for glob patterns with *?[]) or String.includes() (for plain substring patterns). This prevents non-glob patterns from silently breaking when mixed with glob patterns.

Comment on lines +94 to +108
export function buildFileConditionSQL(file, column = 'file') {
const files = normalizeFileFilter(file);
if (files.length === 0) return { sql: '', params: [] };
if (files.length === 1) {
return {
sql: ` AND ${column} LIKE ? ESCAPE '\\'`,
params: [`%${escapeLike(files[0])}%`],
};
}
const clauses = files.map(() => `${column} LIKE ? ESCAPE '\\'`);
return {
sql: ` AND (${clauses.join(' OR ')})`,
params: files.map((f) => `%${escapeLike(f)}%`),
};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 column parameter not validated before SQL interpolation

buildFileConditionSQL interpolates the column argument directly into SQL without calling validateColumn(), unlike the existing testFilterSQL which validates:

export function testFilterSQL(column = 'n.file', enabled = true) {
  if (!enabled) return '';
  validateColumn(column);  // ← validates before interpolation
  ...
}

While all current call sites pass hardcoded strings, adding validation here keeps the security model consistent and guards against future callers passing dynamic column names:

Suggested change
export function buildFileConditionSQL(file, column = 'file') {
const files = normalizeFileFilter(file);
if (files.length === 0) return { sql: '', params: [] };
if (files.length === 1) {
return {
sql: ` AND ${column} LIKE ? ESCAPE '\\'`,
params: [`%${escapeLike(files[0])}%`],
};
}
const clauses = files.map(() => `${column} LIKE ? ESCAPE '\\'`);
return {
sql: ` AND (${clauses.join(' OR ')})`,
params: files.map((f) => `%${escapeLike(f)}%`),
};
}
export function buildFileConditionSQL(file, column = 'file') {
validateColumn(column);
const files = normalizeFileFilter(file);
if (files.length === 0) return { sql: '', params: [] };

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f6a0766 — Added validateColumn(column) call at the start of buildFileConditionSQL, matching the validation pattern already used by testFilterSQL. This prevents unvalidated column names from being interpolated into SQL.

Comment on lines +42 to +49
if (fpArr.length > 0 && !isGlob) {
if (fpArr.length === 1) {
sql += ' AND n.file LIKE ?';
params.push(`%${fpArr[0]}%`);
} else {
sql += ` AND (${fpArr.map(() => 'n.file LIKE ?').join(' OR ')})`;
params.push(...fpArr.map((f) => `%${f}%`));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inline LIKE filter missing ESCAPE clause and wildcard escaping

This file (and the parallel code in prepare.js lines 49–56) has its own inline LIKE implementation instead of reusing buildFileConditionSQL. The inline version doesn't call escapeLike() and omits the ESCAPE '\\' clause, unlike buildFileConditionSQL:

// keyword.js / prepare.js — no escaping, no ESCAPE clause
sql += ` AND (${fpArr.map(() => 'n.file LIKE ?').join(' OR ')})`;
params.push(...fpArr.map((f) => `%${f}%`));

// buildFileConditionSQL — consistent escaping
sql: ` AND (${clauses.join(' OR ')})`,  // uses ESCAPE '\\'
params: files.map((f) => `%${escapeLike(f)}%`),

A file pattern like my_utils.js (containing _) would unexpectedly match myXutils.js in the search command but not in other commands. Reusing buildFileConditionSQL here would close the gap:

import { buildFileConditionSQL } from '../../../db/query-builder.js';
// ...
const fc = buildFileConditionSQL(fpArr.length === 1 ? fpArr[0] : fpArr, 'n.file');
if (fc.sql) {
  sql += fc.sql;
  params.push(...fc.params);
}

The same fix applies to src/domain/search/search/prepare.js.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f6edc13 — Refactored the inline LIKE filter in keyword.js to reuse \ instead of hand-rolling LIKE clauses. This gets proper \ and \ handling for free, eliminating the SQL injection risk from unescaped wildcards.

Comment on lines +29 to +33
// --file collects into an array; search supports single string filePattern,
// so pass first element for single value, or join for backward compat.
const fileArr = opts.file || [];
const filePattern =
fileArr.length === 1 ? fileArr[0] : fileArr.length > 1 ? fileArr : undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inaccurate comment — array is passed directly, not joined

The comment says "or join for backward compat", but the code actually passes the raw array when there are multiple values — it never calls .join(). The comment is misleading:

Suggested change
// --file collects into an array; search supports single string filePattern,
// so pass first element for single value, or join for backward compat.
const fileArr = opts.file || [];
const filePattern =
fileArr.length === 1 ? fileArr[0] : fileArr.length > 1 ? fileArr : undefined;
// --file collects into an array; pass the string directly for a single value,
// or the full array for multi-file filtering.
const fileArr = opts.file || [];
const filePattern =
fileArr.length === 1 ? fileArr[0] : fileArr.length > 1 ? fileArr : undefined;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bb07f13 — Updated the comment to accurately describe what the code does: pass single element unwrapped for single value, or pass the raw array for multi-file scoping.

Impact: 29 functions changed, 178 affected
When a user mixes glob and non-glob patterns, the previous code used a
single isGlob flag for the whole array. This caused plain substring
patterns to go through globMatch (anchored regex), silently breaking
non-glob matches. Now each pattern individually decides whether to use
glob matching or substring matching.

Impact: 1 functions changed, 1 affected
Add validateColumn() call at the start of buildFileConditionSQL to
prevent unvalidated SQL interpolation, matching the pattern used by
testFilterSQL.

Impact: 1 functions changed, 1 affected
…ping

The inline LIKE filter was missing escapeLike() and the ESCAPE '\'
clause. Refactored to reuse buildFileConditionSQL which handles both
correctly.

Impact: 1 functions changed, 0 affected
The comment said "join for backward compat" but the code actually passes
the raw array for multi-file scoping. Updated to match reality.

Impact: 1 functions changed, 0 affected
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +52 to 60
if (fpArr.length > 0 && !isGlob) {
if (fpArr.length === 1) {
conditions.push('n.file LIKE ?');
params.push(`%${fpArr[0]}%`);
} else {
conditions.push(`(${fpArr.map(() => 'n.file LIKE ?').join(' OR ')})`);
params.push(...fpArr.map((f) => `%${f}%`));
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing escapeLike() and ESCAPE clause (inconsistent with keyword.js)

The inline LIKE filter in prepare.js does not call escapeLike() on the file patterns and omits the ESCAPE '\\' clause. This is exactly the same issue that was fixed for keyword.js in this PR (which now delegates to buildFileConditionSQL), but prepare.js was not given the same treatment.

Concrete impact: if a user passes --file src/my_utils.js, the underscore is a SQL LIKE wildcard and will match src/myXutils.js. Similarly, a % in a path would match anything.

The fix is the same pattern already used in keyword.js:

Suggested change
if (fpArr.length > 0 && !isGlob) {
if (fpArr.length === 1) {
conditions.push('n.file LIKE ?');
params.push(`%${fpArr[0]}%`);
} else {
conditions.push(`(${fpArr.map(() => 'n.file LIKE ?').join(' OR ')})`);
params.push(...fpArr.map((f) => `%${f}%`));
}
}
if (fpArr.length > 0 && !isGlob) {
const fc = buildFileConditionSQL(fpArr, 'n.file');
conditions.push(fc.sql.replace(/^ AND /, ''));
params.push(...fc.params);
}

You'll also need to add the import at the top of the file:

import { buildFileConditionSQL } from '../../../db/query-builder.js';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8b98e40 — imported \ from \ and added \ clause to both the single-file and multi-file LIKE conditions in , matching the pattern already used in .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed — prepare.js now imports escapeLike from query-builder.js (line 2) and uses it with ESCAPE '\' clauses on lines 55-59. This was addressed in the same batch of fixes as the keyword.js fix.

@carlos-alm
Copy link
Contributor Author

@greptileai

1 similar comment
@carlos-alm
Copy link
Contributor Author

@greptileai

Stop passing a pre-computed isGlob boolean to applyFilters — it
applied one flag to all patterns, so mixed glob + non-glob patterns
(e.g. --file "*.js" --file "utils") would treat non-glob patterns
as globs. Now each pattern is individually checked for glob chars.

Impact: 3 functions changed, 2 affected
@carlos-alm
Copy link
Contributor Author

Addressed Greptile's outside-diff P1 about mixed glob/non-glob isGlob override:

Fixed in 23db20b — stopped passing the pre-computed isGlob boolean to applyFilters() in both keyword.js and prepare.js. The applyFilters function now does per-pattern glob detection (tests each pattern individually with /[*?[\]]/.test(p)), so mixed --file "*.js" --file "utils" correctly applies glob matching to the first and substring matching to the second.

@greptileai

Impact: 225 functions changed, 103 affected
Impact: 124 functions changed, 57 affected
@carlos-alm carlos-alm merged commit f5df470 into main Mar 17, 2026
14 checks passed
@carlos-alm carlos-alm deleted the fix/483-repeated-file-flag branch March 17, 2026 13:30
@github-actions github-actions bot locked and limited conversation to collaborators Mar 17, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: repeated --file flag silently drops all but last value

1 participant