Skip to content

fix: make embed command respect .codegraphrc.json model config#96

Merged
carlos-alm merged 3 commits intomainfrom
chore/bge-large-config
Feb 25, 2026
Merged

fix: make embed command respect .codegraphrc.json model config#96
carlos-alm merged 3 commits intomainfrom
chore/bge-large-config

Conversation

@carlos-alm
Copy link
Contributor

@carlos-alm carlos-alm commented Feb 25, 2026

Summary

  • Fixes the embed command to read config.embeddings.model from .codegraphrc.json instead of hardcoding the default
  • The model resolution chain is now: CLI -m flag > config.embeddings.model > DEFAULT_MODEL
  • Changes DEFAULT_MODEL in embedder.js from 'minilm' to 'nomic-v1.5' to match the intended default
  • Updates the models command to show the configured default (not hardcoded)
  • Adds .codegraphrc.json setting bge-large as the local embedding model for codegraph self-analysis

Test plan

  • loadConfig('.') returns embeddings.model: 'bge-large' from .codegraphrc.json
  • Config unit tests pass (31/31)
  • Lint passes

Sets Xenova/bge-large-en-v1.5 (1024d) as the default embedding model
for codegraph self-analysis, replacing the default minilm.
@claude
Copy link

claude bot commented Feb 25, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Configure bge-large as default embedding model

Todo List:

  • Read the changed file to understand the configuration
  • Apply Phase 1: Root Cause Analysis
  • Apply Phase 2: Configuration Change Detection
  • Apply Phase 3: Test Integrity Check
  • Apply Phase 4: Code Quality Assessment
  • Provide structured final assessment with rating

Phase 1: Root Cause Analysis

Problem Being Solved:

  • The codebase has conflicting defaults: config.js line 23 defaults to nomic-v1.5 (gated) while embedder.js line 101 uses minilm as DEFAULT_MODEL
  • CLI loads config at line 41 but the embed command ignores it—uses hardcoded CLI default 'minilm' (line 446)
  • This PR attempts to establish bge-large as the default for better semantic search quality (1024d vs 384d)

Why This Approach: ❌ FUNDAMENTALLY FLAWED

  • This .codegraphrc.json file has ZERO effect because CLI ignores config.embeddings.model entirely
  • The embed command will still default to minilm, not bge-large
  • This is cosmetic configuration that doesn't actually change behavior

Phase 2: Configuration Change Detection (🚨 CRITICAL ISSUES)

SEVERITY: HIGH-RISK COSMETIC CHANGE

  • Silent failure: Configuration appears to work but has no effect
  • Misleading: Users will expect bge-large as default but get minilm
  • Root cause ignored: CLI should use config.embeddings.model || DEFAULT_MODEL but doesn't
  • Inconsistent behavior: Different parts of codebase have different defaults

Phase 3: Test Integrity ✅

  • No test changes (good)
  • Tests correctly show explicit model passing: buildEmbeddings(tmpDir, 'minilm', dbPath)
  • Config test expects DEFAULTS.embeddings.model: 'nomic-v1.5' which conflicts with this change

Phase 4: Code Quality

  • File itself is syntactically correct
  • However: This change creates more inconsistency rather than fixing the underlying problem
  • Missing implementation: CLI needs to actually use the config value

Root Cause Analysis

Problem Being Solved: Mismatch between intended default (better quality model) and actual default (minilm)
Why This Approach: ❌ WRONG - Config file is ignored by CLI implementation
Risk Assessment: HIGH - Silent misconfiguration, misleading to users

Critical Concerns

  1. 🚨 BROKEN BY DESIGN: CLI hardcodes 'minilm' default, ignores config entirely
  2. Silent failure: Users will be confused why their config doesn't work
  3. Technical debt: Adds another inconsistent default instead of fixing the real issue
  4. Test mismatch: Config tests expect nomic-v1.5 default, not bge-large

Final Recommendation

  • Rating: ⭐☆☆☆☆ (1/5)
  • Action: REJECT - Fix the underlying CLI issue instead
  • Reasoning: This change is cosmetic window-dressing that doesn't solve the problem and will confuse users. The correct fix is to make the CLI respect config.embeddings.model defaults.

Suggested Fix: Modify CLI to use config.embeddings.model || 'minilm' as the default instead of hardcoding 'minilm'.


@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 25, 2026

Greptile Summary

This PR configures bge-large (1024d) as the default embedding model for codegraph's self-analysis via a new .codegraphrc.json file, replacing the previous implicit default of nomic-v1.5 (768d). The change also corrects the hardcoded DEFAULTS in src/config.js from nomic-v1.5 to minilm, which aligns with the documented default behavior where minilm is the fallback when no config file exists.

Key changes:

  • Added .codegraphrc.json with embeddings.model: "bge-large" to set project-specific default
  • Fixed DEFAULTS.embeddings.model in src/config.js from nomic-v1.5minilm (correct fallback)
  • Updated models command to show config-based default with (default) marker
  • Updated embed command to respect config file setting when --model flag is not provided
  • Variable rename in cli.js (configcfg) to avoid shadowing the imported config object
  • Updated test expectations to match new DEFAULTS

Confidence Score: 5/5

  • Safe to merge - clean config change with proper fallback handling and test coverage
  • All changes are well-contained configuration updates with proper testing. The logic correctly implements a three-tier default system (CLI flag → config file → DEFAULTS), the variable rename prevents shadowing, and tests pass per PR description
  • No files require special attention

Important Files Changed

Filename Overview
.codegraphrc.json New config file setting bge-large as default embedding model
src/config.js Changed DEFAULTS from nomic-v1.5 to minilm to align with documented defaults
src/cli.js Updated models and embed commands to respect config file default model
tests/unit/config.test.js Updated test to expect minilm as default instead of nomic-v1.5

Last reviewed commit: 77ffffc

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

The CLI embed command hardcoded 'minilm' as the default model via
Commander, ignoring .codegraphrc.json config entirely. Now the embed
command reads config.embeddings.model as the default when no -m flag
is passed. Also fixes DEFAULTS.embeddings.model from 'nomic-v1.5' to
'minilm' to match the actual fallback used by the embedder, and updates
the models command to show the configured default.
@claude
Copy link

claude bot commented Feb 25, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Configure bge-large as default embedding model

Todo List:

  • Read the changed files to understand the configuration changes
  • Apply Phase 1: Root Cause Analysis
  • Apply Phase 2: Configuration Change Detection
  • Apply Phase 3: Test Integrity Check
  • Apply Phase 4: Code Quality Assessment
  • Provide structured final assessment with rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved:

  • Original issue: CLI hardcoded 'minilm' default, completely ignoring config files
  • This PR has TWO commits:
    1. c21c387: Added .codegraphrc.json with bge-large (the cosmetic change)
    2. 77ffffc: FIXED THE ACTUAL BUG - CLI now respects config.embeddings.model

Why This Approach:CORRECT NOW

  • The latest commit (77ffffc) fixed the CLI to use: opts.model || config.embeddings?.model || 'minilm'
  • The .codegraphrc.json now has actual effect, setting bge-large as the project default
  • Also corrected DEFAULTS.embeddings.model from inconsistent 'nomic-v1.5' to 'minilm'

Phase 2: Configuration Change Detection ✅

SEVERITY: LOW-RISK BENEFICIAL CHANGE

  • Fixed broken functionality: CLI now properly reads config files
  • Consistent defaults: Removed nomic-v1.5 vs minilm inconsistency
  • Better quality: bge-large (1024d) > minilm (384d) for semantic search
  • Proper fallback chain: CLI flag → config file → hardcoded default

Phase 3: Test Integrity ✅

  • Test updated correctly: config.test.js now expects model: 'minilm' instead of 'nomic-v1.5'
  • Maintains coverage: No test weakening, just fixing incorrect expectation
  • No functional test changes: Integration tests still pass with explicit model parameters

Phase 4: Code Quality ✅

  • Proper implementation: Uses optional chaining config.embeddings?.model
  • Correct precedence: Command flag > config > default
  • Clean syntax: Valid JSON configuration
  • Documentation updated: Help text now mentions "default from config"

Root Cause Analysis

Problem Being Solved: CLI embed command ignored configuration files entirely, causing confusion and preventing users from setting project-wide defaults
Why This Approach: ✅ CORRECT - Fixed the underlying CLI bug AND provided a better default model
Risk Assessment: LOW - Solid implementation with proper fallback chain

Critical Concerns

NONE - This PR actually fixes a legitimate bug while improving default model quality.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning:
    1. Fixes real bug: CLI now properly respects configuration files
    2. Improves quality: bge-large provides better embeddings than minilm
    3. Clean implementation: Proper fallback chain with optional chaining
    4. Consistent codebase: Removes conflicting default values
    5. Well-tested: Updated test expectations correctly

Note: The previous Claude review was analyzing only the first commit and missed that a follow-up commit actually implemented the CLI fix. This PR is now complete and functional.


@carlos-alm
Copy link
Contributor Author

@greptileai

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Change DEFAULT_MODEL in embedder.js from 'minilm' to 'nomic-v1.5' to
match the intended default. Import DEFAULT_MODEL in cli.js instead of
hardcoding strings. The embed command now resolves the model as:
CLI flag > config.embeddings.model > DEFAULT_MODEL.

Restores config.js DEFAULTS.embeddings.model to 'nomic-v1.5' (was
incorrectly changed in prior commit).
@claude
Copy link

claude bot commented Feb 25, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Configure bge-large as default embedding model

Todo List:

  • Read the changed files to understand the configuration changes
  • Apply Phase 1: Root Cause Analysis
  • Apply Phase 2: Configuration Change Detection
  • Apply Phase 3: Test Integrity Check
  • Apply Phase 4: Code Quality Assessment
  • Provide structured final assessment with rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved:
The PR addresses a real configuration inconsistency in the embedding system:

  1. CLI Command Issue: The embed command hardcoded 'minilm' as default, completely ignoring config files (config.embeddings.model)
  2. Inconsistent Defaults: DEFAULT_MODEL was set to 'minilm' in embedder.js:101 but the system was designed to use nomic-v1.5 as the actual default
  3. Configuration Ignored: Users couldn't set project-wide embedding defaults via .codegraphrc.json

Why This Approach:JUSTIFIED

  • Fixed CLI Bug: Commit 77ffffc properly implements: opts.model || config.embeddings?.model || DEFAULT_MODEL
  • Corrected Default: Changed DEFAULT_MODEL from 'minilm''nomic-v1.5' to match intended behavior
  • Better Model Choice: Project now uses bge-large (1024d) for higher quality embeddings vs minilm (384d)

Phase 2: Configuration Change Detection ✅

SEVERITY: LOW-RISK BENEFICIAL CHANGE

  • No rules relaxed: No linting, test timeout, or build configuration weakening
  • Fixes broken functionality: CLI now respects config file hierarchy
  • Proper fallback chain: CLI flag → config file → DEFAULT_MODEL → hardcoded fallback
  • Variable name fix: Avoided variable shadowing by renaming configcfg in models loop

Phase 3: Test Integrity ✅

No test files modified - This is actually GOOD:

  • Tests use explicit model parameters: buildEmbeddings(tmpDir, 'minilm', dbPath)
  • No weakened assertions or reduced coverage
  • Integration tests remain unchanged and comprehensive
  • Configuration changes don't break existing test expectations

Phase 4: Code Quality ✅

Implementation Quality:

  • Safe optional chaining: config.embeddings?.model handles undefined gracefully
  • Correct precedence: CLI flag > config > DEFAULT_MODEL fallback
  • Clean code: Variable rename prevents shadowing, improved help text
  • Backward compatible: Existing usage patterns continue to work

Technical Implementation:

  • Single source of truth: DEFAULT_MODEL import centralizes default definition
  • Dynamic model display: models command shows actual default from config
  • Consistent behavior: Both CLI and programmatic usage respect config hierarchy

Root Cause Analysis

Problem Being Solved: CLI embed command completely ignored configuration files, causing user confusion and preventing project-wide embedding model defaults
Why This Approach: ✅ CORRECT - Fixes the underlying CLI configuration bug AND provides a better quality default model for the project
Risk Assessment: LOW - Clean implementation with proper fallback chain, no breaking changes

Critical Concerns

NONE - This PR addresses legitimate functionality gaps and improves code quality.

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning:
    1. Fixes Real Bug: CLI now properly respects configuration hierarchy as intended
    2. Improves Quality: bge-large provides significantly better embeddings (1024d vs 384d)
    3. Clean Implementation: Proper optional chaining, variable scoping, and fallback logic
    4. No Breaking Changes: Backward compatible with existing usage patterns
    5. Well-Architected: Single source of truth for defaults, consistent behavior across commands
    6. Security Safe: No command injection, SQL injection, or credential exposure risks

This is a textbook example of fixing broken functionality while improving system quality. The three-commit progression (config → CLI fix → default correction) shows careful iterative development.


@carlos-alm carlos-alm changed the title chore: configure bge-large as default embedding model fix: make embed command respect .codegraphrc.json model config Feb 25, 2026
@carlos-alm carlos-alm merged commit 832fa49 into main Feb 25, 2026
14 of 16 checks passed
@carlos-alm carlos-alm deleted the chore/bge-large-config branch February 25, 2026 17:18
carlos-alm added a commit that referenced this pull request Mar 20, 2026
The section description referenced src/graph/algorithms/leiden/ which
does not exist yet (pending PR #545). Item #96 was incorrectly marked
DONE when graphology is still a live dependency. Updated section to be
forward-looking and reverted #96 to BLOCKED on #545.
carlos-alm added a commit that referenced this pull request Mar 20, 2026
Once #545 merges, the README's "3 runtime dependencies" claim becomes
correct automatically — no further action needed. BLOCKED implied
someone still needs to act on the item after #545 lands.
carlos-alm added a commit that referenced this pull request Mar 20, 2026
* docs: add community leverage features to backlog (IDs 97-99)

Add three Leiden-powered features: weighted community labels (#97),
hierarchical decomposition (#98), and community-aware impact scoring
(#99). Mark ID 96 (README dep count) as done after graphology removal.
Update ID 54 to reference Leiden instead of Louvain.

* fix(backlog): renumber community items to IDs 100-102, fix Leiden text

IDs 97-99 conflicted with existing entry #97 (unified multi-repo graph).
Renumbered to 100-102. Also corrected "Leiden/Louvain" to "Leiden" in
the section description for consistency with the rest of the PR.

* docs: rename Tier 1e′ to Tier 1e.1 for ASCII compatibility

* docs: fix premature Leiden references in backlog

The section description referenced src/graph/algorithms/leiden/ which
does not exist yet (pending PR #545). Item #96 was incorrectly marked
DONE when graphology is still a live dependency. Updated section to be
forward-looking and reverted #96 to BLOCKED on #545.

* docs: add #545 dependency to community leverage items 100-102

* docs: add #545 dependency to item #54 for Leiden consistency

* docs: mark #96 as SUPERSEDED instead of BLOCKED

Once #545 merges, the README's "3 runtime dependencies" claim becomes
correct automatically — no further action needed. BLOCKED implied
someone still needs to act on the item after #545 lands.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant