Skip to content

Conversation

@roomote
Copy link
Contributor

@roomote roomote bot commented Feb 9, 2026

Related GitHub Issue

Closes: #11340

Description

This PR attempts to address Issue #11340. Feedback and guidance are welcome.

Replaces the ripgrep-based getNestedGitRepository() with a recursive fs.readdir-based directory walk (findNestedGitEntry()) that fixes three correctness issues:

  1. False negatives (submodule/worktree .git files): The old approach searched only for **/.git/HEAD, which misses submodules and worktrees that use a .git file (containing gitdir: ...) instead of a .git directory. The new approach checks for any .git entry -- directory or file.

  2. False positives via symlink follow: The old approach used ripgrep's --follow flag which traversed symlinks outside the workspace boundary. The new approach uses fs.readdir with withFileTypes (lstat semantics), so symbolic links are never followed.

  3. Brittle path matching: The old approach relied on includes(".git/HEAD") and startsWith(".git/") string matching. The new approach uses the Dirent.name property to check for .git entries directly, and uses isRoot flag to skip the root-level .git.

Key design decisions:

  • Skips .git directories and node_modules during traversal for performance
  • Short-circuits on first nested match (no need to enumerate all)
  • Removes the executeRipgrep dependency from the checkpoint service

Test Procedure

  • Existing tests: All 33 existing tests continue to pass (no mocking of ripgrep needed anymore since the new approach uses real filesystem operations)
  • New test: .git pointer file detection: Creates a directory with a .git file containing gitdir: ... (simulating a submodule) and verifies it is detected as a nested repo
  • New test: symlink non-traversal: Creates a symlink inside the workspace pointing to an external directory with a .git repo, and verifies the symlink is not followed

Run tests with:

cd src && npx vitest run services/checkpoints/__tests__/ShadowCheckpointService.spec.ts

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Documentation Updates

  • No documentation updates are required.

Additional Notes

The new implementation walks the directory tree using Node's built-in fs.readdir with { withFileTypes: true }. This uses readdir(3) which returns DT_LNK for symlinks, so Dirent.isSymbolicLink() correctly identifies them without following. This makes the implementation both simpler and more correct than the previous ripgrep-based approach.


Important

Replaces ripgrep-based nested git detection with a recursive directory walk in ShadowCheckpointService.ts, improving accuracy and handling submodules, worktrees, and symlinks correctly.

  • Behavior:
    • Replaces getNestedGitRepository() with findNestedGitEntry() in ShadowCheckpointService.ts to detect .git directories and files, handling submodules and worktrees.
    • Avoids following symlinks by using fs.readdir with withFileTypes.
    • Skips .git directories and node_modules for performance.
    • Short-circuits on first nested match.
  • Tests:
    • Adds tests for .git file detection, symlink non-traversal, and stray .git file handling in ShadowCheckpointService.spec.ts.
    • Verifies no false positives for stray .git files and correct behavior with symlinks.
  • Misc:
    • Removes executeRipgrep dependency from ShadowCheckpointService.ts.

This description was created by Ellipsis for c042beb. You can customize this summary. It will automatically update as commits are pushed.

…ymlink follow

Replace ripgrep-based getNestedGitRepository() with a recursive fs.readdir walk
that:
- Detects both .git directories (standard repos) and .git files (submodule/
  worktree pointer files containing gitdir: ...)
- Uses lstat semantics (withFileTypes) so symlinks are never followed, preventing
  false positives from symlinks pointing outside the workspace
- Uses proper path operations instead of fragile string matching
- Skips .git directories and node_modules for performance
- Short-circuits on first nested match

Fixes #11340
@roomote
Copy link
Contributor Author

roomote bot commented Feb 9, 2026

Rooviewer Clock   See task

Reviewed all changes including the new gitdir: content validation for .git files. No issues found. The validation correctly handles .git directories, symlinks, valid submodule/worktree pointer files, and stray .git files. Test coverage is adequate.

  • No issues to resolve
Previous reviews

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

@0xMink
Copy link
Contributor

0xMink commented Feb 9, 2026

@roomote Thanks for taking a look.

For context: I opened PR #11341 earlier today, which addresses the three problems described in this issue (submodule/worktree .git pointer files, --follow symlink false-positives, and brittle path classification) and adds targeted coverage (37/37 passing).

I reviewed PR #11344. It is a reasonable alternative direction, but I see two risks relative to the issue goals:

  1. .git file validation
    PR fix: checkpoint nested-repo detection handles .git files and avoids symlink follow #11344 flags any entry named .git as a nested repo. Git submodules/worktrees use a .git file containing a gitdir: pointer, but a stray .git file without gitdir: would become a false positive that disables checkpoints. PR fix(checkpoints): detect nested repos via .git/HEAD and .git pointer files #11341 validates gitdir: before treating a .git file as a repo marker.

  2. Scope and traversal cost
    Replacing the existing ripgrep-based detection with a recursive fs.readdir walk is a broader behavioral change than necessary for this bug, and it increases the risk of slower traversal in large workspaces unless ignore semantics and skip rules are carefully replicated.

If maintainers prefer the filesystem-walk approach, I recommend adding gitdir: content validation for .git files and a regression test for a non-pointer .git file to prevent false positives. I would appreciate review on PR #11341 as well.

@roomote
Copy link
Contributor Author

roomote bot commented Feb 9, 2026

Fixaroo Clock   See task

Added gitdir: content validation for .git files to prevent false positives from stray .git files, plus a regression test. All 34 tests pass.

View commit | Revert commit

…ing as nested repo

Addresses feedback from #11340: .git files without gitdir: content
(e.g. stray files) are no longer false-positived as nested repos.

- findNestedGitEntry now checks .git file content for gitdir: prefix
- .git directories are still always treated as nested repos
- Adds regression test for stray .git file without gitdir: pointer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Checkpoints nested-repo detection misses submodule/worktree .git files and can false-positive via symlink follow

2 participants