Skip to content

Conversation

@kaitj
Copy link
Contributor

@kaitj kaitj commented Jun 11, 2025

Resolves #55

The path.glob that was used didn't follow directories that were symlinked. In py3.13, recurse_symlinks argument was introduced, but this is still an issue in py3.11 and py3.12. This introduces a method that performs similarly to py3.13 to recurse symlinks as well via os.walk. From what I was able to figure out, this was only needed when indexing a subject directory. I also added a (I admit messy) pytest fixture for testing indexing with symlinks.

I wonder if there is a better fix / way to do this. Turns out yes...the changes stem from trying to unify pathlib.path.glob with glob.glob (see python/cpython#117589). For now, this is falling back on glob.glob with a recursive pattern to support <py3.13.

@nx10 - mind taking a look? My brain is doing circles trying to sort out this symlinking stuff.

kaitj added 2 commits June 10, 2025 14:11
The current implementation did not follow symlink directories, only files. The built-in flag
'recurse_symlinks' was not introduced until py3.13. This commit adds a method for safely recursing directories for local paths
(assumes we don't have to worry about symlinks for cloudpaths)

Wil need to add some proper tests, but interactive testing suggests this fix works. Eventually, this method can be dropped in favor of
path.rglob(*, recurse_symlinks=True) when only py3.13+ is supported.
@kaitj kaitj requested a review from nx10 June 11, 2025 15:17
@kaitj kaitj self-assigned this Jun 11, 2025
@kaitj kaitj changed the title Recursive globbing Follow symlinks recursively Jun 11, 2025
@codecov
Copy link

codecov bot commented Jun 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.15%. Comparing base (79c824e) to head (1f77700).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   95.07%   95.15%   +0.07%     
==========================================
  Files           7        7              
  Lines         487      495       +8     
==========================================
+ Hits          463      471       +8     
  Misses         24       24              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kaitj kaitj force-pushed the maint/rglob branch 4 times, most recently from 3380af7 to f411c9c Compare June 11, 2025 20:09
@pytest.fixture
def symlink_dataset(tmp_path: Path, sub: str = "sub-001", ses: str = "ses-001") -> Path:
data = tmp_path / "data"
data.mkdir()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be cleaned up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "cleaned up" do you mean the code for the fixture or like the directory itself?

- Use built-in method for ClouthPath's since `glob.glob` does not properly recurse through
sub-directories. If it is a local path, just fall back to `glob.glob`. Alternatively, could
just check for Python version and only run this if <py3.13.
- The built-in rglob methods have slightly different parameters so need different calls for them.
@kaitj kaitj requested a review from nx10 July 22, 2025 18:35
@kaitj
Copy link
Contributor Author

kaitj commented Aug 13, 2025

@clane9, wondering if you have any thoughts about this before I merge? Might be misremembering, but I think we might have chatted about this briefly at some point.

@clane9
Copy link
Collaborator

clane9 commented Aug 14, 2025

looks fine to me!

@kaitj kaitj merged commit f3d111c into childmindresearch:main Aug 14, 2025
6 checks passed
@kaitj kaitj deleted the maint/rglob branch August 14, 2025 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Symlink directory following

3 participants