unpack-trees: skip lstats for deleted VFS entries in checkout#865
unpack-trees: skip lstats for deleted VFS entries in checkout#865tyrielv wants to merge 1 commit intomicrosoft:vfs-2.53.0from
Conversation
When core_virtualfilesystem is set and a branch switch deletes entries (present in old tree, absent in new tree), deleted_entry() calls verify_absent_if_directory() with 'ce' pointing to a tree entry from traverse_trees(). This tree entry lacks CE_NEW_SKIP_WORKTREE because that flag is only set on src_index entries by mark_new_skip_worktree(). The missing flag causes verify_absent_if_directory()'s fast-path to fail, falling through to verify_absent_1() which lstats every such path. In a VFS repo each lstat may trigger callbacks, creating placeholders. On a large repo switching between LTS releases this produces tens of thousands of placeholders that the VFS must then clean up when they are deleted as part of the checkout. Fix this by propagating CE_NEW_SKIP_WORKTREE from the index entry (old) to the tree entry (ce) when core_virtualfilesystem is set. This allows the existing fast-path to work, eliminating the unnecessary lstats entirely. This is safe in VFS mode because the virtual filesystem is responsible for tracking which files are hydrated and cleaning up placeholders when entries are removed from the index. Additionally, when GVFS_NO_DELETE_OUTSIDE_SPARSECHECKOUT is set (always the case in VFS repos), deleted_entry() preserves CE_SKIP_WORKTREE on the CE_REMOVE entry and git does not unlink skip-worktree files from disk, so the lstat result would not be acted upon anyway. Measured on a 2.8M file VFS repo (0% hydrated): Before: ~135s checkout, ~23k folder placeholders created After: ~25s checkout, 0 folder placeholders created Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
derrickstolee
left a comment
There was a problem hiding this comment.
Good find! I think your code is correct. I have a suggested restructuring that I think will be equivalent, but I'm guessing you wrote it that way on purpose. I'm curious to hear what you think.
| return 0; | ||
| } else if (verify_absent_if_directory(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o)) { | ||
| return -1; | ||
| } else { |
There was a problem hiding this comment.
As an experiment, I wanted to see what changed if we wrote it like this:
} else if (core_virtualfilesystem &&
old->ce_flags & CE_NEW_SKIP_WORKTREE) {
((struct cache_entry *)ce)->ce_flags |= CE_NEW_SKIP_WORKTREE;
} else if (verify_absent_if_directory(ce, ERROR_WOULD_LOSE_UNTRACKED_REMOVED, o)) {
return -1;
}This takes the guesswork out of it in terms of whether this is effective. But the key difference lies inside the implementation of verify_absent_if_directory() which only short-circuits on CE_NEW_SKIP_WORKTREE if o->skip_sparse_checkout is unset. I wonder if there is a world where core_virtualfilesystem and o->skip_sparse_checkout could ever be enabled at the same time.
There was a problem hiding this comment.
I think you're right - it looks like the only current scenario where skip_sparse_checkout could be false when core_virtualfilesystem is true is if validation of the sparse checkout patterns fails, and we're already in trouble in that case.
I'm not sure whether it's necessary to still propagate the flag if we skip verify_absent_if_directory entirely. I'll look into it, but I'd also be fine if this merged in the meantime. (I don't have permissions to merge).
@dscho what do you think?
When core_virtualfilesystem is set and a branch switch deletes entries (present in old tree, absent in new tree), deleted_entry() calls verify_absent_if_directory() with 'ce' pointing to a tree entry from traverse_trees(). This tree entry lacks CE_NEW_SKIP_WORKTREE because that flag is only set on src_index entries by mark_new_skip_worktree().
The missing flag causes verify_absent_if_directory()'s fast-path to fail, falling through to verify_absent_1() which lstats every such path. In a VFS repo each lstat may trigger callbacks, creating placeholders. On a large repo switching between LTS releases this produces tens of thousands of placeholders that the VFS must then clean up when they are deleted as part of the checkout.
Fix this by propagating CE_NEW_SKIP_WORKTREE from the index entry (old) to the tree entry (ce) when core_virtualfilesystem is set. This allows the existing fast-path to work, eliminating the unnecessary lstats entirely.
Measured on a 2.8M file VFS repo (0% hydrated):
Before: ~135s checkout, ~23k folder placeholders created
After: ~25s checkout, 0 folder placeholders created