
Conversation

@EquationWalker (Contributor)
In FSDP2, the model (an FSDPModule) does not have no_sync(); instead, call model.set_requires_gradient_sync(False) to turn off gradient synchronization. See torch.distributed.fsdp.FSDPModule.set_requires_gradient_sync.
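
A minimal sketch of how this replaces FSDP1's no_sync() in a gradient-accumulation loop, assuming `model` is an FSDPModule returned by `fully_shard`, and with `dataloader`, `accumulation_steps`, and `optimizer` as placeholder names:

```python
# Sketch only: gradient accumulation with FSDP2. FSDP1's
# `with model.no_sync():` context manager is replaced by toggling sync.
for step, batch in enumerate(dataloader):
    sync_now = (step + 1) % accumulation_steps == 0
    # Skip reduce-scatter/all-reduce on intermediate micro-batches;
    # re-enable it for the step that calls optimizer.step().
    model.set_requires_gradient_sync(sync_now)
    loss = model(batch).mean()
    loss.backward()
    if sync_now:
        optimizer.step()
        optimizer.zero_grad()
```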

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@S1ro1 (Contributor) left a comment:

Thanks!

@S1ro1 merged commit ec92b1a into huggingface:main on Sep 6, 2025
25 checks passed
@bghira commented Nov 7, 2025

@EquationWalker this surfaces a possible user-side bug:

 'list' object has no attribute 'set_requires_gradient_sync' 

This did not happen before because everything inside the function silently falls through when whatever it's looking for isn't there; see the DeepSpeed ZeRO 3 check earlier, which uses getattr with a default that returns a null context.
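
For illustration, a hypothetical, runnable sketch of that fallthrough pattern (not the actual Accelerate source; the list stands in for the user mistake):

```python
from contextlib import nullcontext

model = [object()]  # user mistakenly passed a list instead of a model

# Old pattern: getattr with a null-context default silently falls through,
# so gradient sync is never actually disabled and no error is raised.
no_sync_ctx = getattr(model, "no_sync", nullcontext)
with no_sync_ctx():
    pass  # the training step would run here with sync still enabled

# The FSDP2 path calls the method directly, which fails loudly:
model.set_requires_gradient_sync(False)
# AttributeError: 'list' object has no attribute 'set_requires_gradient_sync'
```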

Not sure that it helps anyone to know this, or whether some earlier check should occur when a user passes a list (instead of unpacking it) to accumulate().

@EquationWalker (Contributor, Author) commented Dec 20, 2025

> @EquationWalker this surfaces a possible user-side bug:
>
>     'list' object has no attribute 'set_requires_gradient_sync'
>
> This did not happen before because everything inside the function silently falls through when whatever it's looking for isn't there; see the DeepSpeed ZeRO 3 check earlier, which uses getattr with a default that returns a null context.
>
> Not sure that it helps anyone to know this, or whether some earlier check should occur when a user passes a list (instead of unpacking it) to accumulate().

I think this check should occur in Accelerator.accumulate(). If the user passes a list object and we fall back to nullcontext, the user will not know that gradient synchronization is not actually turned off. Instead, we should either unpack the list in Accelerator.accumulate() or raise an exception when the user passes a list object.
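
A hypothetical sketch of such an early check (illustrative only, not the actual Accelerate implementation; Accelerator.accumulate does take its models as varargs):

```python
def validate_accumulate_args(*models):
    """Hypothetical early check for Accelerator.accumulate(*models)."""
    for model in models:
        if isinstance(model, (list, tuple)):
            raise TypeError(
                "accumulate() received a list/tuple of models; pass them "
                "unpacked instead, e.g. accelerator.accumulate(*models)."
            )

# The mistake this would catch:
#   accelerator.accumulate(models)    -> TypeError with a clear message
# Correct usage:
#   accelerator.accumulate(*models)
```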
