[draft] Adds memory-efficient selective attention processor #12844
What does this PR do?
Implements a new attention processing technique based on the "Selective Attention Improves Transformer" paper, enabling more efficient and flexible attention mechanisms.
Key features:
Enables more intelligent attention by allowing selective token interaction and pruning, which can improve model performance and reduce computational overhead.
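Since the diff itself is not reproduced here, the snippet below is a minimal, hypothetical sketch of what a processor along these lines could look like against diffusers' `Attention` processor interface. The class name `SelectiveAttnProcessor`, the `keep_ratio` knob, and the top-k key pruning heuristic are assumptions for illustration only, not the implementation in this PR.

```python
# Hypothetical sketch, not the code in this PR: a diffusers-style attention
# processor that scores key tokens and keeps only the top fraction before SDPA.
import torch.nn.functional as F
from diffusers.models.attention_processor import Attention


class SelectiveAttnProcessor:
    """Illustrative processor that prunes low-scoring key/value tokens (assumption)."""

    def __init__(self, keep_ratio: float = 0.5):
        self.keep_ratio = keep_ratio  # fraction of key/value tokens to keep

    def __call__(self, attn: Attention, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, **kwargs):
        # attention_mask handling is omitted to keep the sketch short.
        if encoder_hidden_states is None:
            encoder_hidden_states = hidden_states

        batch, seq_len, _ = hidden_states.shape
        query = attn.to_q(hidden_states)
        key = attn.to_k(encoder_hidden_states)
        value = attn.to_v(encoder_hidden_states)

        head_dim = query.shape[-1] // attn.heads
        # reshape to (batch, heads, tokens, head_dim)
        query = query.view(batch, -1, attn.heads, head_dim).transpose(1, 2)
        key = key.view(batch, -1, attn.heads, head_dim).transpose(1, 2)
        value = value.view(batch, -1, attn.heads, head_dim).transpose(1, 2)

        # Selection score per key token: mean attention mass it receives across
        # heads and queries (an illustrative heuristic, not the paper's exact rule).
        logits = query @ key.transpose(-1, -2) / head_dim**0.5
        key_scores = logits.softmax(dim=-1).mean(dim=(1, 2))    # (batch, kv_tokens)

        keep = max(1, int(key.shape[-2] * self.keep_ratio))
        idx = key_scores.topk(keep, dim=-1).indices              # (batch, keep)
        idx = idx[:, None, :, None].expand(-1, attn.heads, -1, head_dim)
        key = key.gather(2, idx)
        value = value.gather(2, idx)

        out = F.scaled_dot_product_attention(query, key, value)
        out = out.transpose(1, 2).reshape(batch, seq_len, -1)
        out = attn.to_out[0](out)   # output projection
        out = attn.to_out[1](out)   # dropout
        return out
```

Such a processor could then be attached to a model that exposes `set_attn_processor`, e.g. `unet.set_attn_processor(SelectiveAttnProcessor(keep_ratio=0.5))`.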
The original paper targets causal attention in LLMs where benefits come from KV-cache eviction during autoregressive generation. For bidirectional attention in diffusion models, we observe overhead from computing selection scores without actual computation reduction. However, we expect better results for video generation where sequences are much longer (16K-100K+ tokens) and temporal redundancy provides more pruning opportunities.
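To make the causal vs. bidirectional distinction concrete, here is a rough sketch of the paper's causal formulation as I read it: selection scores are taken from head 0's pre-softmax logits, accumulated over earlier queries, and subtracted from every head's logits, which is what permits KV-cache eviction at generation time. The function name and the exact masking/accumulation details are my assumptions, not code from this PR.

```python
# Rough sketch of the paper's causal selective attention (my reading), not this PR's code.
import torch


def causal_selective_attention(q, k, v):
    """q, k, v: (batch, heads, tokens, head_dim); returns the attention output."""
    b, h, n, d = q.shape
    logits = (q @ k.transpose(-1, -2)) / d**0.5                  # (b, h, n, n)

    # Selection scores: reuse head 0's pre-softmax logits (the paper's choice);
    # a token may only mask strictly earlier tokens, never itself.
    lower = torch.tril(torch.ones(n, n, dtype=torch.bool, device=q.device), diagonal=-1)
    s = torch.relu(logits[:, 0]) * lower                         # (b, n, n)

    # F[i, j] accumulates how strongly queries up to i have masked token j.
    f = s.cumsum(dim=-2)                                         # (b, n, n)

    causal = ~torch.tril(torch.ones(n, n, dtype=torch.bool, device=q.device))
    logits = logits.masked_fill(causal, float("-inf")) - f[:, None]
    return logits.softmax(dim=-1) @ v                            # (b, h, n, d)
```

The cumulative sum over earlier queries is what makes eviction possible during autoregressive decoding; without a causal order, as in the bidirectional diffusion case, the scores are still computed but nothing can be dropped early.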
As a next step, I am adapting it to video models to check the benefits. If it works, I will follow up with documentation and unit tests.
Fixes #12817
Who can review?
Anyone in the community is free to review the PR once the tests have passed. @sayakpaul @DN6 may be interested.