MR branch for the remaining difference between dkorzekwa/any_model an… #1047
Conversation
- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)
…dapted from https://github.com/EleutherAI/lm-evaluation-harness
…gs_from_dkorzekwa_anymodel_merging_process
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
modelopt/torch/puzzletron/utils/checkpoint_manager.py (1)
130-132: ⚠️ Potential issue | 🟡 Minor

Explicitly specify `weights_only=True` in `torch.load()` for security guideline compliance. The project enforces `torch>=2.6`, which defaults to `weights_only=True`, making the code technically safe. However, the security coding guidelines require always explicitly setting this parameter. Change line 132 to: `hook_states = torch.load(hook_states_path, map_location="cpu", weights_only=True)`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/utils/checkpoint_manager.py` around lines 130 - 132, Update the torch.load call that reads hook_states_path in checkpoint_manager.py to explicitly pass weights_only=True for compliance; locate the torch.load(hook_states_path, map_location="cpu") invocation and change it to include weights_only=True so it becomes torch.load(hook_states_path, map_location="cpu", weights_only=True).

modelopt/torch/puzzletron/mip/mip_and_realize_models.py (1)
41-73: ⚠️ Potential issue | 🟡 Minor

Return `list[str]` on every rank. On Line 52, worker ranks leave `solution_paths` as `None` whenever `cfg.skip_realize_model` is true, so this function violates its new `-> list[str]` contract on that path. Broadcasting the paths outside the realize-only branch, or explicitly returning `[]` on workers, keeps the API consistent.

💡 Minimal fix
 if dist.is_master():
     solution_paths = launch_mip(cfg)
     length_tensor = torch.tensor([len(solution_paths)], dtype=torch.long, device=device)
 else:
-    solution_paths = None
+    solution_paths = []
     length_tensor = torch.tensor([0], dtype=torch.long, device=device)
-if not cfg.skip_realize_model:
-    if dist.size() > 1:
-        torch.distributed.broadcast(length_tensor, src=0)
-
-    list_length = length_tensor.item()
-
-    if not dist.is_master():
-        solution_paths = [None] * list_length
-
-    if dist.size() > 1:
-        torch.distributed.broadcast_object_list(solution_paths, src=0)
+if dist.size() > 1:
+    torch.distributed.broadcast(length_tensor, src=0)
+    if not dist.is_master():
+        solution_paths = [None] * length_tensor.item()
+    torch.distributed.broadcast_object_list(solution_paths, src=0)
+if not cfg.skip_realize_model:
     for solution_path in solution_paths:
         mprint(f"Realize model for the solution: {solution_path}")
         cfg.realize_model.solutions_path = Path(solution_path)
         launch_realize_model(cfg)
     dist.barrier()

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/mip/mip_and_realize_models.py` around lines 41 - 73, The function launch_mip_and_realize_model sometimes returns None on worker ranks when cfg.skip_realize_model is true; ensure it always returns a list[str]. Fix by moving the broadcast of length_tensor/solution_paths outside the cfg.skip_realize_model branch (so all ranks receive the master list), or if you prefer minimal change: after the initial master/non-master branch, set solution_paths = [] on non-master ranks when cfg.skip_realize_model is true (instead of leaving it None) so the return type is always list[str]; update any use of length_tensor/torch.distributed.broadcast_object_list accordingly to handle the broadcast-even-if-skipped path.
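The broadcast-outside-the-branch fix can be sketched as a standalone helper. This is a minimal illustration, not the PR's actual code: the helper name `broadcast_solution_paths` and the single-process fallback are assumptions added for the sketch.

```python
import torch
import torch.distributed as torch_dist


def broadcast_solution_paths(paths_on_master):
    """Return the master's list[str] on every rank; never returns None."""
    dist_active = torch_dist.is_available() and torch_dist.is_initialized()
    rank = torch_dist.get_rank() if dist_active else 0
    if rank == 0:
        solution_paths = list(paths_on_master)
        length = torch.tensor([len(solution_paths)], dtype=torch.long)
    else:
        # Never None: keeps the list[str] contract even if the broadcast is skipped.
        solution_paths = []
        length = torch.tensor([0], dtype=torch.long)
    if dist_active and torch_dist.get_world_size() > 1:
        # The broadcast happens unconditionally, not only when realization runs.
        torch_dist.broadcast(length, src=0)
        if rank != 0:
            solution_paths = [None] * int(length.item())
        torch_dist.broadcast_object_list(solution_paths, src=0)
    return solution_paths
```

In a single process this degenerates to returning the master list, so callers can rely on a `list[str]` on every code path.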
🧹 Nitpick comments (4)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py (2)
119-128: Remove unreachable dead code.

The `if True:` condition makes the `else` branch (lines 126-128) unreachable. If SDPA is now unconditionally used when `llama4_attn_implementation` is `None`, simplify the method by removing the dead code.

♻️ Proposed fix
 def _choose_llama4_attn_implementation(self, llama4_attn_implementation):
     self.llama4_attn_implementation = llama4_attn_implementation
     if self.llama4_attn_implementation is None:
-        # if is_torch_sdpa_available():
-        if True:
-            _print_once("auto-setting llama4_attn_implementation to sdpa")
-            self.llama4_attn_implementation = "sdpa"
-        else:
-            _print_once("auto-setting llama4_attn_implementation to eager")
-            self.llama4_attn_implementation = "eager"
+        _print_once("auto-setting llama4_attn_implementation to sdpa")
+        self.llama4_attn_implementation = "sdpa"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py` around lines 119 - 128, The method _choose_llama4_attn_implementation contains an always-true branch making the else unreachable; remove the dead if/else and simplify so that when self.llama4_attn_implementation is None you unconditionally call _print_once("auto-setting llama4_attn_implementation to sdpa") and set self.llama4_attn_implementation = "sdpa". Ensure no leftover unreachable code remains and keep the initial assignment self.llama4_attn_implementation = llama4_attn_implementation.
23-23: Remove commented-out import.

If `is_torch_sdpa_available` is no longer needed, remove the commented-out import rather than leaving it as dead code.

♻️ Proposed fix
-from transformers.utils import is_flash_attn_2_available  # , is_torch_sdpa_available
+from transformers.utils import is_flash_attn_2_available

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py` at line 23, Remove the dead commented-out import by deleting the trailing ", is_torch_sdpa_available" from the import line that currently reads "from transformers.utils import is_flash_attn_2_available  # , is_torch_sdpa_available" so only is_flash_attn_2_available is imported; ensure no leftover comment remains and run the project formatter/linter to keep the import clean in configuration_decilm.py.

.pre-commit-config.yaml (1)
116-116: Remove the duplicated regex entry for `lm_eval_anymodel.py`.

Line 116 repeats an existing exclusion already present at Line 111. Keeping only one avoids regex drift and confusion.

Suggested cleanup
-    examples/puzzletron/evaluation/lm_eval_anymodel.py|
     modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py|

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.pre-commit-config.yaml at line 116, Remove the duplicate exclusion entry for "examples/puzzletron/evaluation/lm_eval_anymodel.py" from .pre-commit-config.yaml's list so the filename appears only once; locate the repeated regex string "examples/puzzletron/evaluation/lm_eval_anymodel.py" in the exclusions and delete the redundant occurrence, leaving a single entry to avoid duplicate patterns.

tests/gpu/torch/conftest.py (1)
57-59: Verify that session-wide `autouse=True` is intentional.

This fixture applies `mto.enable_huggingface_checkpointing()` to all tests in the entire session, which globally patches HuggingFace `PreTrainedModel` classes. This is a one-way operation (no teardown to restore original methods). If some tests under `tests/gpu/torch/` should run without HF checkpointing enabled, consider:

- Removing `autouse=True` and explicitly requesting the fixture where needed, or
- Adding a comment confirming this global patching is intentional for all GPU tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/gpu/torch/conftest.py` around lines 57 - 59, The fixture enable_hf_checkpointing currently has autouse=True and calls mto.enable_huggingface_checkpointing(), which globally patches HF PreTrainedModel without a teardown; either remove autouse=True so tests opt-in to enable_huggingface_checkpointing() or explicitly document this global behavior with a clear comment above the enable_hf_checkpointing fixture stating that a session-wide, permanent HF patch is intentional for all GPU/torch tests; update the fixture signature or add the comment accordingly.
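The opt-in alternative can be sketched as follows. This is only an illustration: `enable_huggingface_checkpointing` here is a hypothetical stand-in for `mto.enable_huggingface_checkpointing()`, and the test name is made up.

```python
import pytest


def enable_huggingface_checkpointing():
    # Stand-in for mto.enable_huggingface_checkpointing(): the real call
    # permanently patches HF PreTrainedModel classes, so there is no teardown.
    enable_huggingface_checkpointing.called = True


@pytest.fixture(scope="session")  # note: no autouse=True
def enable_hf_checkpointing():
    # Tests that need HF checkpointing request this fixture by name instead
    # of the patch being applied silently to every test in the session.
    enable_huggingface_checkpointing()


def test_save_restore(enable_hf_checkpointing):
    # Only tests that list the fixture opt in to the global patch.
    assert enable_huggingface_checkpointing.called
```

Tests that never list `enable_hf_checkpointing` run with the original, unpatched classes.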
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/puzzletron/mip/sweep.py`:
- Around line 227-295: run_mip_sweep currently assumes realization ran and then
calls extract_solution_results which will fail when hydra_cfg.skip_realize_model
is true; add an explicit guard at the start of run_mip_sweep (or right before
calling mip_and_realize_models.launch_mip_and_realize_model) to reject/raise a
clear error if hydra_cfg.skip_realize_model is truthy, mentioning that sweep
mode requires realization, or alternatively skip extraction and CSV writing when
skip_realize_model is set; reference run_mip_sweep,
hydra_cfg.skip_realize_model,
mip_and_realize_models.launch_mip_and_realize_model, and
extract_solution_results to locate the change.
In `@modelopt/torch/puzzletron/utils/parsing.py`:
- Around line 153-155: The branch that formats KV-heads currently returns a
string for any non-None value of attention_config.num_key_value_heads
(num_kv_heads) and can emit "0 kv heads"; update the guard to require a positive
integer (e.g., num_kv_heads > 0) before returning f"{num_kv_heads} kv heads" so
that zero or negative sentinels are ignored and execution falls through to the
existing fallback formatting logic.
---
Outside diff comments:
In `@modelopt/torch/puzzletron/mip/mip_and_realize_models.py`:
- Around line 41-73: The function launch_mip_and_realize_model sometimes returns
None on worker ranks when cfg.skip_realize_model is true; ensure it always
returns a list[str]. Fix by moving the broadcast of length_tensor/solution_paths
outside the cfg.skip_realize_model branch (so all ranks receive the master
list), or if you prefer minimal change: after the initial master/non-master
branch, set solution_paths = [] on non-master ranks when cfg.skip_realize_model
is true (instead of leaving it None) so the return type is always list[str];
update any use of length_tensor/torch.distributed.broadcast_object_list
accordingly to handle the broadcast-even-if-skipped path.
In `@modelopt/torch/puzzletron/utils/checkpoint_manager.py`:
- Around line 130-132: Update the torch.load call that reads hook_states_path in
checkpoint_manager.py to explicitly pass weights_only=True for compliance;
locate the torch.load(hook_states_path, map_location="cpu") invocation and
change it to include weights_only=True so it becomes
torch.load(hook_states_path, map_location="cpu", weights_only=True).
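The `weights_only=True` pattern recommended above can be sketched in isolation; the `hook_states` payload and temp-file path below are made up for illustration and do not reflect the real checkpoint contents.

```python
import os
import tempfile

import torch

# Hypothetical hook-states payload standing in for the real checkpoint.
hook_states = {"layer_0": torch.zeros(2, 2)}
hook_states_path = os.path.join(tempfile.gettempdir(), "hook_states.pt")
torch.save(hook_states, hook_states_path)

# weights_only=True restricts unpickling to tensors and allow-listed types,
# so a tampered checkpoint cannot execute arbitrary code at load time.
loaded = torch.load(hook_states_path, map_location="cpu", weights_only=True)
```

Passing the flag explicitly keeps the call safe even on older torch versions where the default was `weights_only=False`.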
---
Nitpick comments:
In @.pre-commit-config.yaml:
- Line 116: Remove the duplicate exclusion entry for
"examples/puzzletron/evaluation/lm_eval_anymodel.py" from
.pre-commit-config.yaml's list so the filename appears only once; locate the
repeated regex string "examples/puzzletron/evaluation/lm_eval_anymodel.py" in
the exclusions and delete the redundant occurrence, leaving a single entry to
avoid duplicate patterns.
In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py`:
- Around line 119-128: The method _choose_llama4_attn_implementation contains an
always-true branch making the else unreachable; remove the dead if/else and
simplify so that when self.llama4_attn_implementation is None you
unconditionally call _print_once("auto-setting llama4_attn_implementation to
sdpa") and set self.llama4_attn_implementation = "sdpa". Ensure no leftover
unreachable code remains and keep the initial assignment
self.llama4_attn_implementation = llama4_attn_implementation.
- Line 23: Remove the dead commented-out import by deleting the trailing ",
is_torch_sdpa_available" from the import line that currently reads "from
transformers.utils import is_flash_attn_2_available # ,
is_torch_sdpa_available" so only is_flash_attn_2_available is imported; ensure
no leftover comment remains and run the project formatter/linter to keep the
import clean in configuration_decilm.py.
In `@tests/gpu/torch/conftest.py`:
- Around line 57-59: The fixture enable_hf_checkpointing currently has
autouse=True and calls mto.enable_huggingface_checkpointing(), which globally
patches HF PreTrainedModel without a teardown; either remove autouse=True so
tests opt-in to enable_huggingface_checkpointing() or explicitly document this
global behavior with a clear comment above the enable_hf_checkpointing fixture
stating that a session-wide, permanent HF patch is intentional for all GPU/torch
tests; update the fixture signature or add the comment accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1baf11e6-f95a-4330-8768-72867ddb6e94
📒 Files selected for processing (9)
.pre-commit-config.yaml
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
modelopt/torch/puzzletron/mip/mip_and_realize_models.py
modelopt/torch/puzzletron/mip/sweep.py
modelopt/torch/puzzletron/utils/checkpoint_manager.py
modelopt/torch/puzzletron/utils/data/dataset.py
modelopt/torch/puzzletron/utils/parsing.py
tests/_test_utils/torch/puzzletron/utils.py
tests/gpu/torch/conftest.py
def run_mip_sweep(hydra_cfg):
    """Run MIP for multiple memory compression rates and generate CSV with results.

    This function is called when mip.sweep.enabled is True in the config.

    Args:
        hydra_cfg: Hydra configuration object with mip.sweep settings
    """
    mprint("=" * 80)
    mprint("MIP Sweep Mode Enabled")
    mprint("=" * 80)

    # Get sweep configuration
    sweep_cfg = hydra_cfg.mip.sweep
    compression_rates = sweep_cfg.memory_compression_rates
    output_csv = sweep_cfg.output_csv
    puzzle_dir = Path(hydra_cfg.puzzle_dir)

    mprint(f"Compression rates: {compression_rates}")
    mprint(f"Output CSV: {output_csv}")
    mprint(f"Puzzle directory: {puzzle_dir}")

    # Calculate teacher memory from subblock_stats
    teacher_memory = get_teacher_memory_from_subblock_stats(hydra_cfg)
    mprint(
        f"Teacher memory (from subblock_stats): {teacher_memory:.1f} MiB ({teacher_memory / 1024:.1f} GiB)"
    )

    # Collect results
    all_results = []

    # Run MIP for each compression rate
    for compression_rate in compression_rates:
        target_memory_mib = teacher_memory * compression_rate
        mprint("\n" + "=" * 80)
        mprint(
            f"Running MIP for compression_rate={compression_rate:.2f} "
            f"(target={target_memory_mib:.1f} MiB = {target_memory_mib / 1024:.1f} GiB)"
        )
        mprint("=" * 80)

        # Modify config dynamically
        hydra_cfg.mip.human_constraints.target_memory = target_memory_mib

        # Run MIP and realize models (reuse existing distributed logic!)
        solution_paths = mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)

        # Extract results (only on master rank)
        if dist.is_master():
            for solution_path in solution_paths:
                result = extract_solution_results(
                    solution_path=Path(solution_path),
                    target_memory_mib=target_memory_mib,
                    teacher_memory_mib=teacher_memory,
                    compression_rate=compression_rate,
                )
                all_results.append(result)

                mprint(
                    f"✓ Results: actual_memory={result['actual_memory_mib']:.1f} MiB, "
                    f"lm_loss={result['lm_loss']:.4f}"
                )

    # Write results to CSV (only on master rank)
    if dist.is_master():
        mprint("\n" + "=" * 80)
        mprint("MIP Sweep Complete - Writing Results")
        mprint("=" * 80)
        write_results_to_csv(all_results, output_csv)
Reject sweep mode when realization is disabled.
This function always extracts validation metrics after launch_mip_and_realize_model(), but that flow skips generating solutions--validation/solution_0.json when hydra_cfg.skip_realize_model is true. Right now the sweep will fail later with a FileNotFoundError instead of a clear config error.
💡 Minimal guard
def run_mip_sweep(hydra_cfg):
"""Run MIP for multiple memory compression rates and generate CSV with results.
@@
Args:
hydra_cfg: Hydra configuration object with mip.sweep settings
"""
+ if hydra_cfg.skip_realize_model:
+ raise ValueError(
+ "mip.sweep requires realization because it reads solutions--validation/solution_0.json."
+ )
+
mprint("=" * 80)
mprint("MIP Sweep Mode Enabled")
    mprint("=" * 80)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/mip/sweep.py` around lines 227 - 295, run_mip_sweep
currently assumes realization ran and then calls extract_solution_results which
will fail when hydra_cfg.skip_realize_model is true; add an explicit guard at
the start of run_mip_sweep (or right before calling
mip_and_realize_models.launch_mip_and_realize_model) to reject/raise a clear
error if hydra_cfg.skip_realize_model is truthy, mentioning that sweep mode
requires realization, or alternatively skip extraction and CSV writing when
skip_realize_model is set; reference run_mip_sweep,
hydra_cfg.skip_realize_model,
mip_and_realize_models.launch_mip_and_realize_model, and
extract_solution_results to locate the change.
num_kv_heads = attention_config.num_key_value_heads
if num_kv_heads is not None:
    return f"{num_kv_heads} kv heads"
Guard KV-head display against non-positive values.
This branch currently treats 0 (or any non-None sentinel) as valid and can render misleading output like 0 kv heads. Please require a positive value before formatting.
Suggested patch
- num_kv_heads = attention_config.num_key_value_heads
- if num_kv_heads is not None:
+ num_kv_heads = attention_config.num_key_value_heads
+ if num_kv_heads is not None and num_kv_heads > 0:
      return f"{num_kv_heads} kv heads"

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
num_kv_heads = attention_config.num_key_value_heads
if num_kv_heads is not None and num_kv_heads > 0:
    return f"{num_kv_heads} kv heads"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/utils/parsing.py` around lines 153 - 155, The
branch that formats KV-heads currently returns a string for any non-None value
of attention_config.num_key_value_heads (num_kv_heads) and can emit "0 kv
heads"; update the guard to require a positive integer (e.g., num_kv_heads > 0)
before returning f"{num_kv_heads} kv heads" so that zero or negative sentinels
are ignored and execution falls through to the existing fallback formatting
logic.
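The guard described above can be sketched as a tiny standalone helper; the function name `format_kv_heads` and the fallback value are illustrative assumptions, not the module's real API.

```python
def format_kv_heads(num_kv_heads, fallback="n/a"):
    # Only positive counts are rendered; None and zero/negative sentinels
    # fall through to the existing fallback formatting.
    if num_kv_heads is not None and num_kv_heads > 0:
        return f"{num_kv_heads} kv heads"
    return fallback
```

With this guard, a sentinel of `0` no longer renders the misleading string "0 kv heads".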
can we just keep the if part and remove else part?
 config.hidden_size = 256
 config.intermediate_size = 512
-config.num_hidden_layers = 2
+config.num_hidden_layers = max(2, dist.size())
If num layers is either 1 or 2, do our assertions work in both cases?
I think this file can be removed as this content is already moved to tests/gpu/conftest.py
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@ Coverage Diff @@
## feature/puzzletron #1047 +/- ##
===================================================
Coverage 72.12% 72.12%
===================================================
Files 209 209
Lines 23628 23628
===================================================
Hits 17042 17042
Misses 6586 6586

☔ View full report in Codecov by Sentry.
🧹 Nitpick comments (1)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py (1)
23-23: Clean up the commented import and document the unconditional SDPA default.

The dependency floor (`torch>=2.6`) guarantees SDPA availability, so the unconditional `"sdpa"` default in `_choose_llama4_attn_implementation()` is safe. However, remove the commented import fragment on line 23 and add an inline comment on lines 121–123 explaining that SDPA is the only supported auto mode given the minimum torch requirement. This prevents future confusion when revisiting this code.

Suggested cleanup
-from transformers.utils import is_flash_attn_2_available  # , is_torch_sdpa_available
+from transformers.utils import is_flash_attn_2_available
@@
 self.llama4_attn_implementation = llama4_attn_implementation
 if self.llama4_attn_implementation is None:
+    # SDPA is guaranteed available with torch>=2.6 (minimum requirement).
     _print_once("auto-setting llama4_attn_implementation to sdpa")
     self.llama4_attn_implementation = "sdpa"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py` at line 23, Remove the leftover commented import fragment ", is_torch_sdpa_available" from the top-level import line (currently: from transformers.utils import is_flash_attn_2_available  # , is_torch_sdpa_available) and add a short inline comment inside the _choose_llama4_attn_implementation() function explaining that the unconditional "sdpa" default is safe because the package enforces torch>=2.6 (so SDPA support is guaranteed), to prevent future confusion about supported auto modes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py`:
- Line 23: Remove the leftover commented import fragment ",
is_torch_sdpa_available" from the top-level import line (currently: from
transformers.utils import is_flash_attn_2_available # ,
is_torch_sdpa_available) and add a short inline comment inside the
_choose_llama4_attn_implementation() function explaining that the unconditional
"sdpa" default is safe because the package enforces torch>=2.6 (so SDPA support
is guaranteed), to prevent future confusion about supported auto modes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dfe06589-a2f2-4264-b26a-3141fae72bb2
📒 Files selected for processing (1)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
What does this PR do?
The remaining difference between dkorzekwa/any_model and dkorzekwa/anymodel_mbridgedist
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Chores
Tests