
MR branch for the remaining difference between dkorzekwa/any_model an…#1047

Merged
danielkorzekwa merged 98 commits into feature/puzzletron from
dkorzekwa/remainings_from_dkorzekwa_anymodel_merging_process
Mar 20, 2026

Conversation

@danielkorzekwa (Contributor) commented Mar 16, 2026

What does this PR do?

Merges the remaining difference between the dkorzekwa/any_model and dkorzekwa/anymodel_mbridgedist branches.

Summary by CodeRabbit

  • New Features

    • Added an end-to-end MIP sweep workflow that runs sweeps, collects metrics, and writes CSV reports.
  • Improvements

    • Simplified attention implementation selection and clarified attention display wording.
    • Realization pipeline now returns realized model solution paths.
    • Caching behavior adjusted for token ID retrieval.
    • Test model generation scales layer count with distributed size.
  • Bug Fixes

    • Checkpoint hook state restored using the correct load API.
  • Chores

    • Extended pre-commit license hook exclusion.
  • Tests

    • Added GPU/distributed test fixtures and dtype/checkout helpers.

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Base automatically changed from dkorzekwa/anymodel_mbridgedist to feature/puzzletron March 20, 2026 11:43
@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 20, 2026 11:43

@coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/torch/puzzletron/utils/checkpoint_manager.py (1)

130-132: ⚠️ Potential issue | 🟡 Minor

Explicitly specify weights_only=True in torch.load() for security guideline compliance.

The project enforces torch>=2.6 which defaults to weights_only=True, making the code technically safe. However, the security coding guidelines require always explicitly setting this parameter. Change line 132 to:

hook_states = torch.load(hook_states_path, map_location="cpu", weights_only=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/utils/checkpoint_manager.py` around lines 130 -
132, Update the torch.load call that reads hook_states_path in
checkpoint_manager.py to explicitly pass weights_only=True for compliance;
locate the torch.load(hook_states_path, map_location="cpu") invocation and
change it to include weights_only=True so it becomes
torch.load(hook_states_path, map_location="cpu", weights_only=True).
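
For illustration, here is a minimal standalone sketch of the hardened load path (the helper name and its context are assumed for the example, not taken from checkpoint_manager.py):

import torch

def load_hook_states(hook_states_path):
    # weights_only=True restricts unpickling to tensors and primitive
    # containers, so a tampered checkpoint file cannot execute arbitrary
    # code during load.
    return torch.load(hook_states_path, map_location="cpu", weights_only=True)
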
modelopt/torch/puzzletron/mip/mip_and_realize_models.py (1)

41-73: ⚠️ Potential issue | 🟡 Minor

Return list[str] on every rank.

On Line 52, worker ranks leave solution_paths as None whenever cfg.skip_realize_model is true, so this function violates its new -> list[str] contract on that path. Broadcasting the paths outside the realize-only branch, or explicitly returning [] on workers, keeps the API consistent.

💡 Minimal fix
     if dist.is_master():
         solution_paths = launch_mip(cfg)
         length_tensor = torch.tensor([len(solution_paths)], dtype=torch.long, device=device)
     else:
-        solution_paths = None
+        solution_paths = []
         length_tensor = torch.tensor([0], dtype=torch.long, device=device)

-    if not cfg.skip_realize_model:
-        if dist.size() > 1:
-            torch.distributed.broadcast(length_tensor, src=0)
-
-        list_length = length_tensor.item()
-
-        if not dist.is_master():
-            solution_paths = [None] * list_length
-
-        if dist.size() > 1:
-            torch.distributed.broadcast_object_list(solution_paths, src=0)
+    if dist.size() > 1:
+        torch.distributed.broadcast(length_tensor, src=0)
+        if not dist.is_master():
+            solution_paths = [None] * length_tensor.item()
+        torch.distributed.broadcast_object_list(solution_paths, src=0)

+    if not cfg.skip_realize_model:
         for solution_path in solution_paths:
             mprint(f"Realize model for the solution: {solution_path}")
             cfg.realize_model.solutions_path = Path(solution_path)
             launch_realize_model(cfg)
             dist.barrier()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/mip/mip_and_realize_models.py` around lines 41 -
73, The function launch_mip_and_realize_model sometimes returns None on worker
ranks when cfg.skip_realize_model is true; ensure it always returns a list[str].
Fix by moving the broadcast of length_tensor/solution_paths outside the
cfg.skip_realize_model branch (so all ranks receive the master list), or if you
prefer minimal change: after the initial master/non-master branch, set
solution_paths = [] on non-master ranks when cfg.skip_realize_model is true
(instead of leaving it None) so the return type is always list[str]; update any
use of length_tensor/torch.distributed.broadcast_object_list accordingly to
handle the broadcast-even-if-skipped path.
🧹 Nitpick comments (4)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py (2)

119-128: Remove unreachable dead code.

The if True: condition makes the else branch (lines 126-128) unreachable. If SDPA is now unconditionally used when llama4_attn_implementation is None, simplify the method by removing the dead code.

♻️ Proposed fix
     def _choose_llama4_attn_implementation(self, llama4_attn_implementation):
         self.llama4_attn_implementation = llama4_attn_implementation
         if self.llama4_attn_implementation is None:
-            # if is_torch_sdpa_available():
-            if True:
-                _print_once("auto-setting llama4_attn_implementation to sdpa")
-                self.llama4_attn_implementation = "sdpa"
-            else:
-                _print_once("auto-setting llama4_attn_implementation to eager")
-                self.llama4_attn_implementation = "eager"
+            _print_once("auto-setting llama4_attn_implementation to sdpa")
+            self.llama4_attn_implementation = "sdpa"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py`
around lines 119 - 128, The method _choose_llama4_attn_implementation contains
an always-true branch making the else unreachable; remove the dead if/else and
simplify so that when self.llama4_attn_implementation is None you
unconditionally call _print_once("auto-setting llama4_attn_implementation to
sdpa") and set self.llama4_attn_implementation = "sdpa". Ensure no leftover
unreachable code remains and keep the initial assignment
self.llama4_attn_implementation = llama4_attn_implementation.

23-23: Remove commented-out import.

If is_torch_sdpa_available is no longer needed, remove the commented-out import rather than leaving it as dead code.

♻️ Proposed fix
-from transformers.utils import is_flash_attn_2_available  # , is_torch_sdpa_available
+from transformers.utils import is_flash_attn_2_available
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py` at
line 23, Remove the dead commented-out import by deleting the trailing ",
is_torch_sdpa_available" from the import line that currently reads "from
transformers.utils import is_flash_attn_2_available  # ,
is_torch_sdpa_available" so only is_flash_attn_2_available is imported; ensure
no leftover comment remains and run the project formatter/linter to keep the
import clean in configuration_decilm.py.
.pre-commit-config.yaml (1)

116-116: Remove the duplicated regex entry for lm_eval_anymodel.py.

Line 116 repeats an existing exclusion already present at Line 111. Keeping only one avoids regex drift and confusion.

Suggested cleanup
-              examples/puzzletron/evaluation/lm_eval_anymodel.py|
               modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py|
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pre-commit-config.yaml at line 116, Remove the duplicate exclusion entry
for "examples/puzzletron/evaluation/lm_eval_anymodel.py" from
.pre-commit-config.yaml's list so the filename appears only once; locate the
repeated regex string "examples/puzzletron/evaluation/lm_eval_anymodel.py" in
the exclusions and delete the redundant occurrence, leaving a single entry to
avoid duplicate patterns.
tests/gpu/torch/conftest.py (1)

57-59: Verify that session-wide autouse=True is intentional.

This fixture applies mto.enable_huggingface_checkpointing() to all tests in the entire session, which globally patches HuggingFace PreTrainedModel classes. This is a one-way operation (no teardown to restore original methods).

If some tests under tests/gpu/torch/ should run without HF checkpointing enabled, consider:

  1. Removing autouse=True and explicitly requesting the fixture where needed, or
  2. Adding a comment confirming this global patching is intentional for all GPU tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/gpu/torch/conftest.py` around lines 57 - 59, The fixture
enable_hf_checkpointing currently has autouse=True and calls
mto.enable_huggingface_checkpointing(), which globally patches HF
PreTrainedModel without a teardown; either remove autouse=True so tests opt-in
to enable_huggingface_checkpointing() or explicitly document this global
behavior with a clear comment above the enable_hf_checkpointing fixture stating
that a session-wide, permanent HF patch is intentional for all GPU/torch tests;
update the fixture signature or add the comment accordingly.
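
If the opt-in route is chosen, a minimal sketch could look like the following (the mto alias and fixture name follow the review comment above and are otherwise assumptions):

import pytest
import modelopt.torch.opt as mto

@pytest.fixture(scope="session")
def enable_hf_checkpointing():
    # Opt-in only: tests that need HF checkpointing request this fixture
    # by name. Note the patch is one-way; there is no teardown restoring
    # the original PreTrainedModel methods.
    mto.enable_huggingface_checkpointing()
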

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1baf11e6-f95a-4330-8768-72867ddb6e94

📥 Commits

Reviewing files that changed from the base of the PR and between 01cba6a and a3e20fc.

📒 Files selected for processing (9)
  • .pre-commit-config.yaml
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
  • modelopt/torch/puzzletron/mip/mip_and_realize_models.py
  • modelopt/torch/puzzletron/mip/sweep.py
  • modelopt/torch/puzzletron/utils/checkpoint_manager.py
  • modelopt/torch/puzzletron/utils/data/dataset.py
  • modelopt/torch/puzzletron/utils/parsing.py
  • tests/_test_utils/torch/puzzletron/utils.py
  • tests/gpu/torch/conftest.py

Comment on lines +227 to +295
def run_mip_sweep(hydra_cfg):
    """Run MIP for multiple memory compression rates and generate CSV with results.

    This function is called when mip.sweep.enabled is True in the config.

    Args:
        hydra_cfg: Hydra configuration object with mip.sweep settings
    """
    mprint("=" * 80)
    mprint("MIP Sweep Mode Enabled")
    mprint("=" * 80)

    # Get sweep configuration
    sweep_cfg = hydra_cfg.mip.sweep
    compression_rates = sweep_cfg.memory_compression_rates
    output_csv = sweep_cfg.output_csv
    puzzle_dir = Path(hydra_cfg.puzzle_dir)

    mprint(f"Compression rates: {compression_rates}")
    mprint(f"Output CSV: {output_csv}")
    mprint(f"Puzzle directory: {puzzle_dir}")

    # Calculate teacher memory from subblock_stats
    teacher_memory = get_teacher_memory_from_subblock_stats(hydra_cfg)
    mprint(
        f"Teacher memory (from subblock_stats): {teacher_memory:.1f} MiB ({teacher_memory / 1024:.1f} GiB)"
    )

    # Collect results
    all_results = []

    # Run MIP for each compression rate
    for compression_rate in compression_rates:
        target_memory_mib = teacher_memory * compression_rate
        mprint("\n" + "=" * 80)
        mprint(
            f"Running MIP for compression_rate={compression_rate:.2f} "
            f"(target={target_memory_mib:.1f} MiB = {target_memory_mib / 1024:.1f} GiB)"
        )
        mprint("=" * 80)

        # Modify config dynamically
        hydra_cfg.mip.human_constraints.target_memory = target_memory_mib

        # Run MIP and realize models (reuse existing distributed logic!)
        solution_paths = mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)

        # Extract results (only on master rank)
        if dist.is_master():
            for solution_path in solution_paths:
                result = extract_solution_results(
                    solution_path=Path(solution_path),
                    target_memory_mib=target_memory_mib,
                    teacher_memory_mib=teacher_memory,
                    compression_rate=compression_rate,
                )
                all_results.append(result)

                mprint(
                    f"✓ Results: actual_memory={result['actual_memory_mib']:.1f} MiB, "
                    f"lm_loss={result['lm_loss']:.4f}"
                )

    # Write results to CSV (only on master rank)
    if dist.is_master():
        mprint("\n" + "=" * 80)
        mprint("MIP Sweep Complete - Writing Results")
        mprint("=" * 80)
        write_results_to_csv(all_results, output_csv)

⚠️ Potential issue | 🟠 Major

Reject sweep mode when realization is disabled.

This function always extracts validation metrics after launch_mip_and_realize_model(), but that flow skips generating solutions--validation/solution_0.json when hydra_cfg.skip_realize_model is true. Right now the sweep will fail later with a FileNotFoundError instead of a clear config error.

💡 Minimal guard
 def run_mip_sweep(hydra_cfg):
     """Run MIP for multiple memory compression rates and generate CSV with results.
@@
     Args:
         hydra_cfg: Hydra configuration object with mip.sweep settings
     """
+    if hydra_cfg.skip_realize_model:
+        raise ValueError(
+            "mip.sweep requires realization because it reads solutions--validation/solution_0.json."
+        )
+
     mprint("=" * 80)
     mprint("MIP Sweep Mode Enabled")
     mprint("=" * 80)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/mip/sweep.py` around lines 227 - 295, run_mip_sweep
currently assumes realization ran and then calls extract_solution_results which
will fail when hydra_cfg.skip_realize_model is true; add an explicit guard at
the start of run_mip_sweep (or right before calling
mip_and_realize_models.launch_mip_and_realize_model) to reject/raise a clear
error if hydra_cfg.skip_realize_model is truthy, mentioning that sweep mode
requires realization, or alternatively skip extraction and CSV writing when
skip_realize_model is set; reference run_mip_sweep,
hydra_cfg.skip_realize_model,
mip_and_realize_models.launch_mip_and_realize_model, and
extract_solution_results to locate the change.

Comment on lines +153 to +155
num_kv_heads = attention_config.num_key_value_heads
if num_kv_heads is not None:
    return f"{num_kv_heads} kv heads"

⚠️ Potential issue | 🟡 Minor

Guard KV-head display against non-positive values.

This branch currently treats 0 (or any non-None sentinel) as valid and can render misleading output like 0 kv heads. Please require a positive value before formatting.

Suggested patch
-    num_kv_heads = attention_config.num_key_value_heads
-    if num_kv_heads is not None:
+    num_kv_heads = attention_config.num_key_value_heads
+    if num_kv_heads is not None and num_kv_heads > 0:
         return f"{num_kv_heads} kv heads"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/utils/parsing.py` around lines 153 - 155, The
branch that formats KV-heads currently returns a string for any non-None value
of attention_config.num_key_value_heads (num_kv_heads) and can emit "0 kv
heads"; update the guard to require a positive integer (e.g., num_kv_heads > 0)
before returning f"{num_kv_heads} kv heads" so that zero or negative sentinels
are ignored and execution falls through to the existing fallback formatting
logic.
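
For illustration, a standalone snippet showing why the stricter guard matters (values are made up):

num_kv_heads = 0  # sentinel meaning "no dedicated KV heads"
if num_kv_heads is not None:
    print(f"{num_kv_heads} kv heads")  # old guard: prints "0 kv heads"
if num_kv_heads is not None and num_kv_heads > 0:
    print(f"{num_kv_heads} kv heads")  # new guard: skipped, falls through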

Comment on lines 122 to 128

Collaborator: can we just keep the if part and remove the else part?

Author: done

 config.hidden_size = 256
 config.intermediate_size = 512
-config.num_hidden_layers = 2
+config.num_hidden_layers = max(2, dist.size())

Collaborator: If num layers is either 1 or 2, do our assertions work in both cases?

Author: yes

Collaborator: I think this file can be removed as this content is already moved to tests/gpu/conftest.py

@codecov bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (01cba6a) to head (808ad3f).
⚠️ Report is 1 commit behind head on feature/puzzletron.

Additional details and impacted files
@@                 Coverage Diff                 @@
##           feature/puzzletron    #1047   +/-   ##
===================================================
  Coverage               72.12%   72.12%           
===================================================
  Files                     209      209           
  Lines                   23628    23628           
===================================================
  Hits                    17042    17042           
  Misses                   6586     6586           


Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

@coderabbitai bot left a comment

🧹 Nitpick comments (1)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py (1)

23-23: Clean up the commented import and document the unconditional SDPA default.

The dependency floor (torch>=2.6) guarantees SDPA availability, so the unconditional "sdpa" default in _choose_llama4_attn_implementation() is safe. However, remove the commented import fragment on line 23 and add an inline comment on lines 121–123 explaining that SDPA is the only supported auto mode given the minimum torch requirement. This prevents future confusion when revisiting this code.

Suggested cleanup
-from transformers.utils import is_flash_attn_2_available  # , is_torch_sdpa_available
+from transformers.utils import is_flash_attn_2_available
@@
         self.llama4_attn_implementation = llama4_attn_implementation
         if self.llama4_attn_implementation is None:
+            # SDPA is guaranteed available with torch>=2.6 (minimum requirement).
             _print_once("auto-setting llama4_attn_implementation to sdpa")
             self.llama4_attn_implementation = "sdpa"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py` at
line 23, Remove the leftover commented import fragment ",
is_torch_sdpa_available" from the top-level import line (currently: from
transformers.utils import is_flash_attn_2_available  # ,
is_torch_sdpa_available) and add a short inline comment inside the
_choose_llama4_attn_implementation() function explaining that the unconditional
"sdpa" default is safe because the package enforces torch>=2.6 (so SDPA support
is guaranteed), to prevent future confusion about supported auto modes.
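
As a quick sanity check of the torch-floor argument, SDPA ships in core torch (a minimal probe, for illustration only):

import torch
import torch.nn.functional as F

# scaled_dot_product_attention has been in core torch since 2.0, so a
# torch>=2.6 requirement guarantees the "sdpa" implementation is available.
q = k = v = torch.randn(1, 2, 4, 8)  # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 2, 4, 8])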

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dfe06589-a2f2-4264-b26a-3141fae72bb2

📥 Commits

Reviewing files that changed from the base of the PR and between ba85c29 and 36cb150.

📒 Files selected for processing (1)
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@danielkorzekwa danielkorzekwa merged commit 2b6572c into feature/puzzletron Mar 20, 2026
28 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/remainings_from_dkorzekwa_anymodel_merging_process branch March 20, 2026 14:23