MR branch for the remaining difference between dkorzekwa/any_model an… #1047
Conversation
- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)
…dapted from https://github.com/EleutherAI/lm-evaluation-harness
…gs_from_dkorzekwa_anymodel_merging_process
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
modelopt/torch/puzzletron/utils/checkpoint_manager.py (1)
130-132: ⚠️ Potential issue | 🟡 Minor

Explicitly specify `weights_only=True` in `torch.load()` for security guideline compliance. The project enforces `torch>=2.6`, which defaults to `weights_only=True`, making the code technically safe. However, the security coding guidelines require always explicitly setting this parameter. Change line 132 to: `hook_states = torch.load(hook_states_path, map_location="cpu", weights_only=True)`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/utils/checkpoint_manager.py` around lines 130 - 132, Update the torch.load call that reads hook_states_path in checkpoint_manager.py to explicitly pass weights_only=True for compliance; locate the torch.load(hook_states_path, map_location="cpu") invocation and change it to include weights_only=True so it becomes torch.load(hook_states_path, map_location="cpu", weights_only=True).

modelopt/torch/puzzletron/mip/mip_and_realize_models.py (1)
41-73: ⚠️ Potential issue | 🟡 Minor

Return `list[str]` on every rank. On Line 52, worker ranks leave `solution_paths` as `None` whenever `cfg.skip_realize_model` is true, so this function violates its new `-> list[str]` contract on that path. Broadcasting the paths outside the realize-only branch, or explicitly returning `[]` on workers, keeps the API consistent.

💡 Minimal fix
 if dist.is_master():
     solution_paths = launch_mip(cfg)
     length_tensor = torch.tensor([len(solution_paths)], dtype=torch.long, device=device)
 else:
-    solution_paths = None
+    solution_paths = []
     length_tensor = torch.tensor([0], dtype=torch.long, device=device)
-if not cfg.skip_realize_model:
-    if dist.size() > 1:
-        torch.distributed.broadcast(length_tensor, src=0)
-
-    list_length = length_tensor.item()
-
-    if not dist.is_master():
-        solution_paths = [None] * list_length
-
-    if dist.size() > 1:
-        torch.distributed.broadcast_object_list(solution_paths, src=0)
+if dist.size() > 1:
+    torch.distributed.broadcast(length_tensor, src=0)
+    if not dist.is_master():
+        solution_paths = [None] * length_tensor.item()
+    torch.distributed.broadcast_object_list(solution_paths, src=0)
+if not cfg.skip_realize_model:
     for solution_path in solution_paths:
         mprint(f"Realize model for the solution: {solution_path}")
         cfg.realize_model.solutions_path = Path(solution_path)
         launch_realize_model(cfg)
     dist.barrier()

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/mip/mip_and_realize_models.py` around lines 41 - 73, The function launch_mip_and_realize_model sometimes returns None on worker ranks when cfg.skip_realize_model is true; ensure it always returns a list[str]. Fix by moving the broadcast of length_tensor/solution_paths outside the cfg.skip_realize_model branch (so all ranks receive the master list), or if you prefer minimal change: after the initial master/non-master branch, set solution_paths = [] on non-master ranks when cfg.skip_realize_model is true (instead of leaving it None) so the return type is always list[str]; update any use of length_tensor/torch.distributed.broadcast_object_list accordingly to handle the broadcast-even-if-skipped path.
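The broadcast-outside-the-branch fix can be sketched as a standalone helper. This is a minimal illustration, not the PR's actual code: the helper name `broadcast_solution_paths` and the single-process fallback are assumptions added for the sketch.

```python
import torch
import torch.distributed as torch_dist


def broadcast_solution_paths(paths_on_master):
    """Return the master's list[str] on every rank; never returns None."""
    dist_active = torch_dist.is_available() and torch_dist.is_initialized()
    rank = torch_dist.get_rank() if dist_active else 0
    if rank == 0:
        solution_paths = list(paths_on_master)
        length = torch.tensor([len(solution_paths)], dtype=torch.long)
    else:
        # Never None: keeps the list[str] contract even if the broadcast is skipped.
        solution_paths = []
        length = torch.tensor([0], dtype=torch.long)
    if dist_active and torch_dist.get_world_size() > 1:
        # The broadcast happens unconditionally, not only when realization runs.
        torch_dist.broadcast(length, src=0)
        if rank != 0:
            solution_paths = [None] * int(length.item())
        torch_dist.broadcast_object_list(solution_paths, src=0)
    return solution_paths
```

In a single process this degenerates to returning the master list, so callers can rely on a `list[str]` on every code path.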
🧹 Nitpick comments (4)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py (2)
119-128: Remove unreachable dead code.

The `if True:` condition makes the `else` branch (lines 126-128) unreachable. If SDPA is now unconditionally used when `llama4_attn_implementation` is `None`, simplify the method by removing the dead code.

♻️ Proposed fix
 def _choose_llama4_attn_implementation(self, llama4_attn_implementation):
     self.llama4_attn_implementation = llama4_attn_implementation
     if self.llama4_attn_implementation is None:
-        # if is_torch_sdpa_available():
-        if True:
-            _print_once("auto-setting llama4_attn_implementation to sdpa")
-            self.llama4_attn_implementation = "sdpa"
-        else:
-            _print_once("auto-setting llama4_attn_implementation to eager")
-            self.llama4_attn_implementation = "eager"
+        _print_once("auto-setting llama4_attn_implementation to sdpa")
+        self.llama4_attn_implementation = "sdpa"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py` around lines 119 - 128, The method _choose_llama4_attn_implementation contains an always-true branch making the else unreachable; remove the dead if/else and simplify so that when self.llama4_attn_implementation is None you unconditionally call _print_once("auto-setting llama4_attn_implementation to sdpa") and set self.llama4_attn_implementation = "sdpa". Ensure no leftover unreachable code remains and keep the initial assignment self.llama4_attn_implementation = llama4_attn_implementation.
23-23: Remove commented-out import.

If `is_torch_sdpa_available` is no longer needed, remove the commented-out import rather than leaving it as dead code.

♻️ Proposed fix
-from transformers.utils import is_flash_attn_2_available  # , is_torch_sdpa_available
+from transformers.utils import is_flash_attn_2_available

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py` at line 23, Remove the dead commented-out import by deleting the trailing ", is_torch_sdpa_available" from the import line that currently reads "from transformers.utils import is_flash_attn_2_available  # , is_torch_sdpa_available" so only is_flash_attn_2_available is imported; ensure no leftover comment remains and run the project formatter/linter to keep the import clean in configuration_decilm.py.

.pre-commit-config.yaml (1)
116-116: Remove the duplicated regex entry for `lm_eval_anymodel.py`.

Line 116 repeats an existing exclusion already present at Line 111. Keeping only one avoids regex drift and confusion.

Suggested cleanup
-    examples/puzzletron/evaluation/lm_eval_anymodel.py|
     modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py|

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.pre-commit-config.yaml at line 116, Remove the duplicate exclusion entry for "examples/puzzletron/evaluation/lm_eval_anymodel.py" from .pre-commit-config.yaml's list so the filename appears only once; locate the repeated regex string "examples/puzzletron/evaluation/lm_eval_anymodel.py" in the exclusions and delete the redundant occurrence, leaving a single entry to avoid duplicate patterns.

tests/gpu/torch/conftest.py (1)
57-59: Verify that session-wide `autouse=True` is intentional.

This fixture applies `mto.enable_huggingface_checkpointing()` to all tests in the entire session, which globally patches HuggingFace `PreTrainedModel` classes. This is a one-way operation (no teardown to restore original methods). If some tests under `tests/gpu/torch/` should run without HF checkpointing enabled, consider:

- Removing `autouse=True` and explicitly requesting the fixture where needed, or
- Adding a comment confirming this global patching is intentional for all GPU tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/gpu/torch/conftest.py` around lines 57 - 59, The fixture enable_hf_checkpointing currently has autouse=True and calls mto.enable_huggingface_checkpointing(), which globally patches HF PreTrainedModel without a teardown; either remove autouse=True so tests opt-in to enable_huggingface_checkpointing() or explicitly document this global behavior with a clear comment above the enable_hf_checkpointing fixture stating that a session-wide, permanent HF patch is intentional for all GPU/torch tests; update the fixture signature or add the comment accordingly.
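The opt-in alternative can be sketched as follows. This is only an illustration: `enable_huggingface_checkpointing` here is a hypothetical stand-in for `mto.enable_huggingface_checkpointing()`, and the test name is made up.

```python
import pytest


def enable_huggingface_checkpointing():
    # Stand-in for mto.enable_huggingface_checkpointing(): the real call
    # permanently patches HF PreTrainedModel classes, so there is no teardown.
    enable_huggingface_checkpointing.called = True


@pytest.fixture(scope="session")  # note: no autouse=True
def enable_hf_checkpointing():
    # Tests that need HF checkpointing request this fixture by name instead
    # of the patch being applied silently to every test in the session.
    enable_huggingface_checkpointing()


def test_save_restore(enable_hf_checkpointing):
    # Only tests that list the fixture opt in to the global patch.
    assert enable_huggingface_checkpointing.called
```

Tests that never list `enable_hf_checkpointing` run with the original, unpatched classes.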
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/puzzletron/mip/sweep.py`:
- Around line 227-295: run_mip_sweep currently assumes realization ran and then
calls extract_solution_results which will fail when hydra_cfg.skip_realize_model
is true; add an explicit guard at the start of run_mip_sweep (or right before
calling mip_and_realize_models.launch_mip_and_realize_model) to reject/raise a
clear error if hydra_cfg.skip_realize_model is truthy, mentioning that sweep
mode requires realization, or alternatively skip extraction and CSV writing when
skip_realize_model is set; reference run_mip_sweep,
hydra_cfg.skip_realize_model,
mip_and_realize_models.launch_mip_and_realize_model, and
extract_solution_results to locate the change.
In `@modelopt/torch/puzzletron/utils/parsing.py`:
- Around line 153-155: The branch that formats KV-heads currently returns a
string for any non-None value of attention_config.num_key_value_heads
(num_kv_heads) and can emit "0 kv heads"; update the guard to require a positive
integer (e.g., num_kv_heads > 0) before returning f"{num_kv_heads} kv heads" so
that zero or negative sentinels are ignored and execution falls through to the
existing fallback formatting logic.
---
Outside diff comments:
In `@modelopt/torch/puzzletron/mip/mip_and_realize_models.py`:
- Around line 41-73: The function launch_mip_and_realize_model sometimes returns
None on worker ranks when cfg.skip_realize_model is true; ensure it always
returns a list[str]. Fix by moving the broadcast of length_tensor/solution_paths
outside the cfg.skip_realize_model branch (so all ranks receive the master
list), or if you prefer minimal change: after the initial master/non-master
branch, set solution_paths = [] on non-master ranks when cfg.skip_realize_model
is true (instead of leaving it None) so the return type is always list[str];
update any use of length_tensor/torch.distributed.broadcast_object_list
accordingly to handle the broadcast-even-if-skipped path.
In `@modelopt/torch/puzzletron/utils/checkpoint_manager.py`:
- Around line 130-132: Update the torch.load call that reads hook_states_path in
checkpoint_manager.py to explicitly pass weights_only=True for compliance;
locate the torch.load(hook_states_path, map_location="cpu") invocation and
change it to include weights_only=True so it becomes
torch.load(hook_states_path, map_location="cpu", weights_only=True).
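The `weights_only=True` pattern recommended above can be sketched in isolation; the `hook_states` payload and temp-file path below are made up for illustration and do not reflect the real checkpoint contents.

```python
import os
import tempfile

import torch

# Hypothetical hook-states payload standing in for the real checkpoint.
hook_states = {"layer_0": torch.zeros(2, 2)}
hook_states_path = os.path.join(tempfile.gettempdir(), "hook_states.pt")
torch.save(hook_states, hook_states_path)

# weights_only=True restricts unpickling to tensors and allow-listed types,
# so a tampered checkpoint cannot execute arbitrary code at load time.
loaded = torch.load(hook_states_path, map_location="cpu", weights_only=True)
```

Passing the flag explicitly keeps the call safe even on older torch versions where the default was `weights_only=False`.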
---
Nitpick comments:
In @.pre-commit-config.yaml:
- Line 116: Remove the duplicate exclusion entry for
"examples/puzzletron/evaluation/lm_eval_anymodel.py" from
.pre-commit-config.yaml's list so the filename appears only once; locate the
repeated regex string "examples/puzzletron/evaluation/lm_eval_anymodel.py" in
the exclusions and delete the redundant occurrence, leaving a single entry to
avoid duplicate patterns.
In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py`:
- Around line 119-128: The method _choose_llama4_attn_implementation contains an
always-true branch making the else unreachable; remove the dead if/else and
simplify so that when self.llama4_attn_implementation is None you
unconditionally call _print_once("auto-setting llama4_attn_implementation to
sdpa") and set self.llama4_attn_implementation = "sdpa". Ensure no leftover
unreachable code remains and keep the initial assignment
self.llama4_attn_implementation = llama4_attn_implementation.
- Line 23: Remove the dead commented-out import by deleting the trailing ",
is_torch_sdpa_available" from the import line that currently reads "from
transformers.utils import is_flash_attn_2_available # ,
is_torch_sdpa_available" so only is_flash_attn_2_available is imported; ensure
no leftover comment remains and run the project formatter/linter to keep the
import clean in configuration_decilm.py.
In `@tests/gpu/torch/conftest.py`:
- Around line 57-59: The fixture enable_hf_checkpointing currently has
autouse=True and calls mto.enable_huggingface_checkpointing(), which globally
patches HF PreTrainedModel without a teardown; either remove autouse=True so
tests opt-in to enable_huggingface_checkpointing() or explicitly document this
global behavior with a clear comment above the enable_hf_checkpointing fixture
stating that a session-wide, permanent HF patch is intentional for all GPU/torch
tests; update the fixture signature or add the comment accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1baf11e6-f95a-4330-8768-72867ddb6e94
📒 Files selected for processing (9)
.pre-commit-config.yaml
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
modelopt/torch/puzzletron/mip/mip_and_realize_models.py
modelopt/torch/puzzletron/mip/sweep.py
modelopt/torch/puzzletron/utils/checkpoint_manager.py
modelopt/torch/puzzletron/utils/data/dataset.py
modelopt/torch/puzzletron/utils/parsing.py
tests/_test_utils/torch/puzzletron/utils.py
tests/gpu/torch/conftest.py
def run_mip_sweep(hydra_cfg):
    """Run MIP for multiple memory compression rates and generate CSV with results.

    This function is called when mip.sweep.enabled is True in the config.

    Args:
        hydra_cfg: Hydra configuration object with mip.sweep settings
    """
    mprint("=" * 80)
    mprint("MIP Sweep Mode Enabled")
    mprint("=" * 80)

    # Get sweep configuration
    sweep_cfg = hydra_cfg.mip.sweep
    compression_rates = sweep_cfg.memory_compression_rates
    output_csv = sweep_cfg.output_csv
    puzzle_dir = Path(hydra_cfg.puzzle_dir)

    mprint(f"Compression rates: {compression_rates}")
    mprint(f"Output CSV: {output_csv}")
    mprint(f"Puzzle directory: {puzzle_dir}")

    # Calculate teacher memory from subblock_stats
    teacher_memory = get_teacher_memory_from_subblock_stats(hydra_cfg)
    mprint(
        f"Teacher memory (from subblock_stats): {teacher_memory:.1f} MiB ({teacher_memory / 1024:.1f} GiB)"
    )

    # Collect results
    all_results = []

    # Run MIP for each compression rate
    for compression_rate in compression_rates:
        target_memory_mib = teacher_memory * compression_rate
        mprint("\n" + "=" * 80)
        mprint(
            f"Running MIP for compression_rate={compression_rate:.2f} "
            f"(target={target_memory_mib:.1f} MiB = {target_memory_mib / 1024:.1f} GiB)"
        )
        mprint("=" * 80)

        # Modify config dynamically
        hydra_cfg.mip.human_constraints.target_memory = target_memory_mib

        # Run MIP and realize models (reuse existing distributed logic!)
        solution_paths = mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)

        # Extract results (only on master rank)
        if dist.is_master():
            for solution_path in solution_paths:
                result = extract_solution_results(
                    solution_path=Path(solution_path),
                    target_memory_mib=target_memory_mib,
                    teacher_memory_mib=teacher_memory,
                    compression_rate=compression_rate,
                )
                all_results.append(result)

                mprint(
                    f"✓ Results: actual_memory={result['actual_memory_mib']:.1f} MiB, "
                    f"lm_loss={result['lm_loss']:.4f}"
                )

    # Write results to CSV (only on master rank)
    if dist.is_master():
        mprint("\n" + "=" * 80)
        mprint("MIP Sweep Complete - Writing Results")
        mprint("=" * 80)
        write_results_to_csv(all_results, output_csv)
Reject sweep mode when realization is disabled.
This function always extracts validation metrics after launch_mip_and_realize_model(), but that flow skips generating solutions--validation/solution_0.json when hydra_cfg.skip_realize_model is true. Right now the sweep will fail later with a FileNotFoundError instead of a clear config error.
💡 Minimal guard
def run_mip_sweep(hydra_cfg):
"""Run MIP for multiple memory compression rates and generate CSV with results.
@@
Args:
hydra_cfg: Hydra configuration object with mip.sweep settings
"""
+ if hydra_cfg.skip_realize_model:
+ raise ValueError(
+ "mip.sweep requires realization because it reads solutions--validation/solution_0.json."
+ )
+
mprint("=" * 80)
mprint("MIP Sweep Mode Enabled")
    mprint("=" * 80)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/mip/sweep.py` around lines 227 - 295, run_mip_sweep
currently assumes realization ran and then calls extract_solution_results which
will fail when hydra_cfg.skip_realize_model is true; add an explicit guard at
the start of run_mip_sweep (or right before calling
mip_and_realize_models.launch_mip_and_realize_model) to reject/raise a clear
error if hydra_cfg.skip_realize_model is truthy, mentioning that sweep mode
requires realization, or alternatively skip extraction and CSV writing when
skip_realize_model is set; reference run_mip_sweep,
hydra_cfg.skip_realize_model,
mip_and_realize_models.launch_mip_and_realize_model, and
extract_solution_results to locate the change.
num_kv_heads = attention_config.num_key_value_heads
if num_kv_heads is not None:
    return f"{num_kv_heads} kv heads"
Guard KV-head display against non-positive values.
This branch currently treats 0 (or any non-None sentinel) as valid and can render misleading output like 0 kv heads. Please require a positive value before formatting.
Suggested patch
- num_kv_heads = attention_config.num_key_value_heads
- if num_kv_heads is not None:
+ num_kv_heads = attention_config.num_key_value_heads
+ if num_kv_heads is not None and num_kv_heads > 0:
      return f"{num_kv_heads} kv heads"

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
num_kv_heads = attention_config.num_key_value_heads
if num_kv_heads is not None and num_kv_heads > 0:
    return f"{num_kv_heads} kv heads"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/utils/parsing.py` around lines 153 - 155, The
branch that formats KV-heads currently returns a string for any non-None value
of attention_config.num_key_value_heads (num_kv_heads) and can emit "0 kv
heads"; update the guard to require a positive integer (e.g., num_kv_heads > 0)
before returning f"{num_kv_heads} kv heads" so that zero or negative sentinels
are ignored and execution falls through to the existing fallback formatting
logic.
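The guard described above can be sketched as a tiny standalone helper; the function name `format_kv_heads` and the fallback value are illustrative assumptions, not the module's real API.

```python
def format_kv_heads(num_kv_heads, fallback="n/a"):
    # Only positive counts are rendered; None and zero/negative sentinels
    # fall through to the existing fallback formatting.
    if num_kv_heads is not None and num_kv_heads > 0:
        return f"{num_kv_heads} kv heads"
    return fallback
```

With this guard, a sentinel of `0` no longer renders the misleading string "0 kv heads".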
can we just keep the if part and remove else part?
 config.hidden_size = 256
 config.intermediate_size = 512
-config.num_hidden_layers = 2
+config.num_hidden_layers = max(2, dist.size())
If num layers is either 1 or 2, do our assertions work in both cases?
I think this file can be removed as this content is already moved to tests/gpu/conftest.py
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@ Coverage Diff @@
## feature/puzzletron #1047 +/- ##
===================================================
Coverage 72.12% 72.12%
===================================================
Files 209 209
Lines 23628 23628
===================================================
Hits 17042 17042
Misses 6586 6586

☔ View full report in Codecov by Sentry.
🧹 Nitpick comments (1)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py (1)
23-23: Clean up the commented import and document the unconditional SDPA default.

The dependency floor (`torch>=2.6`) guarantees SDPA availability, so the unconditional `"sdpa"` default in `_choose_llama4_attn_implementation()` is safe. However, remove the commented import fragment on line 23 and add an inline comment on lines 121–123 explaining that SDPA is the only supported auto mode given the minimum torch requirement. This prevents future confusion when revisiting this code.

Suggested cleanup
-from transformers.utils import is_flash_attn_2_available  # , is_torch_sdpa_available
+from transformers.utils import is_flash_attn_2_available
@@
 self.llama4_attn_implementation = llama4_attn_implementation
 if self.llama4_attn_implementation is None:
+    # SDPA is guaranteed available with torch>=2.6 (minimum requirement).
     _print_once("auto-setting llama4_attn_implementation to sdpa")
     self.llama4_attn_implementation = "sdpa"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py` at line 23, Remove the leftover commented import fragment ", is_torch_sdpa_available" from the top-level import line (currently: from transformers.utils import is_flash_attn_2_available  # , is_torch_sdpa_available) and add a short inline comment inside the _choose_llama4_attn_implementation() function explaining that the unconditional "sdpa" default is safe because the package enforces torch>=2.6 (so SDPA support is guaranteed), to prevent future confusion about supported auto modes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py`:
- Line 23: Remove the leftover commented import fragment ",
is_torch_sdpa_available" from the top-level import line (currently: from
transformers.utils import is_flash_attn_2_available # ,
is_torch_sdpa_available) and add a short inline comment inside the
_choose_llama4_attn_implementation() function explaining that the unconditional
"sdpa" default is safe because the package enforces torch>=2.6 (so SDPA support
is guaranteed), to prevent future confusion about supported auto modes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dfe06589-a2f2-4264-b26a-3141fae72bb2
📒 Files selected for processing (1)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
What does this PR do?
The remaining difference between dkorzekwa/any_model and dkorzekwa/anymodel_mbridgedist
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Chores
Tests