
Dkorzekwa/anymodel subblock stats #1085

Merged

danielkorzekwa merged 120 commits into feature/puzzletron from dkorzekwa/anymodel_subblock_stats on Mar 24, 2026

Conversation

@danielkorzekwa
Contributor

@danielkorzekwa danielkorzekwa commented Mar 20, 2026

What does this PR do?

Integration tests for subblock stats (memory + num_of_params)

Summary by CodeRabbit

  • Refactor

    • Improved teacher model configuration loading and statistics computation for enhanced accuracy in memory and parameter calculations.
  • Tests

    • Added comprehensive tests validating teacher model memory and parameter statistics calculations with strict accuracy tolerances.

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 20, 2026 18:10
@coderabbitai
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough


The PR refactors teacher configuration loading and memory/parameter computation by introducing a descriptor-driven approach through _load_teacher_subblock_stats(), which replaces manual parsing. New functions compute total teacher memory and parameters by summing block-level statistics. Tests are updated to validate these computations against per-model expected values.
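A minimal sketch of the teacher-total summation described above. The function names follow this walkthrough, but the stats schema (the "non_block" and "block_stats" keys and their fields) is assumed for illustration, not taken from the actual sweep.py:

```python
# Hedged sketch: sum non-block stats plus one block entry per layer.
# The schema ("non_block", "block_stats", "memory_mib", "num_params")
# is an assumption based on the walkthrough above.

def get_teacher_memory_from_subblock_stats(subblock_stats: dict, num_layers: int) -> float:
    """Total teacher memory (MiB): non-block memory plus per-layer block memory."""
    total = subblock_stats["non_block"]["memory_mib"]  # KeyError surfaces schema problems
    for layer in range(num_layers):
        total += subblock_stats["block_stats"][layer]["memory_mib"]
    return total


def get_teacher_num_params_from_subblock_stats(subblock_stats: dict, num_layers: int) -> int:
    """Total teacher parameter count, summed the same way."""
    total = subblock_stats["non_block"]["num_params"]
    for layer in range(num_layers):
        total += subblock_stats["block_stats"][layer]["num_params"]
    return total
```

Direct indexing (rather than .get() with a default of zero) is deliberate here, matching the reviewer's later request that malformed stats fail fast instead of silently shrinking the teacher baseline.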

Changes

  • Teacher Config & Memory Computation (modelopt/torch/puzzletron/mip/sweep.py): Introduced a _load_teacher_subblock_stats() helper to load the teacher model config via ModelDescriptorFactory and filter subblock stats using filter_subblock_stats_by_args(). Updated get_teacher_memory_from_subblock_stats() to sum non_block and per-layer block memory. Added get_teacher_num_params_from_subblock_stats() for parameter summation. Added imports for DictConfig, OmegaConf, PretrainedConfig, ModelDescriptorFactory, and load_model_config.
  • Test Validation (tests/gpu/torch/puzzletron/test_puzzletron.py): Updated the puzzletron() call to capture the returned hydra_cfg. Replaced the generic subblock_stats existence check with an _assert_subblock_stats_anymodel() helper that validates computed teacher memory and parameters against per-model expected constants with a 1e-6 tolerance. Added EXPECTED_TEACHER_MEMORY_MIB and EXPECTED_TEACHER_NUM_PARAMS dictionaries alongside the existing score/loss expectations.

Sequence Diagram

sequenceDiagram
    actor Caller
    participant sweep.py
    participant ModelDescriptorFactory
    participant load_model_config
    participant OmegaConf as OmegaConf
    participant filter_fn as filter_subblock_stats_by_args
    participant json as subblock_stats.json

    Caller->>sweep.py: _load_teacher_subblock_stats(hydra_cfg)
    sweep.py->>ModelDescriptorFactory: get_model_descriptor(teacher_name)
    ModelDescriptorFactory-->>sweep.py: descriptor
    sweep.py->>load_model_config: load_model_config(descriptor)
    load_model_config-->>sweep.py: model_config (hidden_size)
    sweep.py->>OmegaConf: to_container(subblock_stats_args[0])
    OmegaConf-->>sweep.py: normalized_args
    sweep.py->>sweep.py: inject n_embd=hidden_size into normalized_args
    sweep.py->>json: load subblock_stats_list
    json-->>sweep.py: subblock_stats_list
    sweep.py->>filter_fn: filter_subblock_stats_by_args(list, normalized_args)
    filter_fn-->>sweep.py: filtered_subblock_stats (unique match)
    sweep.py-->>Caller: (subblock_stats, model_config)
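The filtering step in the diagram above, which must return a unique match, can be sketched as follows. This is an illustrative reimplementation, not the repo's actual filter_subblock_stats_by_args; the entry layout (an "args" dict per stats entry) is an assumption:

```python
# Hypothetical sketch: select the single stats entry whose recorded args
# contain every requested key/value pair (e.g. batch_size, dtypes, n_embd).

def filter_subblock_stats_by_args(subblock_stats_list: list[dict], args: dict) -> dict:
    """Return the unique entry matching all key/value pairs in `args`."""
    matches = [
        entry for entry in subblock_stats_list
        if all(entry.get("args", {}).get(key) == value for key, value in args.items())
    ]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one matching entry, found {len(matches)}")
    return matches[0]
```

Raising on zero or multiple matches enforces the "unique match" contract shown in the diagram, so an ambiguous subblock_stats.json fails loudly rather than returning an arbitrary row.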

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check ❓ Inconclusive: The title 'Dkorzekwa/anymodel subblock stats' is vague and includes a branch/user prefix; it reads as a branch name rather than a meaningful summary of the changeset. Resolution: rephrase to clearly describe the main change, e.g., 'Add subblock stats integration tests and helper functions for teacher memory/param validation' or 'Refactor teacher config loading with descriptor-driven subblock stats.'

✅ Passed checks (3 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage ✅ Passed: Docstring coverage is 87.50%, above the required 80.00% threshold.
  • Security Anti-Patterns ✅ Passed: The pull request complies with the security anti-pattern guidance in SECURITY.md. torch.load() in test code is exempted; trust_remote_code uses a descriptor-driven approach per best practices.




Base automatically changed from dkorzekwa/decilm_hf_code_cleanup_2 to feature/puzzletron March 23, 2026 17:24
@danielkorzekwa danielkorzekwa requested review from a team as code owners March 23, 2026 17:24
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/puzzletron/mip/sweep.py`:
- Around line 94-95: The code currently defaults subblock_stats.get("non_block",
{}).get("memory_mib", 0.0) (and similar for num_params) to zero which masks
malformed subblock_stats; change the logic in sweep.py to fail fast: check that
subblock_stats contains the "non_block" key and that "memory_mib" and
"num_params" exist (raise a clear ValueError or RuntimeError including subblock
identifier and the offending dict when missing) before computing
total_memory/teacher_memory, and apply the same existence checks for per-layer
accesses of block_stats[...] so malformed JSON surfaces immediately rather than
silently using zeros.
- Around line 47-48: The code assumes hydra_cfg.mip.subblock_stats_args is a
singleton but always uses index [0] (used by the teacher-total helpers and
run_mip_sweep), so add a fail-fast guard that asserts this is length 1 before
converting: check len(hydra_cfg.mip.subblock_stats_args) == 1 and raise a clear
ValueError (or AssertionError) if not, so the code in sweep.py (where
mip_subblock_args and subblock_stats_args are derived) explicitly enforces the
singleton assumption and prevents silently using the wrong scenario.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 176ee5b2-5938-4ce8-8d70-e6b89168b1a4

📥 Commits

Reviewing files that changed from the base of the PR and between 4190275 and d55f8d9.

📒 Files selected for processing (2)
  • modelopt/torch/puzzletron/mip/sweep.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py

Comment on lines 47 to +48
mip_subblock_args = hydra_cfg.mip.subblock_stats_args[0]
batch_size = mip_subblock_args["batch_size"]
weights_dtype = str(mip_subblock_args["weights_dtype"])
activations_dtype = str(mip_subblock_args["activations_dtype"])
kv_cache_dtype = str(mip_subblock_args["kv_cache_dtype"])
subblock_stats_args = OmegaConf.to_container(mip_subblock_args, resolve=True)
Contributor

🛠️ Refactor suggestion | 🟠 Major

Assert the singleton subblock_stats_args assumption here.

Both teacher-total helpers always read hydra_cfg.mip.subblock_stats_args[0]. If a config ever carries more than one measurement scenario, these totals come from the wrong row and run_mip_sweep() will derive the wrong teacher baseline. Either thread the intended scenario through explicitly or fail fast unless this list is a singleton.

♻️ Suggested guard
-    mip_subblock_args = hydra_cfg.mip.subblock_stats_args[0]
+    if len(hydra_cfg.mip.subblock_stats_args) != 1:
+        raise ValueError(
+            "Expected exactly one mip.subblock_stats_args entry; pass the desired scenario explicitly."
+        )
+    mip_subblock_args = hydra_cfg.mip.subblock_stats_args[0]

Comment on lines +94 to 95
total_memory = subblock_stats.get("non_block", {}).get("memory_mib", 0.0)

Contributor

⚠️ Potential issue | 🟠 Major

Fail fast on missing non_block metrics instead of treating them as zero.

teacher_memory drives the sweep target, so defaulting non_block.memory_mib / non_block.num_params to 0 turns malformed subblock_stats.json into a smaller baseline instead of surfacing the schema error. These keys look required here, just like the per-layer block_stats[...] accesses below.

♻️ Suggested guard
-    total_memory = subblock_stats.get("non_block", {}).get("memory_mib", 0.0)
+    try:
+        total_memory = subblock_stats["non_block"]["memory_mib"]
+    except KeyError as e:
+        raise ValueError("Missing non_block.memory_mib in subblock_stats.json") from e
-    total_params = subblock_stats.get("non_block", {}).get("num_params", 0)
+    try:
+        total_params = subblock_stats["non_block"]["num_params"]
+    except KeyError as e:
+        raise ValueError("Missing non_block.num_params in subblock_stats.json") from e

Also applies to: 117-118


@codecov

codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (4190275) to head (d55f8d9).
⚠️ Report is 1 commits behind head on feature/puzzletron.

Additional details and impacted files
@@                 Coverage Diff                 @@
##           feature/puzzletron    #1085   +/-   ##
===================================================
  Coverage               72.12%   72.12%           
===================================================
  Files                     209      209           
  Lines                   23628    23628           
===================================================
  Hits                    17042    17042           
  Misses                   6586     6586           


@danielkorzekwa danielkorzekwa merged commit 0708ca2 into feature/puzzletron Mar 24, 2026
28 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/anymodel_subblock_stats branch March 24, 2026 07:41