
Dkorzekwa/anymodel subblock stats #1085

Merged

danielkorzekwa merged 120 commits into feature/puzzletron from dkorzekwa/anymodel_subblock_stats on Mar 24, 2026

Conversation

@danielkorzekwa
Contributor

@danielkorzekwa danielkorzekwa commented Mar 20, 2026

What does this PR do?

Integration tests for subblock stats (memory + num_of_params)

Summary by CodeRabbit

  • Refactor

    • Improved teacher model configuration loading and statistics computation for enhanced accuracy in memory and parameter calculations.
  • Tests

    • Added comprehensive tests validating teacher model memory and parameter statistics calculations with strict accuracy tolerances.

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 20, 2026 18:10
@coderabbitai
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough


The PR refactors teacher configuration loading and memory/parameter computation by introducing a descriptor-driven approach through _load_teacher_subblock_stats(), which replaces manual parsing. New functions compute total teacher memory and parameters by summing block-level statistics. Tests are updated to validate these computations against per-model expected values.
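A minimal sketch of the teacher-total summation described above. The function names follow this walkthrough, but the stats schema (the "non_block" and "block_stats" keys and their fields) is assumed for illustration, not taken from the actual sweep.py:

```python
# Hedged sketch: sum non-block stats plus one block entry per layer.
# The schema ("non_block", "block_stats", "memory_mib", "num_params")
# is an assumption based on the walkthrough above.

def get_teacher_memory_from_subblock_stats(subblock_stats: dict, num_layers: int) -> float:
    """Total teacher memory (MiB): non-block memory plus per-layer block memory."""
    total = subblock_stats["non_block"]["memory_mib"]  # KeyError surfaces schema problems
    for layer in range(num_layers):
        total += subblock_stats["block_stats"][layer]["memory_mib"]
    return total


def get_teacher_num_params_from_subblock_stats(subblock_stats: dict, num_layers: int) -> int:
    """Total teacher parameter count, summed the same way."""
    total = subblock_stats["non_block"]["num_params"]
    for layer in range(num_layers):
        total += subblock_stats["block_stats"][layer]["num_params"]
    return total
```

Direct indexing (rather than .get() with a default of zero) is deliberate here, matching the reviewer's later request that malformed stats fail fast instead of silently shrinking the teacher baseline.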

Changes

  • Teacher Config & Memory Computation (modelopt/torch/puzzletron/mip/sweep.py): Introduced a _load_teacher_subblock_stats() helper to load the teacher model config via ModelDescriptorFactory and filter subblock stats using filter_subblock_stats_by_args(). Updated get_teacher_memory_from_subblock_stats() to sum non_block and per-layer block memory. Added get_teacher_num_params_from_subblock_stats() for parameter summation. Added imports for DictConfig, OmegaConf, PretrainedConfig, ModelDescriptorFactory, and load_model_config.
  • Test Validation (tests/gpu/torch/puzzletron/test_puzzletron.py): Updated the puzzletron() call to capture the returned hydra_cfg. Replaced the generic subblock_stats existence check with an _assert_subblock_stats_anymodel() helper that validates computed teacher memory and parameters against per-model expected constants with a 1e-6 tolerance. Added EXPECTED_TEACHER_MEMORY_MIB and EXPECTED_TEACHER_NUM_PARAMS dictionaries alongside the existing score/loss expectations.

Sequence Diagram

sequenceDiagram
    actor Caller
    participant sweep.py
    participant ModelDescriptorFactory
    participant load_model_config
    participant OmegaConf as OmegaConf
    participant filter_fn as filter_subblock_stats_by_args
    participant json as subblock_stats.json

    Caller->>sweep.py: _load_teacher_subblock_stats(hydra_cfg)
    sweep.py->>ModelDescriptorFactory: get_model_descriptor(teacher_name)
    ModelDescriptorFactory-->>sweep.py: descriptor
    sweep.py->>load_model_config: load_model_config(descriptor)
    load_model_config-->>sweep.py: model_config (hidden_size)
    sweep.py->>OmegaConf: to_container(subblock_stats_args[0])
    OmegaConf-->>sweep.py: normalized_args
    sweep.py->>sweep.py: inject n_embd=hidden_size into normalized_args
    sweep.py->>json: load subblock_stats_list
    json-->>sweep.py: subblock_stats_list
    sweep.py->>filter_fn: filter_subblock_stats_by_args(list, normalized_args)
    filter_fn-->>sweep.py: filtered_subblock_stats (unique match)
    sweep.py-->>Caller: (subblock_stats, model_config)
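The filtering step in the diagram above, which must return a unique match, can be sketched as follows. This is an illustrative reimplementation, not the repo's actual filter_subblock_stats_by_args; the entry layout (an "args" dict per stats entry) is an assumption:

```python
# Hypothetical sketch: select the single stats entry whose recorded args
# contain every requested key/value pair (e.g. batch_size, dtypes, n_embd).

def filter_subblock_stats_by_args(subblock_stats_list: list[dict], args: dict) -> dict:
    """Return the unique entry matching all key/value pairs in `args`."""
    matches = [
        entry for entry in subblock_stats_list
        if all(entry.get("args", {}).get(key) == value for key, value in args.items())
    ]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one matching entry, found {len(matches)}")
    return matches[0]
```

Raising on zero or multiple matches enforces the "unique match" contract shown in the diagram, so an ambiguous subblock_stats.json fails loudly rather than returning an arbitrary row.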

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check ❓ Inconclusive: The title 'Dkorzekwa/anymodel subblock stats' is vague and includes a branch/user prefix; it reads as a branch name rather than a meaningful summary of the changeset. Resolution: rephrase to clearly describe the main change, e.g., 'Add subblock stats integration tests and helper functions for teacher memory/param validation' or 'Refactor teacher config loading with descriptor-driven subblock stats.'

✅ Passed checks (3 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage ✅ Passed: Docstring coverage is 87.50%, above the required 80.00% threshold.
  • Security Anti-Patterns ✅ Passed: The pull request complies with the security anti-pattern guidance in SECURITY.md. torch.load() in test code is exempted; trust_remote_code uses a descriptor-driven approach per best practices.




Base automatically changed from dkorzekwa/decilm_hf_code_cleanup_2 to feature/puzzletron March 23, 2026 17:24
@danielkorzekwa danielkorzekwa requested review from a team as code owners March 23, 2026 17:24
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/puzzletron/mip/sweep.py`:
- Around line 94-95: The code currently defaults subblock_stats.get("non_block",
{}).get("memory_mib", 0.0) (and similar for num_params) to zero which masks
malformed subblock_stats; change the logic in sweep.py to fail fast: check that
subblock_stats contains the "non_block" key and that "memory_mib" and
"num_params" exist (raise a clear ValueError or RuntimeError including subblock
identifier and the offending dict when missing) before computing
total_memory/teacher_memory, and apply the same existence checks for per-layer
accesses of block_stats[...] so malformed JSON surfaces immediately rather than
silently using zeros.
- Around line 47-48: The code assumes hydra_cfg.mip.subblock_stats_args is a
singleton but always uses index [0] (used by the teacher-total helpers and
run_mip_sweep), so add a fail-fast guard that asserts this is length 1 before
converting: check len(hydra_cfg.mip.subblock_stats_args) == 1 and raise a clear
ValueError (or AssertionError) if not, so the code in sweep.py (where
mip_subblock_args and subblock_stats_args are derived) explicitly enforces the
singleton assumption and prevents silently using the wrong scenario.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 176ee5b2-5938-4ce8-8d70-e6b89168b1a4

📥 Commits

Reviewing files that changed from the base of the PR and between 4190275 and d55f8d9.

📒 Files selected for processing (2)
  • modelopt/torch/puzzletron/mip/sweep.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py

Comment on lines 47 to +48
mip_subblock_args = hydra_cfg.mip.subblock_stats_args[0]
batch_size = mip_subblock_args["batch_size"]
weights_dtype = str(mip_subblock_args["weights_dtype"])
activations_dtype = str(mip_subblock_args["activations_dtype"])
kv_cache_dtype = str(mip_subblock_args["kv_cache_dtype"])
subblock_stats_args = OmegaConf.to_container(mip_subblock_args, resolve=True)
Contributor

🛠️ Refactor suggestion | 🟠 Major

Assert the singleton subblock_stats_args assumption here.

Both teacher-total helpers always read hydra_cfg.mip.subblock_stats_args[0]. If a config ever carries more than one measurement scenario, these totals come from the wrong row and run_mip_sweep() will derive the wrong teacher baseline. Either thread the intended scenario through explicitly or fail fast unless this list is a singleton.

♻️ Suggested guard
-    mip_subblock_args = hydra_cfg.mip.subblock_stats_args[0]
+    if len(hydra_cfg.mip.subblock_stats_args) != 1:
+        raise ValueError(
+            "Expected exactly one mip.subblock_stats_args entry; pass the desired scenario explicitly."
+        )
+    mip_subblock_args = hydra_cfg.mip.subblock_stats_args[0]

Comment on lines +94 to 95
total_memory = subblock_stats.get("non_block", {}).get("memory_mib", 0.0)

Contributor

⚠️ Potential issue | 🟠 Major

Fail fast on missing non_block metrics instead of treating them as zero.

teacher_memory drives the sweep target, so defaulting non_block.memory_mib / non_block.num_params to 0 turns malformed subblock_stats.json into a smaller baseline instead of surfacing the schema error. These keys look required here, just like the per-layer block_stats[...] accesses below.

♻️ Suggested guard
-    total_memory = subblock_stats.get("non_block", {}).get("memory_mib", 0.0)
+    try:
+        total_memory = subblock_stats["non_block"]["memory_mib"]
+    except KeyError as e:
+        raise ValueError("Missing non_block.memory_mib in subblock_stats.json") from e
-    total_params = subblock_stats.get("non_block", {}).get("num_params", 0)
+    try:
+        total_params = subblock_stats["non_block"]["num_params"]
+    except KeyError as e:
+        raise ValueError("Missing non_block.num_params in subblock_stats.json") from e

Also applies to: 117-118


@codecov

codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (4190275) to head (d55f8d9).
⚠️ Report is 1 commits behind head on feature/puzzletron.

Additional details and impacted files
@@                 Coverage Diff                 @@
##           feature/puzzletron    #1085   +/-   ##
===================================================
  Coverage               72.12%   72.12%           
===================================================
  Files                     209      209           
  Lines                   23628    23628           
===================================================
  Hits                    17042    17042           
  Misses                   6586     6586           


@danielkorzekwa danielkorzekwa merged commit 0708ca2 into feature/puzzletron Mar 24, 2026
28 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/anymodel_subblock_stats branch March 24, 2026 07:41