
Dkorzekwa/anymodel subblock stats nodecilm #1102

Merged
danielkorzekwa merged 125 commits into feature/puzzletron from dkorzekwa/anymodel_subblock_stats_nodecilm on Mar 24, 2026

Conversation

@danielkorzekwa
Contributor

@danielkorzekwa danielkorzekwa commented Mar 23, 2026

What does this PR do?

Refactor subblock stats to stop using DeciLM-specific code and use anymodel instead.

Summary by CodeRabbit

  • Refactor

    • Restructured internal estimation logic for model memory and parameters to support broader model architectures.
    • Updated model initialization utilities.
  • Tests

    • Updated baseline metrics for validation tests.

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)

@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 23, 2026 17:22
@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough


The changes remove the DeciLMMoe mixture-of-experts implementation and refactor the parameter and memory calculation system from closed-form formulas to a descriptor-based meta-model construction approach. Supporting utilities are added for model class resolution and initialization. Test expectations are updated accordingly.
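The descriptor-based approach builds the model on PyTorch's meta device, so parameters can be counted without allocating real weights. A generic sketch of the idea in plain PyTorch — not the PR's actual EmptyInitOnDevice implementation, and with illustrative layer widths:

```python
import torch
import torch.nn as nn

def count_params_on_meta(build_fn) -> int:
    # Construct the model under the meta device: tensors carry shape and
    # dtype but no storage, so even very large models are "built" instantly.
    with torch.device("meta"):
        model = build_fn()
    return sum(p.numel() for p in model.parameters())

# Illustrative FFN subblock with Llama-8B-like widths (assumed sizes).
ffn = lambda: nn.Sequential(
    nn.Linear(4096, 11008, bias=False),   # up projection
    nn.Linear(11008, 4096, bias=False),   # down projection
)
print(count_params_on_meta(ffn))  # 2 * 4096 * 11008 = 90177536
```

Counting via a meta-instantiated module is what lets this replace the closed-form formulas: the count automatically tracks whatever layers the architecture actually instantiates.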

Changes

Cohort / File(s) Summary
MoE Removal
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
Removed DeciLMMoe class implementation, including router initialization, top-k selection logic, and forward pass; updated module docstring and imports.
Subblock Stats Refactoring
modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py
Replaced closed-form parameter/memory calculation functions (calculate_attention_params, calculate_mamba_params, calculate_ffn_params, etc.) with descriptor-based meta-model construction. calculate_subblock_params now uses EmptyInitOnDevice and descriptor post-init hooks to compute parameters; memory functions updated to call new calculate_subblock_params.
Subblock Stats Threading
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
Added model_config: PretrainedConfig and descriptor: Type[ModelDescriptor] parameters to calculate_subblock_stats and calculate_subblock_stats_for_puzzle_dir; threaded these arguments through to downstream calculate_subblock_memory and parameter calculation calls.
Model Initialization Utilities
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
Added init_model_from_config function for creating meta models via descriptor approach, and _get_model_class_from_config to resolve model classes from config.architectures[0] with fallback to AutoModelForCausalLM. Expanded imports to include AutoModelForCausalLM and transformers.
Test Expectations
tests/gpu/torch/puzzletron/test_puzzletron.py
Updated EXPECTED_TEACHER_MEMORY_MIB and EXPECTED_TEACHER_NUM_PARAMS hardcoded values for multiple models (Llama-3.1-8B, Llama-3.2-3B, Mistral-Small-24B, Nemotron-3, Qwen variants) to reflect refactored calculation results.
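The class-resolution fallback described for _get_model_class_from_config can be sketched as follows. This is an illustrative reconstruction from the summary above, not the PR's exact code, and the public helper name is hypothetical:

```python
import warnings

import transformers
from transformers import AutoModelForCausalLM, PretrainedConfig

def get_model_class_from_config(config: PretrainedConfig):
    # config.architectures typically holds class names such as
    # ["LlamaForCausalLM"]; resolve the first entry from the transformers
    # namespace, falling back to AutoModelForCausalLM if it is not exported.
    architectures = getattr(config, "architectures", None) or []
    if architectures:
        model_class = getattr(transformers, architectures[0], None)
        if model_class is not None:
            return model_class
        warnings.warn(
            f"{architectures[0]} not found in transformers, "
            "falling back to AutoModelForCausalLM"
        )
    return AutoModelForCausalLM
```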

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 38.89%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Title check — ❓ Inconclusive: the title 'Dkorzekwa/anymodel subblock stats nodecilm' is vague, using branch names and acronyms without conveying the purpose of the changes. Resolution: revise the title to describe the main objective, e.g. 'Refactor subblock stats to use anymodel instead of DeciLM-specific code'.

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Security Anti-Patterns — ✅ Passed: security review confirms no anti-patterns; trust_remote_code is a configurable parameter and no hardcoded unsafe patterns were found.


Comment on lines +132 to +133
mprint(
    f"Warning: {model_class_name} not found in transformers, "
Collaborator


Suggested change:
-mprint(
-    f"Warning: {model_class_name} not found in transformers, "
+warnings.warn(
+    f"{model_class_name} not found in transformers, "

Contributor Author


Let's not do this here; instead, let's unify logging across the board using modelopt.torch.utils.logging.

Collaborator


Is sub-block stats generic enough to move to modelopt.torch.utils, just like runtime stats, so we could reuse it in other places?

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (1)

348-359: ⚠️ Potential issue | 🟠 Major

model_hidden_sizes sweeps are now internally inconsistent.

Lines 351-352 always forward the teacher model_config, but subblock params now come from calculate_subblock_params(model_config, ...) rather than from n_embd. Combined with Line 359’s n_embd=model_hidden_size, any non-teacher sweep gets cache/non-block numbers for the override and subblock params/memory for the teacher width.

🛠️ Minimal safe guard
     for batch_size, (
         weights_dtype,
         activations_dtype,
         kv_cache_dtype,
     ), model_hidden_size in product(batch_sizes, data_types, model_hidden_sizes):
+        if model_hidden_size != lm_config.hidden_size:
+            raise NotImplementedError(
+                "model_hidden_sizes overrides are not supported with config-based subblock param counting yet."
+            )
         if num_active_tokens_override is not None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines
348 - 359, The call to calculate_subblock_stats is passing model_hidden_size via
n_embd while the code derives subblock widths from
calculate_subblock_params(model_config, ...), causing mismatched widths for
teacher vs swept models; compute the effective embedding width from the subblock
params returned by calculate_subblock_params (e.g., use the subblock params'
n_embd/hidden size) and pass that value as n_embd to calculate_subblock_stats
instead of model_hidden_size, keeping the teacher model_config/teacher_dir usage
unchanged so the subblock stats use the same width source as
calculate_subblock_params.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0b698ceb-94cc-4adf-bf77-79a62e79882a

📥 Commits

Reviewing files that changed from the base of the PR and between 837e14f and 4458fb9.

📒 Files selected for processing (5)
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py

Comment on lines +114 to +119
if isinstance(layer_config, FFNConfig):
    block_config = layer_config.to_blockconfig()
elif isinstance(layer_config, AttentionConfig):
    block_config = layer_config.to_blockconfig()
else:
    block_config = layer_config
Contributor


⚠️ Potential issue | 🟠 Major

to_blockconfig() drops parallel_blocks context.

Line 114 and Line 116 collapse the measured sublayer into a standalone BlockConfig. That cannot represent entries coming from BlockConfig.parallel_blocks: the replacement-library loader already preserves an inner_block_idx for those cases, but this API has nowhere to carry it, so meta-init instantiates the wrong layer topology and miscounts params/memory for parallel-block models.

If we want to keep the meta-init approach here, this helper needs the parent BlockConfig plus the selected inner_block_idx, not just a bare FFNConfig/AttentionConfig.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py`
around lines 114 - 119, The current branch replaces measured sublayers by
calling to_blockconfig() on FFNConfig/AttentionConfig which loses
parallel_blocks context and the inner_block_idx, causing meta-init to
instantiate incorrect topology; change the logic in the block that handles
layer_config so that when layer_config originates from a
BlockConfig.parallel_blocks entry you retain or reconstruct the parent
BlockConfig plus the selected inner_block_idx (e.g. accept/propagate an
inner_block_idx parameter or use the original BlockConfig object instead of
calling to_blockconfig()), or modify
FFNConfig.to_blockconfig()/AttentionConfig.to_blockconfig() to accept and
preserve inner_block_idx/parent-parallel context so the instantiated
block_config still reflects the parallel_blocks entry. Ensure refs to
layer_config, FFNConfig, AttentionConfig, to_blockconfig(),
BlockConfig.parallel_blocks, and inner_block_idx are used to locate and update
the code paths.

Base automatically changed from dkorzekwa/anymodel_subblock_stats to feature/puzzletron March 24, 2026 07:41
@danielkorzekwa danielkorzekwa requested review from a team as code owners March 24, 2026 07:41
…ats_nodecilm

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (1)

138-150: ⚠️ Potential issue | 🔴 Critical

CRITICAL: Default trust_remote_code to False.

The function defaults trust_remote_code=True (line 141), which violates repository security guidelines. Per SECURITY.md and coding guidelines, this flag must default to False to prevent callers from silently opting into executing checkpoint-supplied Python during model construction. Callers should explicitly opt-in at audited call sites.

The call site in calc_subblock_params_and_memory.py (line 167) already passes trust_remote_code=descriptor.requires_trust_remote_code(), so changing the default here should not break existing functionality.

🔐 Proposed fix
 def init_model_from_config(
     config: PretrainedConfig,
     *,
-    trust_remote_code: bool = True,
+    trust_remote_code: bool = False,
     **kwargs,
 ) -> PreTrainedModel:

As per coding guidelines: "Do not hardcode trust_remote_code=True when loading Hugging Face Transformers models. Let the caller decide via a parameter; default to False."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 138 -
150, The init_model_from_config function currently defaults trust_remote_code to
True which violates security guidelines; change the default to False in the
function signature (init_model_from_config) and ensure callers that need remote
code explicitly pass trust_remote_code=True (e.g.,
calc_subblock_params_and_memory uses descriptor.requires_trust_remote_code()).
Update only the default value for the trust_remote_code parameter and keep the
existing conditional logic using model_class.from_config and
model_class._from_config unchanged.
🧹 Nitpick comments (2)
modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py (1)

322-331: Unused parameter experts_dtype.

The experts_dtype parameter (line 327) is added to the function signature but is never used in the function body. The TODO comment on line 329 suggests this was intended for separating expert weights from the rest, but the current implementation doesn't differentiate.

Consider either:

  1. Removing the unused parameter until the TODO is addressed
  2. Implementing the expert weight separation logic
Option 1: Remove unused parameter
 def calculate_ffn_memory(
     ffn_config: FFNConfig,
     model_config: PretrainedConfig,
     descriptor: Type[ModelDescriptor],
     weights_dtype: torch.dtype | str,
-    experts_dtype: torch.dtype | str | None = None,
 ) -> float:
-    # TODO: How to separate between expert weights and the rest for any model (same as puzzletron).
     num_params = calculate_subblock_params(model_config, ffn_config, descriptor)
     return num_params * sizeof_dtype(weights_dtype) / 2**20
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py`
around lines 322 - 331, The function calculate_ffn_memory has an unused
parameter experts_dtype; either remove it from the signature and from all
callers (and update any type hints referencing FFNConfig/descriptor usage) or
implement expert vs non-expert separation: call calculate_subblock_params twice
(or modify it) to get expert_params and non_expert_params, compute memory as
expert_params * sizeof_dtype(experts_dtype) + non_expert_params *
sizeof_dtype(weights_dtype) and return the total in MiB (divide by 2**20); use
the existing symbols calculate_subblock_params, sizeof_dtype,
calculate_ffn_memory, FFNConfig and descriptor to locate where to change.
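The MiB conversion in calculate_ffn_memory is simply parameter count × bytes per element ÷ 2^20. A self-contained sketch of that arithmetic — the sizeof_dtype table here is an assumed stand-in for the repo's helper of the same name:

```python
def sizeof_dtype(dtype: str) -> int:
    # Bytes per element for common dtypes (illustrative subset).
    return {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}[dtype]

def weights_memory_mib(num_params: int, weights_dtype: str) -> float:
    # Weight memory in MiB: element count times element size, over 2**20.
    return num_params * sizeof_dtype(weights_dtype) / 2**20

# A 90,177,536-parameter FFN subblock stored in bfloat16:
print(weights_memory_mib(90_177_536, "bfloat16"))  # 172.0
```

The expert-dtype variant the reviewer suggests would just compute this twice, once per parameter group, and sum the results.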
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (1)

125-135: Consider extracting shared helper to avoid duplication.

This _get_model_class_from_config implementation is nearly identical to the one in modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py (lines 99-112). The only difference is using warnings.warn here vs mprint there.

Consider extracting this helper to a shared location to avoid maintaining duplicate logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 125 -
135, The _get_model_class_from_config function duplicates logic present in
sharded_checkpoint_utils (same name/behavior but different warning call);
extract this logic into a single shared helper (e.g.,
get_model_class_from_config) placed in a utilities module and have both
checkpoint_utils_hf._get_model_class_from_config and sharded_checkpoint_utils
call that helper; unify the warning/logging behavior by accepting an optional
warn_func parameter (or default to warnings.warn) so callers can pass mprint
when needed, and update both callers to use the new helper to remove
duplication.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4fd8f067-0fd7-4794-af5d-e04e07da9d54

📥 Commits

Reviewing files that changed from the base of the PR and between 4458fb9 and 684ed36.

📒 Files selected for processing (2)
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py

@codecov

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.10%. Comparing base (0708ca2) to head (684ed36).
⚠️ Report is 1 commit behind head on feature/puzzletron.

Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/puzzletron    #1102      +/-   ##
======================================================
- Coverage               72.12%   72.10%   -0.03%     
======================================================
  Files                     209      209              
  Lines                   23628    23628              
======================================================
- Hits                    17042    17036       -6     
- Misses                   6586     6592       +6     


@danielkorzekwa danielkorzekwa merged commit 3193f30 into feature/puzzletron Mar 24, 2026
28 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/anymodel_subblock_stats_nodecilm branch March 24, 2026 11:17
