
Draft: Merge anymodel pruning #990

Merged
danielkorzekwa merged 36 commits into feature/puzzletron from
dkorzekwa/anymodel_pruning
Mar 12, 2026
Conversation

@danielkorzekwa
Contributor

@danielkorzekwa danielkorzekwa commented Mar 6, 2026

What does this PR do?

Merging dkorzekwa/anymodel_pruning into dkorzekwa/anymodel_activation_scoring - this MR is only for reviewing. Ultimately dkorzekwa/anymodel_pruning should be merged into feature/puzzletron once dkorzekwa/anymodel_activation_scoring is merged there.

Summary by CodeRabbit

  • Refactor

    • Moved model initialization to a descriptor-driven flow; accepts dict or JSON overrides and new optional init parameters. Safer handling for vision–language models, improved state creation and checkpoint saving, and clearer profiling output.
  • Bug Fixes

    • Re-enabled pruning checkpoint step to run on a single process with proper synchronization.
  • Documentation

    • Clarified and reformatted validation docstrings, expanding configuration and data-processing options.

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)

@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 6, 2026 09:54
@coderabbitai
Contributor

coderabbitai bot commented Mar 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d7f96f67-5150-484f-807a-00a2c1cc9d93

📥 Commits

Reviewing files that changed from the base of the PR and between d9a8647 and 8398294.

📒 Files selected for processing (1)
  • modelopt/torch/puzzletron/tools/validate_model.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/puzzletron/tools/validate_model.py

📝 Walkthrough

Walkthrough

Refactors init_child_from_parent to accept a ModelDescriptor and pruning_mixin, parse model_config_overrides (dict or JSON), instantiate models via deci_x_patcher with descriptor-driven class lookup, separate block-config overrides, propagate child_config/pruning_mixin into state-dict creation and checkpoint saving; minor docstring reflow in validate_model and enables a synchronized pruning step in puzzletron.py.

Changes

Cohort / File(s): Summary

  • Model Initialization Refactor — modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py: Rewrote the function signature to accept descriptor and pruning_mixin; model_config_overrides now accepts `dict | str`.
  • Docstring reflow (no logic change) — modelopt/torch/puzzletron/tools/validate_model.py: Reflowed Args docstring sections (Model Configuration, Dataset Configuration, Data Processing, Activation Hooks, Execution Options); no signature or runtime behavior changes.
  • Pruning step control flow — modelopt/torch/puzzletron/puzzletron.py: Uncommented/enabled the Step 2 pruning checkpoint operation to run on the master process, guarded by dist.is_master(), with added dist.barrier() synchronization (previously commented out).
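The enabled Step 2 behavior — perform the checkpoint write only on the master rank, then synchronize before other ranks proceed — follows a standard distributed idiom. A minimal single-process sketch of the pattern (function names and the JSON format are illustrative; the PR itself uses the project's own dist.is_master()/dist.barrier() helpers):

```python
import json
import os


def is_master() -> bool:
    # torchrun sets RANK for each process; single-process runs
    # default to rank 0, which is conventionally the master.
    return int(os.environ.get("RANK", "0")) == 0


def save_pruning_checkpoint(state: dict, out_dir: str) -> str:
    """Write the checkpoint on the master rank only.

    In a real multi-process run, every rank must synchronize (e.g. via
    dist.barrier()) after this returns, so that no rank tries to read a
    half-written checkpoint.
    """
    path = os.path.join(out_dir, "pruning_ckpt.json")
    if is_master():
        with open(path, "w") as f:
            json.dump(state, f)
    # dist.barrier() would go here in the distributed setting.
    return path
```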

Sequence Diagram(s)

sequenceDiagram
    participant Caller as init_child_from_parent
    participant Descriptor as ModelDescriptorFactory
    participant Patcher as deci_x_patcher
    participant ModelClass as ResolvedModelClass
    participant State as create_child_state_dict
    participant Saver as _save_checkpoint

    Caller->>Descriptor: ensure/convert descriptor & parse overrides
    Descriptor-->>Caller: ModelDescriptor + base child_config
    Caller->>Patcher: provide descriptor, block_configs, patch options
    Patcher->>ModelClass: resolve/instantiate via (from_config/_from_config)
    ModelClass-->>Caller: instantiated child model
    Caller->>State: create_child_state_dict(child_model, child_config, pruning_mixin, descriptor)
    State-->>Caller: child_state_dict
    Caller->>Saver: _save_checkpoint(child_state_dict, output_dir, descriptor, max_workers)
    Saver-->>Caller: checkpoint saved

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 inconclusive)

  • Security Anti-Patterns — ❌ Error: Pull request introduces two instances of hardcoded trust_remote_code=True without justifying inline comments, violating SECURITY.md Rule 3 and enabling arbitrary code execution. Resolution: Add inline comments explaining trusted sources for both instances and expose trust_remote_code as a configurable parameter in validate_model.py; request explicit security review.
  • Title check — ❓ Inconclusive: The title 'Draft: Merge anymodel pruning' is vague and uses generic merge language that doesn't convey the specific technical changes made. Resolution: Consider a more descriptive title that highlights the main technical change, such as 'Refactor init_child_from_parent to support ModelDescriptor-driven initialization' or 'Enable pruning checkpoints in puzzletron Step 2 with distributed synchronization'.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped - CodeRabbit's high-level summary is enabled.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

pruning_mixin,
parent_checkpoint_dir: str,
model_config_overrides_json: str,
model_config_overrides_dict: dict,

Suggested change
model_config_overrides_dict: dict,
model_config_overrides_dict: dict | str,

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Base automatically changed from dkorzekwa/anymodel_activation_scoring to feature/puzzletron March 12, 2026 09:36
@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 12, 2026 09:36
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py`:
- Around line 92-95: When parsing model_config_overrides_dict (in
init_child_from_parent), ensure the JSON string decodes to a mapping before
proceeding: after json.loads(model_config_overrides_dict) check that the result
is a dict (or collections.abc.Mapping) and if not raise a clear TypeError or
ValueError (e.g. "model_config_overrides_dict must decode to an object/dict, got
<type>") so downstream use of .items() won't raise an opaque attribute error.
- Around line 123-131: The code currently hardcodes trust_remote_code=True when
constructing child models (see AutoModelForCausalLM branch and child_model
creation), which creates an RCE risk; add a caller-controlled boolean parameter
(e.g., trust_remote_code: bool = False) to the function that contains this logic
(init_child_from_parent or the enclosing function), thread that flag through to
the model creation call, and use it instead of the hardcoded True for
model_class.from_config(..., trust_remote_code=trust_remote_code); leave the
default False and document that callers must explicitly opt in for remote code.

In `@modelopt/torch/puzzletron/tools/validate_model.py`:
- Around line 113-117: The docstring allows activation_hooks_kwargs as a
comma-separated string but the code immediately treats it as a dict causing a
TypeError; update the handling in validate_model.py so activation_hooks_kwargs
accepts the legacy string form: if isinstance(activation_hooks_kwargs, str)
parse it into a dict by splitting on commas and '=' (skip empty entries), strip
whitespace, and convert values using ast.literal_eval (fall back to raw string
on ValueError); then proceed as before (ensure empty/None remains None);
reference the activation_hooks_kwargs parameter and the activation hook
registration logic to locate where to add this parsing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5bfced19-a436-48be-893e-c62a53341763

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc9932 and 0fc10a1.

📒 Files selected for processing (2)
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • modelopt/torch/puzzletron/tools/validate_model.py

Comment on lines +92 to 95
# Parse JSON if string
if isinstance(model_config_overrides_dict, str):
model_config_overrides_dict = json.loads(model_config_overrides_dict)


⚠️ Potential issue | 🟡 Minor

Reject non-object JSON overrides early.

If the string decodes to anything other than a mapping, Line 100 fails later with a generic '... has no attribute items'. Validate the parsed type here and raise a clearer error at the boundary.

Proposed fix
     if isinstance(model_config_overrides_dict, str):
         model_config_overrides_dict = json.loads(model_config_overrides_dict)
+    if not isinstance(model_config_overrides_dict, dict):
+        raise TypeError(
+            "model_config_overrides_dict must be a dict or a JSON object string"
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py`
around lines 92 - 95, When parsing model_config_overrides_dict (in
init_child_from_parent), ensure the JSON string decodes to a mapping before
proceeding: after json.loads(model_config_overrides_dict) check that the result
is a dict (or collections.abc.Mapping) and if not raise a clear TypeError or
ValueError (e.g. "model_config_overrides_dict must decode to an object/dict, got
<type>") so downstream use of .items() won't raise an opaque attribute error.

Comment on lines +123 to +131
with deci_x_patcher(
model_descriptor=descriptor, block_configs=child_model_config.block_configs
):
model_class = _get_model_class_from_config(child_model_config)
# AutoModelForCausalLM uses from_config(); concrete model classes use _from_config()
if model_class is AutoModelForCausalLM:
child_model = model_class.from_config(child_model_config, trust_remote_code=True)
else:
child_model = model_class._from_config(child_model_config)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find the file and check if it exists
find . -type f -name "init_child_from_parent.py" | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 143


🏁 Script executed:

# Once found, read the relevant section of the file
cd modelopt/torch/puzzletron/tools/bypassed_training 2>/dev/null && wc -l init_child_from_parent.py

Repository: NVIDIA/Model-Optimizer

Length of output: 95


🏁 Script executed:

# Read the full function to understand the context and function signature
cat -n modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py | head -150

Repository: NVIDIA/Model-Optimizer

Length of output: 6974


🏁 Script executed:

# Check for inline comments justifying the hardcoded True and look at more context
cat -n modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py | sed -n '120,140p'

Repository: NVIDIA/Model-Optimizer

Length of output: 1179


🏁 Script executed:

# Search for all calls to init_child_from_parent function
rg "init_child_from_parent\(" --type py -B 2 -A 2

Repository: NVIDIA/Model-Optimizer

Length of output: 2855


🏁 Script executed:

# Check if there are other uses of trust_remote_code in the codebase to understand patterns
rg "trust_remote_code" --type py

Repository: NVIDIA/Model-Optimizer

Length of output: 22982


🏁 Script executed:

# Check the sharded_checkpoint_utils.py for similar issues
cat -n modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py | grep -B 5 -A 5 "trust_remote_code"

Repository: NVIDIA/Model-Optimizer

Length of output: 756


🏁 Script executed:

# Let's see the full context of the call in pruning_ckpts.py to understand the impact
rg "init_child_from_parent\(" modelopt/torch/puzzletron/pruning/pruning_ckpts.py -B 2 -A 15 | head -100

Repository: NVIDIA/Model-Optimizer

Length of output: 4850


🏁 Script executed:

# Verify that there are no inline comments justifying the hardcoded True
sed -n '126,132p' modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py

Repository: NVIDIA/Model-Optimizer

Length of output: 481


Don't hardcode trust_remote_code=True.

Line 129 turns child-model construction into an RCE boundary for any checkpoint/config that carries custom modeling code. Thread this through as a caller-controlled flag with a safe default of False, and only enable it under a documented exception.

Proposed fix
 def init_child_from_parent(
     descriptor: ModelDescriptor,
     pruning_mixin,
     parent_checkpoint_dir: str,
     model_config_overrides_dict: dict | str,
     output_checkpoint_dir: str,
     gqa_init_mode: GQAInitMode,
     mlp_init_mode: MlpInitMode,
     mlp_init_config_yaml: Optional[str],
     linear_init_mode: LinearInitMode,
     hidden_size_init_mode: Optional[HiddenSizeInitMode] = None,
     channel_importance_path: Optional[str] = None,
     max_workers: Optional[int] = None,  # Auto-calculate optimal workers if None
     max_layer_workers: Optional[int] = None,  # Auto-calculate optimal workers if None
+    trust_remote_code: bool = False,
 ) -> None:
@@
             if model_class is AutoModelForCausalLM:
-                child_model = model_class.from_config(child_model_config, trust_remote_code=True)
+                child_model = model_class.from_config(
+                    child_model_config, trust_remote_code=trust_remote_code
+                )
             else:
                 child_model = model_class._from_config(child_model_config)

Per coding guidelines: "Do not hardcode trust_remote_code=True when loading transformers models. Let the caller decide via a parameter with default value False."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py`
around lines 123 - 131, The code currently hardcodes trust_remote_code=True when
constructing child models (see AutoModelForCausalLM branch and child_model
creation), which creates an RCE risk; add a caller-controlled boolean parameter
(e.g., trust_remote_code: bool = False) to the function that contains this logic
(init_child_from_parent or the enclosing function), thread that flag through to
the model creation call, and use it instead of the hardcoded True for
model_class.from_config(..., trust_remote_code=trust_remote_code); leave the
default False and document that callers must explicitly opt in for remote code.

Comment on lines +113 to +117
Activation Hooks:
- activations_log_dir (str, optional): Directory to log activation scores. If provided,
hooks will be registered to capture activations.
- activation_hooks_kwargs (str or dict, optional): Arguments for activation hooks.
If string, comma-separated format: "arg1=val1,arg2=val2".

⚠️ Potential issue | 🟠 Major

Documented string input currently crashes.

The docstring now says activation_hooks_kwargs may be a comma-separated string, but Line 153 immediately treats it as a mutable dict. Passing the documented string form will raise TypeError. Either parse the legacy string here or narrow the docs to dict only.

Proposed fix
         activation_hooks_kwargs = args.activation_hooks_kwargs or {}
+        if isinstance(activation_hooks_kwargs, str):
+            activation_hooks_kwargs = simple_parse_args_string(activation_hooks_kwargs)
         activation_hooks_kwargs["validation_full_iters"] = validation_full_iters
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_model.py` around lines 113 - 117,
The docstring allows activation_hooks_kwargs as a comma-separated string but the
code immediately treats it as a dict causing a TypeError; update the handling in
validate_model.py so activation_hooks_kwargs accepts the legacy string form: if
isinstance(activation_hooks_kwargs, str) parse it into a dict by splitting on
commas and '=' (skip empty entries), strip whitespace, and convert values using
ast.literal_eval (fall back to raw string on ValueError); then proceed as before
(ensure empty/None remains None); reference the activation_hooks_kwargs
parameter and the activation hook registration logic to locate where to add this
parsing.

@codecov

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (4dc9932) to head (8398294).
⚠️ Report is 1 commit behind head on feature/puzzletron.

Additional details and impacted files
@@                 Coverage Diff                 @@
##           feature/puzzletron     #990   +/-   ##
===================================================
  Coverage               72.12%   72.12%           
===================================================
  Files                     209      209           
  Lines                   23628    23628           
===================================================
  Hits                    17042    17042           
  Misses                   6586     6586           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@danielkorzekwa danielkorzekwa merged commit d358eb3 into feature/puzzletron Mar 12, 2026
28 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/anymodel_pruning branch March 12, 2026 14:54
