Draft: Merge anymodel pruning #990
Conversation
- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
- Configuration used: Path: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- 📒 Files selected for processing (1)
- 🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough

Refactors init_child_from_parent to accept a ModelDescriptor and pruning_mixin, parse model_config_overrides (dict or JSON), and instantiate models via deci_x_patcher with descriptor-driven class lookup; separates block-config overrides; propagates child_config/pruning_mixin into state-dict creation and checkpoint saving; reflows a docstring in validate_model; and enables a synchronized pruning step in puzzletron.py.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller as init_child_from_parent
    participant Descriptor as ModelDescriptorFactory
    participant Patcher as deci_x_patcher
    participant ModelClass as ResolvedModelClass
    participant State as create_child_state_dict
    participant Saver as _save_checkpoint
    Caller->>Descriptor: ensure/convert descriptor & parse overrides
    Descriptor-->>Caller: ModelDescriptor + base child_config
    Caller->>Patcher: provide descriptor, block_configs, patch options
    Patcher->>ModelClass: resolve/instantiate via (from_config/_from_config)
    ModelClass-->>Caller: instantiated child model
    Caller->>State: create_child_state_dict(child_model, child_config, pruning_mixin, descriptor)
    State-->>Caller: child_state_dict
    Caller->>Saver: _save_checkpoint(child_state_dict, output_dir, descriptor, max_workers)
    Saver-->>Caller: checkpoint saved
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

❗ Pre-merge checks failed. Please resolve all errors before merging; addressing warnings is optional.
❌ Failed checks (1 error, 1 inconclusive)
✅ Passed checks (2 passed)
Comment
```python
    pruning_mixin,
    parent_checkpoint_dir: str,
    model_config_overrides_json: str,
    model_config_overrides_dict: dict,
```

Suggested change:

```diff
-    model_config_overrides_dict: dict,
+    model_config_overrides_dict: dict | str,
```
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py`:
- Around line 92-95: When parsing model_config_overrides_dict (in
init_child_from_parent), ensure the JSON string decodes to a mapping before
proceeding: after json.loads(model_config_overrides_dict) check that the result
is a dict (or collections.abc.Mapping) and if not raise a clear TypeError or
ValueError (e.g. "model_config_overrides_dict must decode to an object/dict, got
<type>") so downstream use of .items() won't raise an opaque attribute error.
- Around line 123-131: The code currently hardcodes trust_remote_code=True when
constructing child models (see AutoModelForCausalLM branch and child_model
creation), which creates an RCE risk; add a caller-controlled boolean parameter
(e.g., trust_remote_code: bool = False) to the function that contains this logic
(init_child_from_parent or the enclosing function), thread that flag through to
the model creation call, and use it instead of the hardcoded True for
model_class.from_config(..., trust_remote_code=trust_remote_code); leave the
default False and document that callers must explicitly opt in for remote code.
In `@modelopt/torch/puzzletron/tools/validate_model.py`:
- Around line 113-117: The docstring allows activation_hooks_kwargs as a
comma-separated string but the code immediately treats it as a dict causing a
TypeError; update the handling in validate_model.py so activation_hooks_kwargs
accepts the legacy string form: if isinstance(activation_hooks_kwargs, str)
parse it into a dict by splitting on commas and '=' (skip empty entries), strip
whitespace, and convert values using ast.literal_eval (fall back to raw string
on ValueError); then proceed as before (ensure empty/None remains None);
reference the activation_hooks_kwargs parameter and the activation hook
registration logic to locate where to add this parsing.
ℹ️ Review info
⚙️ Run configuration
- Configuration used: Path: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: 5bfced19-a436-48be-893e-c62a53341763

📒 Files selected for processing (2)
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
- modelopt/torch/puzzletron/tools/validate_model.py
```python
    # Parse JSON if string
    if isinstance(model_config_overrides_dict, str):
        model_config_overrides_dict = json.loads(model_config_overrides_dict)
```
Reject non-object JSON overrides early.
If the string decodes to anything other than a mapping, Line 100 fails later with a generic '... has no attribute items'. Validate the parsed type here and raise a clearer error at the boundary.
Proposed fix

```diff
 if isinstance(model_config_overrides_dict, str):
     model_config_overrides_dict = json.loads(model_config_overrides_dict)
+if not isinstance(model_config_overrides_dict, dict):
+    raise TypeError(
+        "model_config_overrides_dict must be a dict or a JSON object string"
+    )
```
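The boundary check the review asks for can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper name `parse_overrides` is hypothetical, while the parameter name and error wording follow the review comment.

```python
import json


def parse_overrides(model_config_overrides_dict):
    """Accept a dict or a JSON object string; reject anything else early.

    Hypothetical helper illustrating the reviewer's suggestion: validate
    the decoded type at the boundary so a later .items() call cannot fail
    with an opaque AttributeError.
    """
    if isinstance(model_config_overrides_dict, str):
        model_config_overrides_dict = json.loads(model_config_overrides_dict)
    if not isinstance(model_config_overrides_dict, dict):
        raise TypeError(
            "model_config_overrides_dict must decode to an object/dict, "
            f"got {type(model_config_overrides_dict).__name__}"
        )
    return model_config_overrides_dict
```

With this guard, passing a JSON array such as `"[1, 2]"` raises a descriptive TypeError at the boundary instead of surfacing later as an attribute error.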
```python
with deci_x_patcher(
    model_descriptor=descriptor, block_configs=child_model_config.block_configs
):
    model_class = _get_model_class_from_config(child_model_config)
    # AutoModelForCausalLM uses from_config(); concrete model classes use _from_config()
    if model_class is AutoModelForCausalLM:
        child_model = model_class.from_config(child_model_config, trust_remote_code=True)
    else:
        child_model = model_class._from_config(child_model_config)
```
Don't hardcode trust_remote_code=True.
Line 129 turns child-model construction into an RCE boundary for any checkpoint/config that carries custom modeling code. Thread this through as a caller-controlled flag with a safe default of False, and only enable it under a documented exception.
Proposed fix

```diff
 def init_child_from_parent(
     descriptor: ModelDescriptor,
     pruning_mixin,
     parent_checkpoint_dir: str,
     model_config_overrides_dict: dict | str,
     output_checkpoint_dir: str,
     gqa_init_mode: GQAInitMode,
     mlp_init_mode: MlpInitMode,
     mlp_init_config_yaml: Optional[str],
     linear_init_mode: LinearInitMode,
     hidden_size_init_mode: Optional[HiddenSizeInitMode] = None,
     channel_importance_path: Optional[str] = None,
     max_workers: Optional[int] = None,  # Auto-calculate optimal workers if None
     max_layer_workers: Optional[int] = None,  # Auto-calculate optimal workers if None
+    trust_remote_code: bool = False,
 ) -> None:
 @@
     if model_class is AutoModelForCausalLM:
-        child_model = model_class.from_config(child_model_config, trust_remote_code=True)
+        child_model = model_class.from_config(
+            child_model_config, trust_remote_code=trust_remote_code
+        )
     else:
         child_model = model_class._from_config(child_model_config)
```

Per coding guidelines: "Do not hardcode trust_remote_code=True when loading transformers models. Let the caller decide via a parameter with default value False."
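The threading pattern the review describes can be sketched in isolation. All names here are hypothetical; the real `transformers` call is replaced by an injected `loader` callable standing in for `AutoModelForCausalLM.from_config`, so the opt-in behavior is testable without loading a model.

```python
def build_child_model(child_model_config, loader, *, trust_remote_code: bool = False):
    """Hypothetical sketch of a caller-controlled trust_remote_code flag.

    The flag defaults to False and is forwarded unchanged, so remote
    modeling code only runs when the caller explicitly opts in.
    """
    return loader(child_model_config, trust_remote_code=trust_remote_code)


def fake_loader(config, *, trust_remote_code):
    # Stand-in loader that records what it was called with.
    return {"config": config, "trust_remote_code": trust_remote_code}
```

Callers that genuinely need custom modeling code must now write `trust_remote_code=True` at the call site, which makes the opt-in visible in review.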
```text
Activation Hooks:
    - activations_log_dir (str, optional): Directory to log activation scores. If provided,
      hooks will be registered to capture activations.
    - activation_hooks_kwargs (str or dict, optional): Arguments for activation hooks.
      If string, comma-separated format: "arg1=val1,arg2=val2".
```
Documented string input currently crashes.
The docstring now says activation_hooks_kwargs may be a comma-separated string, but Line 153 immediately treats it as a mutable dict. Passing the documented string form will raise TypeError. Either parse the legacy string here or narrow the docs to dict only.
Proposed fix

```diff
 activation_hooks_kwargs = args.activation_hooks_kwargs or {}
+if isinstance(activation_hooks_kwargs, str):
+    activation_hooks_kwargs = simple_parse_args_string(activation_hooks_kwargs)
 activation_hooks_kwargs["validation_full_iters"] = validation_full_iters
```
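The legacy string form could be parsed along these lines. This is a sketch of the behavior described in the review, not the repository's implementation; `simple_parse_args_string` from the proposed diff is assumed to work roughly like this.

```python
import ast


def simple_parse_args_string(arg_string):
    """Parse "arg1=val1,arg2=val2" into a dict, skipping empty entries.

    Values go through ast.literal_eval so numbers and booleans come back
    typed; anything that does not parse stays a raw string.
    """
    result = {}
    for entry in arg_string.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and blank segments
        key, _, value = entry.partition("=")
        try:
            result[key.strip()] = ast.literal_eval(value.strip())
        except (ValueError, SyntaxError):
            result[key.strip()] = value.strip()
    return result
```

For example, `simple_parse_args_string("topk=5,mode=mean,flag=True")` yields `{"topk": 5, "mode": "mean", "flag": True}`, after which the dict-based code path proceeds unchanged.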
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##  feature/puzzletron    #990   +/-  ##
=========================================
  Coverage      72.12%  72.12%
=========================================
  Files            209     209
  Lines          23628   23628
=========================================
  Hits           17042   17042
  Misses          6586    6586
```

☔ View full report in Codecov by Sentry.
What does this PR do?

Merges dkorzekwa/anymodel_pruning into dkorzekwa/anymodel_acivation scoring; this MR is for review only. Ultimately, dkorzekwa/anymodel_pruning should be merged into feature/puzzletron once dkorzekwa/anymodel_acivation scoring is merged there.
Summary by CodeRabbit
Refactor
Bug Fixes
Documentation