Dkorzekwa/decilm hf code cleanup#1071
Conversation
- Add converter, model_descriptor, puzzformer, and llama model support - Selective merge of anymodel functionality Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
📝 Walkthrough

This PR removes a substantial subsystem of DeciLM model classes, Puzzletron-specific conversion utilities, and related export infrastructure spanning configuration, tokenization, model definitions, and checkpoint handling across multiple module directories. Configuration import statements are updated to reflect these deletions.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)
✅ Passed checks (3 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py (2)
175-193: ⚠️ Potential issue | 🔴 Critical

`model` can be uninitialized before checkpoint save (runtime crash). When `args.save_models=True`, `args.skip_validation=True`, and `realizable_as_symlinks=True`, `model` is never set in lines 175-177 but is used at line 192. This can raise `UnboundLocalError` on the first such solution.

💡 Proposed fix
```diff
-    if (args.save_models and not realizable_as_symlinks) or (not args.skip_validation):
+    if args.save_models or (not args.skip_validation):
         model = replacement_library.load_model(layer_replacements)
         model_config = model.config
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py` around lines 175 - 193, The code can call save_checkpoint(model, ...) when model was never created; ensure model is initialized whenever args.save_models is true. Fix by calling replacement_library.load_model(layer_replacements) (and reading model.config) before any checkpoint logic when args.save_models is true (regardless of args.skip_validation or realizable_as_symlinks), or alternatively guard save_checkpoint so it only runs if model is defined; update the block around replacement_library.load_model, model_config and save_checkpoint (and where Converter.copy_checkpoint_files is used) so model is always available for save_checkpoint.
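The control-flow hazard described above can be reproduced in a few lines. The sketch below uses hypothetical stand-ins (a `SimpleNamespace` for `args`, a string in place of the loaded model), not the project's actual code; only the branching logic mirrors the finding.

```python
from types import SimpleNamespace


def buggy(args):
    # model is only bound when this combined condition holds ...
    if (args.save_models and not args.realizable_as_symlinks) or not args.skip_validation:
        model = "loaded-model"  # stand-in for replacement_library.load_model(...)
    # ... yet the save path below can run without it.
    if args.save_models:
        return f"saved {model}"  # UnboundLocalError when the branch above was skipped
    return "no save"


def fixed(args):
    # The proposed fix: bind model on every path that can reach the save.
    if args.save_models or not args.skip_validation:
        model = "loaded-model"
    if args.save_models:
        return f"saved {model}"
    return "no save"


args = SimpleNamespace(save_models=True, skip_validation=True, realizable_as_symlinks=True)
try:
    buggy(args)
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"
print(outcome)      # UnboundLocalError
print(fixed(args))  # saved loaded-model
```

With the exact flag combination from the finding, the original condition evaluates to `False`, so the save line reads an unbound local; the widened condition removes that gap.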
232-237: ⚠️ Potential issue | 🔴 Critical

Remove hardcoded `trust_remote_code=True` from tokenizer loading. Lines 233 and 236 hardcode `trust_remote_code=True`, enabling execution of arbitrary remote model code without caller control. Per security guidelines, this must be caller-configurable and default to `False`.

🔒 Proposed fix
```diff
 def _load_tokenizer(args: DictConfig) -> PreTrainedTokenizerBase:
     tokenizer = None
+    trust_remote_code = bool(getattr(args, "trust_remote_code", False))
     if (tokenizer_name := getattr(args, "tokenizer_name", None)) is not None:
-        tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
+        tokenizer = AutoTokenizer.from_pretrained(
+            tokenizer_name, trust_remote_code=trust_remote_code
+        )
     elif args.teacher_dir is not None:
         try:
-            tokenizer = AutoTokenizer.from_pretrained(args.teacher_dir, trust_remote_code=True)
-        except:
+            tokenizer = AutoTokenizer.from_pretrained(
+                args.teacher_dir, trust_remote_code=trust_remote_code
+            )
+        except Exception:
             pass
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py` around lines 232 - 237, The tokenizer loading currently hardcodes trust_remote_code=True in AutoTokenizer.from_pretrained calls (see tokenizer_name and args.teacher_dir branches); make trust_remote_code a caller-configurable boolean (e.g., add or use args.trust_remote_code with default False) and pass that variable into both AutoTokenizer.from_pretrained invocations instead of the hardcoded True so callers opt-in to remote code execution; ensure the default remains False and both tokenizer loads (tokenizer_name branch and teacher_dir branch) use the same args.trust_remote_code flag.
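The opt-in pattern the fix proposes can be sketched in isolation. The loader below is a stub standing in for `AutoTokenizer.from_pretrained` (no network or transformers dependency); the point is only the flag plumbing: read once with a `False` default, then pass to both branches.

```python
from types import SimpleNamespace


def load_pretrained(name, trust_remote_code=False):
    # Stub for AutoTokenizer.from_pretrained(name, trust_remote_code=...);
    # returns what it was asked to do so we can inspect the flag.
    return {"name": name, "trust_remote_code": trust_remote_code}


def load_tokenizer(args):
    # Caller-configurable, defaulting to False: remote code needs explicit opt-in.
    trust_remote_code = bool(getattr(args, "trust_remote_code", False))
    tokenizer = None
    if (tokenizer_name := getattr(args, "tokenizer_name", None)) is not None:
        tokenizer = load_pretrained(tokenizer_name, trust_remote_code=trust_remote_code)
    elif getattr(args, "teacher_dir", None) is not None:
        try:
            tokenizer = load_pretrained(args.teacher_dir, trust_remote_code=trust_remote_code)
        except Exception:  # narrower than a bare `except:`
            pass
    return tokenizer


default = load_tokenizer(SimpleNamespace(tokenizer_name="some/tokenizer"))
opted_in = load_tokenizer(SimpleNamespace(tokenizer_name="some/tokenizer", trust_remote_code=True))
print(default["trust_remote_code"])   # False
print(opted_in["trust_remote_code"])  # True
```

Both loading branches consume the same flag, so a caller who never sets `trust_remote_code` can never trigger remote code execution.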
🧹 Nitpick comments (3)
.pre-commit-config.yaml (1)
116-116: Remove redundant exclude entry. Line 116 duplicates the same path already present on line 111, so it has no effect and makes the regex harder to maintain.
Proposed cleanup
```diff
-          examples/puzzletron/evaluation/lm_eval_anymodel.py|
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.pre-commit-config.yaml at line 116, the file contains a duplicated exclude entry "examples/puzzletron/evaluation/lm_eval_anymodel.py" (appears twice); remove the redundant occurrence so the exclude list only contains a single entry for that path: locate the duplicate exclude string in the exclude block and delete the second instance to keep the regex list minimal and maintainable.

modelopt/torch/puzzletron/tools/post_init_sparse.py (1)
62-67: Consider using a Protocol type hint for better type safety. The `nn.Module` type annotation is overly broad, since `do_sparsity` relies on `model.config.block_configs` and `model.model.layers` attributes that aren't part of the `nn.Module` interface. This works via duck typing but provides no static type checking benefit.

This is acceptable for this cleanup PR, but consider defining a Protocol if this pattern is used elsewhere:
```python
class SparsifiableModel(Protocol):
    config: Any  # with block_configs
    model: Any   # with layers
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/post_init_sparse.py` around lines 62 - 67, the do_sparsity method currently types its parameter as nn.Module but uses attributes like model.config.block_configs and model.model.layers that are not part of nn.Module; define a lightweight Protocol (e.g., SparsifiableModel) declaring the needed members (config with block_configs and model with layers) and update the do_sparsity signature to accept that Protocol instead of nn.Module so static type checkers can validate usage of model.config.block_configs and model.model.layers.

modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py (1)
60-62: Remove unused `_CONFIG_FOR_DOC` constant at line 62. This constant was used for HuggingFace docstrings in the removed model wrapper classes (e.g., `DeciLMForCausalLM`). With those classes deleted, it is no longer referenced anywhere in the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py` around lines 60 - 62, Remove the now-unused constant _CONFIG_FOR_DOC from modeling_decilm.py (it was only used by the deleted HF wrapper classes like DeciLMForCausalLM); locate the declaration "_CONFIG_FOR_DOC = \"DeciLMConfig\"" near the top of the file and delete that line so the module no longer defines an unused symbol.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`:
- Around line 175-193: The code can call save_checkpoint(model, ...) when model
was never created; ensure model is initialized whenever args.save_models is
true. Fix by calling replacement_library.load_model(layer_replacements) (and
reading model.config) before any checkpoint logic when args.save_models is true
(regardless of args.skip_validation or realizable_as_symlinks), or alternatively
guard save_checkpoint so it only runs if model is defined; update the block
around replacement_library.load_model, model_config and save_checkpoint (and
where Converter.copy_checkpoint_files is used) so model is always available for
save_checkpoint.
- Around line 232-237: The tokenizer loading currently hardcodes
trust_remote_code=True in AutoTokenizer.from_pretrained calls (see
tokenizer_name and args.teacher_dir branches); make trust_remote_code a
caller-configurable boolean (e.g., add or use args.trust_remote_code with
default False) and pass that variable into both AutoTokenizer.from_pretrained
invocations instead of the hardcoded True so callers opt-in to remote code
execution; ensure the default remains False and both tokenizer loads
(tokenizer_name branch and teacher_dir branch) use the same
args.trust_remote_code flag.
---
Nitpick comments:
In @.pre-commit-config.yaml:
- Line 116: The .pre-commit-config.yaml contains a duplicated exclude entry
"examples/puzzletron/evaluation/lm_eval_anymodel.py" (appears twice); remove the
redundant occurrence so the exclude list only contains a single entry for that
path—locate the duplicate exclude string in the exclude block and delete the
second instance to keep the regex list minimal and maintainable.
In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py`:
- Around line 60-62: Remove the now-unused constant _CONFIG_FOR_DOC from
modeling_decilm.py (it was only used by the deleted HF wrapper classes like
DeciLMForCausalLM); locate the declaration "_CONFIG_FOR_DOC = \"DeciLMConfig\""
near the top of the file and delete that line so the module no longer defines an
unused symbol.
In `@modelopt/torch/puzzletron/tools/post_init_sparse.py`:
- Around line 62-67: The do_sparsity method currently types its parameter as
nn.Module but uses attributes like model.config.block_configs and
model.model.layers that are not part of nn.Module; define a lightweight Protocol
(e.g., SparsifiableModel) declaring the needed members (config with
block_configs and model with layers) and update the do_sparsity signature to
accept that Protocol instead of nn.Module so static type checkers can validate
usage of model.config.block_configs and model.model.layers.
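The Protocol suggested in the nitpick above can be made concrete with a short, self-contained sketch. Everything here is illustrative (the fake model, the body of `do_sparsity`); only the attribute shape (`config.block_configs`, `model.layers`) follows the review comment.

```python
from types import SimpleNamespace
from typing import Any, Protocol


class SparsifiableModel(Protocol):
    # Structural type: anything exposing these two attributes matches,
    # no inheritance from this class required.
    config: Any  # expected to carry block_configs
    model: Any   # expected to carry layers


def do_sparsity(model: SparsifiableModel) -> int:
    # A static checker now knows these attributes exist on the parameter
    # type, unlike with a bare nn.Module annotation.
    return len(model.config.block_configs) + len(model.model.layers)


# Any object with the right shape satisfies the Protocol at runtime.
fake = SimpleNamespace(
    config=SimpleNamespace(block_configs=[1, 2]),
    model=SimpleNamespace(layers=[1, 2, 3]),
)
print(do_sparsity(fake))  # 5
```

Because `Protocol` checks structure rather than ancestry, real `nn.Module` subclasses that happen to expose `config` and `model` still type-check without any code change on their side.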
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2ceb6c52-656d-46ee-a74d-9e46b24f9fe0
📒 Files selected for processing (19)
- .pre-commit-config.yaml
- examples/puzzletron/nemo_export/convert_hf_to_nemo.py
- examples/puzzletron/nemo_export/convert_nemo_to_hf.py
- modelopt/torch/puzzletron/decilm/conversion_utils.py
- modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__megatron_tokenizer.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__tokenizer.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/tokenization_decilm.py
- modelopt/torch/puzzletron/export/MCore/llama_nemotron.py
- modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py
- modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py
- modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py
- modelopt/torch/puzzletron/replacement_library/replacement_library.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
- modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
- modelopt/torch/puzzletron/tools/post_init_sparse.py
- modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
- modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py
💤 Files with no reviewable changes (14)
- modelopt/torch/puzzletron/replacement_library/replacement_library.py
- examples/puzzletron/nemo_export/convert_hf_to_nemo.py
- examples/puzzletron/nemo_export/convert_nemo_to_hf.py
- modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__megatron_tokenizer.py
- modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py
- modelopt/torch/puzzletron/decilm/conversion_utils.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__tokenizer.py
- modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
- modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/tokenization_decilm.py
- modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py
- modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py
- modelopt/torch/puzzletron/export/MCore/llama_nemotron.py
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@             Coverage Diff              @@
##   feature/puzzletron   #1071     +/- ##
==========================================
-  Coverage     72.13%   72.12%   -0.02%
==========================================
   Files           209      209
   Lines         23628    23628
==========================================
-  Hits          17045    17042       -3
-  Misses         6583     6586       +3
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Summary by CodeRabbit
Release Notes
Removals
Updates