
Dkorzekwa/decilm hf code cleanup #1071

Merged
danielkorzekwa merged 109 commits into feature/puzzletron from dkorzekwa/decilm_hf_code_cleanup
Mar 23, 2026

Conversation


@danielkorzekwa danielkorzekwa commented Mar 19, 2026

What does this PR do?

  • Delete unused decilm code

Summary by CodeRabbit

Release Notes

  • Removals

    • Removed model conversion utilities for Llama-Nemotron format
    • Removed DeciLM model classes, tokenizer implementations, and configuration utilities
    • Removed checkpoint import/export functionality
    • Removed heterogeneous transformer layer specifications and configuration builders
  • Updates

    • Updated pre-commit configuration for additional file exclusions
    • Updated imports across modules to reflect removed dependencies

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)

@danielkorzekwa danielkorzekwa requested review from a team as code owners March 19, 2026 14:05
@danielkorzekwa danielkorzekwa requested review from cjluo-nv and removed request for a team March 19, 2026 14:05
Base automatically changed from dkorzekwa/remainings_from_dkorzekwa_anymodel_merging_process to feature/puzzletron March 20, 2026 14:23
@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 20, 2026 14:23
@kevalmorabia97 kevalmorabia97 removed the request for review from cjluo-nv March 20, 2026 17:20

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough


This PR removes a substantial subsystem of DeciLM model classes, Puzzletron-specific conversion utilities, and related export infrastructure spanning configuration, tokenization, model definitions, and checkpoint handling across multiple module directories. Additionally, configuration import statements are updated to reflect these deletions.

Changes

  • Pre-commit Configuration (.pre-commit-config.yaml): Extended the insert-license-py hook's exclude regex to skip two additional files: examples/puzzletron/evaluation/lm_eval_anymodel.py and modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py.
  • DeciLM Model Architecture (modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py, modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__megatron_tokenizer.py, modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__tokenizer.py, modelopt/torch/puzzletron/decilm/deci_lm_hf_code/tokenization_decilm.py): Deleted HuggingFace model, tokenizer, and tokenization implementations, including DeciLMForCausalLM, DeciLMPreTrainedModel, DeciLMModel, classification/QA/token-classification heads, the MegatronTokenizer base class, CustomTikTokenizer, and the MegatronTikTokenizer tokenizer class.
  • DeciLM Conversion & Configuration (modelopt/torch/puzzletron/decilm/conversion_utils.py, modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py, modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py): Removed utilities for converting model weights and configurations to DeciLM format, including HuggingFace parameter mapping, routed-expert tensor transforms, and Llama3-to-DeciLM config conversion with weight handling.
  • Puzzletron MCore Export Infrastructure (modelopt/torch/puzzletron/export/MCore/llama_nemotron.py, modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py, modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py): Deleted the entire MCore integration layer, including the PuzzletronLlamaNemotronModel configuration, HF import/export connectors, heterogeneous transformer layer specs, weight mapping rules, QKV merge/split transforms, and MoE/Mamba support logic.
  • NeMo Export Conversion Scripts (examples/puzzletron/nemo_export/convert_hf_to_nemo.py, examples/puzzletron/nemo_export/convert_nemo_to_hf.py): Removed CLI scripts for converting between HuggingFace and NeMo checkpoint formats via the nemo.collections.llm import/export APIs.
  • Checkpoint & Sharding Utilities (modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py, modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py): Deleted the load_checkpoint, split_checkpoint_to_subblocks, save_safetensors_index, and copy_deci_lm_hf_code functions; removed the DeciLMDummyBlock, DeciLMDummyWTE, and DeciLMDummyLMHead dummy classes and the create_dummy_model factory.
  • Dependency Cleanup (modelopt/torch/puzzletron/replacement_library/replacement_library.py, modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py, modelopt/torch/puzzletron/tools/post_init_sparse.py, modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py): Updated imports to remove references to the deleted DeciLMForCausalLM, create_dummy_model, and copy_deci_lm_hf_code; removed the save_checkpoint_as_symlinks and force_create_symlink helpers; broadened the do_sparsity model type annotation from DeciLMForCausalLM to nn.Module.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

  • Security Anti-Patterns: ❌ Error. Pull request introduces critical security violations: hardcoded trust_remote_code=True without caller configurability in validate_puzzle_with_multi_replacements.py, and unsafe torch.load() without weights_only=True in replacement_library.py, contradicting SECURITY.md guidelines. Resolution: remove the hardcoded trust_remote_code=True and expose it as a configurable parameter defaulting to False; add weights_only=True to the torch.load() call, or provide a justifying inline comment per SECURITY.md requirements.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Dkorzekwa/decilm hf code cleanup' accurately describes the primary objective of deleting unused DeciLM-related HuggingFace code across multiple files and modules.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py (2)

175-193: ⚠️ Potential issue | 🔴 Critical

model can be uninitialized before checkpoint save (runtime crash).

When args.save_models=True, args.skip_validation=True, and realizable_as_symlinks=True, model is never set in Lines 175-177, but is used at Line 192. This can raise UnboundLocalError on the first such solution.

💡 Proposed fix
-        if (args.save_models and not realizable_as_symlinks) or (not args.skip_validation):
+        if args.save_models or (not args.skip_validation):
             model = replacement_library.load_model(layer_replacements)
             model_config = model.config
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`
around lines 175 - 193, The code can call save_checkpoint(model, ...) when model
was never created; ensure model is initialized whenever args.save_models is
true. Fix by calling replacement_library.load_model(layer_replacements) (and
reading model.config) before any checkpoint logic when args.save_models is true
(regardless of args.skip_validation or realizable_as_symlinks), or alternatively
guard save_checkpoint so it only runs if model is defined; update the block
around replacement_library.load_model, model_config and save_checkpoint (and
where Converter.copy_checkpoint_files is used) so model is always available for
save_checkpoint.
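The failure mode flagged above can be reproduced in isolation. This standalone Python sketch (all names, including validate_and_save and the args fields, are illustrative stand-ins, not the PR's actual code) shows how the conditionally assigned variable raises UnboundLocalError, alongside the guarded version the review suggests:

```python
# Sketch of the conditional-assignment hazard described in the review comment.
# load_model / save_checkpoint logic is reduced to strings for illustration.
from types import SimpleNamespace


def validate_and_save(args, realizable_as_symlinks):
    # Mirrors the flawed condition: model is only bound when this branch runs.
    if (args.save_models and not realizable_as_symlinks) or not args.skip_validation:
        model = "loaded-model"  # stand-in for replacement_library.load_model(...)
    if args.save_models:
        # With save_models=True, skip_validation=True, realizable_as_symlinks=True,
        # the branch above never ran, so `model` is unbound here.
        return f"saved {model}"
    return "skipped"


def validate_and_save_fixed(args, realizable_as_symlinks):
    # Fix: load the model whenever it will be needed for saving.
    if args.save_models or not args.skip_validation:
        model = "loaded-model"
    if args.save_models:
        return f"saved {model}"
    return "skipped"
```

Calling validate_and_save with save_models=True, skip_validation=True, and realizable_as_symlinks=True raises UnboundLocalError, while validate_and_save_fixed completes normally.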

232-237: ⚠️ Potential issue | 🔴 Critical

Remove hardcoded trust_remote_code=True from tokenizer loading.

Lines 233 and 236 hardcode trust_remote_code=True, enabling execution of arbitrary remote model code without caller control. Per security guidelines, this must be caller-configurable and default to False.

🔒 Proposed fix
 def _load_tokenizer(args: DictConfig) -> PreTrainedTokenizerBase:
     tokenizer = None
+    trust_remote_code = bool(getattr(args, "trust_remote_code", False))
     if (tokenizer_name := getattr(args, "tokenizer_name", None)) is not None:
-        tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
+        tokenizer = AutoTokenizer.from_pretrained(
+            tokenizer_name, trust_remote_code=trust_remote_code
+        )
     elif args.teacher_dir is not None:
         try:
-            tokenizer = AutoTokenizer.from_pretrained(args.teacher_dir, trust_remote_code=True)
-        except:
+            tokenizer = AutoTokenizer.from_pretrained(
+                args.teacher_dir, trust_remote_code=trust_remote_code
+            )
+        except Exception:
             pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`
around lines 232 - 237, The tokenizer loading currently hardcodes
trust_remote_code=True in AutoTokenizer.from_pretrained calls (see
tokenizer_name and args.teacher_dir branches); make trust_remote_code a
caller-configurable boolean (e.g., add or use args.trust_remote_code with
default False) and pass that variable into both AutoTokenizer.from_pretrained
invocations instead of the hardcoded True so callers opt-in to remote code
execution; ensure the default remains False and both tokenizer loads
(tokenizer_name branch and teacher_dir branch) use the same
args.trust_remote_code flag.
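The opt-in pattern recommended above can be sketched without transformers installed. Here fake_from_pretrained is a hypothetical stand-in for AutoTokenizer.from_pretrained; only the getattr-based flag handling mirrors the proposed fix:

```python
# Sketch of the caller-configurable trust_remote_code pattern: the flag
# defaults to False and is only enabled when the caller opts in.
from types import SimpleNamespace


def fake_from_pretrained(name, trust_remote_code=False):
    # Illustrative stand-in for AutoTokenizer.from_pretrained; returns a dict
    # so the effective flag value is easy to inspect.
    return {"name": name, "trust_remote_code": trust_remote_code}


def load_tokenizer(args):
    # Default False unless args explicitly carries trust_remote_code=True.
    trust_remote_code = bool(getattr(args, "trust_remote_code", False))
    if (name := getattr(args, "tokenizer_name", None)) is not None:
        return fake_from_pretrained(name, trust_remote_code=trust_remote_code)
    return None
```

With no trust_remote_code attribute on args, the loader is called with trust_remote_code=False; callers must set the flag explicitly to enable remote code.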
🧹 Nitpick comments (3)
.pre-commit-config.yaml (1)

116-116: Remove redundant exclude entry.

Line 116 duplicates the same path already present on line 111, so it has no effect and makes the regex harder to maintain.

Proposed cleanup
-              examples/puzzletron/evaluation/lm_eval_anymodel.py|
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pre-commit-config.yaml at line 116, The .pre-commit-config.yaml contains a
duplicated exclude entry "examples/puzzletron/evaluation/lm_eval_anymodel.py"
(appears twice); remove the redundant occurrence so the exclude list only
contains a single entry for that path—locate the duplicate exclude string in the
exclude block and delete the second instance to keep the regex list minimal and
maintainable.
modelopt/torch/puzzletron/tools/post_init_sparse.py (1)

62-67: Consider using a Protocol type hint for better type safety.

The nn.Module type annotation is overly broad since do_sparsity relies on model.config.block_configs and model.model.layers attributes that aren't part of the nn.Module interface. This works via duck typing but provides no static type checking benefit.

This is acceptable for this cleanup PR, but consider defining a Protocol if this pattern is used elsewhere:

class SparsifiableModel(Protocol):
    config: Any  # with block_configs
    model: Any   # with layers
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/post_init_sparse.py` around lines 62 - 67,
The do_sparsity method currently types its parameter as nn.Module but uses
attributes like model.config.block_configs and model.model.layers that are not
part of nn.Module; define a lightweight Protocol (e.g., SparsifiableModel)
declaring the needed members (config with block_configs and model with layers)
and update the do_sparsity signature to accept that Protocol instead of
nn.Module so static type checkers can validate usage of
model.config.block_configs and model.model.layers.
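A minimal runnable version of the Protocol suggested above, assuming only the two attribute names named in the comment (config and model); the Dummy* classes and the do_sparsity body are illustrative, not the actual module code:

```python
# Structural typing sketch: a Protocol declares the attributes do_sparsity
# actually relies on, so static checkers can validate callers.
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SparsifiableModel(Protocol):
    config: Any  # expected to carry block_configs
    model: Any   # expected to carry layers


class DummyConfig:
    block_configs: list = []


class DummyInner:
    layers: list = []


class DummyModel:
    # Satisfies SparsifiableModel structurally, without inheriting from it.
    def __init__(self) -> None:
        self.config = DummyConfig()
        self.model = DummyInner()


def do_sparsity(model: SparsifiableModel) -> int:
    # Type checkers now know both attributes exist on `model`.
    return len(model.model.layers) + len(model.config.block_configs)
```

Because the Protocol is marked runtime_checkable, isinstance checks verify attribute presence at runtime as well, which can be handy in tests.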
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py (1)

60-62: Remove unused _CONFIG_FOR_DOC constant at line 62.

This constant was used for HuggingFace docstrings in the removed model wrapper classes (e.g., DeciLMForCausalLM). With those classes deleted, it is no longer referenced anywhere in the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py` around
lines 60 - 62, Remove the now-unused constant _CONFIG_FOR_DOC from
modeling_decilm.py (it was only used by the deleted HF wrapper classes like
DeciLMForCausalLM); locate the declaration "_CONFIG_FOR_DOC = \"DeciLMConfig\""
near the top of the file and delete that line so the module no longer defines an
unused symbol.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2ceb6c52-656d-46ee-a74d-9e46b24f9fe0

📥 Commits

Reviewing files that changed from the base of the PR and between 2b6572c and 65c4b9f.

📒 Files selected for processing (19)
  • .pre-commit-config.yaml
  • examples/puzzletron/nemo_export/convert_hf_to_nemo.py
  • examples/puzzletron/nemo_export/convert_nemo_to_hf.py
  • modelopt/torch/puzzletron/decilm/conversion_utils.py
  • modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__megatron_tokenizer.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__tokenizer.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/tokenization_decilm.py
  • modelopt/torch/puzzletron/export/MCore/llama_nemotron.py
  • modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py
  • modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py
  • modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py
  • modelopt/torch/puzzletron/replacement_library/replacement_library.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • modelopt/torch/puzzletron/tools/post_init_sparse.py
  • modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
  • modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py
💤 Files with no reviewable changes (14)
  • modelopt/torch/puzzletron/replacement_library/replacement_library.py
  • examples/puzzletron/nemo_export/convert_hf_to_nemo.py
  • examples/puzzletron/nemo_export/convert_nemo_to_hf.py
  • modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__megatron_tokenizer.py
  • modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py
  • modelopt/torch/puzzletron/decilm/conversion_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__tokenizer.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/tokenization_decilm.py
  • modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py
  • modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py
  • modelopt/torch/puzzletron/export/MCore/llama_nemotron.py


codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (2b6572c) to head (65c4b9f).
⚠️ Report is 1 commit behind head on feature/puzzletron.

Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/puzzletron    #1071      +/-   ##
======================================================
- Coverage               72.13%   72.12%   -0.02%     
======================================================
  Files                     209      209              
  Lines                   23628    23628              
======================================================
- Hits                    17045    17042       -3     
- Misses                   6583     6586       +3     


@danielkorzekwa danielkorzekwa merged commit 110316a into feature/puzzletron Mar 23, 2026
28 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/decilm_hf_code_cleanup branch March 23, 2026 13:58