
Dkorzekwa/decilm hf code cleanup #1071

Merged
danielkorzekwa merged 109 commits into feature/puzzletron from dkorzekwa/decilm_hf_code_cleanup
Mar 23, 2026

Conversation


@danielkorzekwa danielkorzekwa commented Mar 19, 2026

What does this PR do?

  • Delete unused decilm code

Summary by CodeRabbit

Release Notes

  • Removals

    • Removed model conversion utilities for Llama-Nemotron format
    • Removed DeciLM model classes, tokenizer implementations, and configuration utilities
    • Removed checkpoint import/export functionality
    • Removed heterogeneous transformer layer specifications and configuration builders
  • Updates

    • Updated pre-commit configuration for additional file exclusions
    • Updated imports across modules to reflect removed dependencies

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)

@danielkorzekwa danielkorzekwa requested review from a team as code owners March 19, 2026 14:05
@danielkorzekwa danielkorzekwa requested review from cjluo-nv and removed request for a team March 19, 2026 14:05
Base automatically changed from dkorzekwa/remainings_from_dkorzekwa_anymodel_merging_process to feature/puzzletron March 20, 2026 14:23
@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 20, 2026 14:23
@kevalmorabia97 kevalmorabia97 removed the request for review from cjluo-nv March 20, 2026 17:20

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough


This PR removes a substantial subsystem of DeciLM model classes, Puzzletron-specific conversion utilities, and related export infrastructure spanning configuration, tokenization, model definitions, and checkpoint handling across multiple module directories. Additionally, configuration import statements are updated to reflect these deletions.

Changes

  • Pre-commit Configuration (.pre-commit-config.yaml): Extended the insert-license-py hook's exclude regex to skip two additional files: examples/puzzletron/evaluation/lm_eval_anymodel.py and modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py.
  • DeciLM Model Architecture (modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py, modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__megatron_tokenizer.py, modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__tokenizer.py, modelopt/torch/puzzletron/decilm/deci_lm_hf_code/tokenization_decilm.py): Deleted HuggingFace model, tokenizer, and tokenization implementations, including DeciLMForCausalLM, DeciLMPreTrainedModel, DeciLMModel, classification/QA/token-classification heads, the MegatronTokenizer base class, CustomTikTokenizer, and the MegatronTikTokenizer tokenizer class.
  • DeciLM Conversion & Configuration (modelopt/torch/puzzletron/decilm/conversion_utils.py, modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py, modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py): Removed utilities for converting model weights and configurations to DeciLM format, including HuggingFace parameter mapping, routed-expert tensor transforms, and Llama3-to-DeciLM config conversion with weight handling.
  • Puzzletron MCore Export Infrastructure (modelopt/torch/puzzletron/export/MCore/llama_nemotron.py, modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py, modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py): Deleted the entire MCore integration layer, including the PuzzletronLlamaNemotronModel configuration, HF import/export connectors, heterogeneous transformer layer specs, weight mapping rules, QKV merge/split transforms, and MoE/Mamba support logic.
  • NeMo Export Conversion Scripts (examples/puzzletron/nemo_export/convert_hf_to_nemo.py, examples/puzzletron/nemo_export/convert_nemo_to_hf.py): Removed CLI scripts for converting between HuggingFace and NeMo checkpoint formats via the nemo.collections.llm import/export APIs.
  • Checkpoint & Sharding Utilities (modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py, modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py): Deleted the load_checkpoint, split_checkpoint_to_subblocks, save_safetensors_index, and copy_deci_lm_hf_code functions; removed the DeciLMDummyBlock, DeciLMDummyWTE, and DeciLMDummyLMHead dummy classes and the create_dummy_model factory.
  • Dependency Cleanup (modelopt/torch/puzzletron/replacement_library/replacement_library.py, modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py, modelopt/torch/puzzletron/tools/post_init_sparse.py, modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py): Updated imports to remove references to the deleted DeciLMForCausalLM, create_dummy_model, and copy_deci_lm_hf_code; removed the save_checkpoint_as_symlinks and force_create_symlink helpers; broadened the do_sparsity model type annotation from DeciLMForCausalLM to nn.Module.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

  • Security Anti-Patterns: ❌ Error. Pull request introduces critical security violations: hardcoded trust_remote_code=True without caller configurability in validate_puzzle_with_multi_replacements.py, and unsafe torch.load() without weights_only=True in replacement_library.py, contradicting SECURITY.md guidelines. Resolution: remove the hardcoded trust_remote_code=True and expose it as a configurable parameter defaulting to False; add weights_only=True to the torch.load() call, or provide a justifying inline comment per SECURITY.md requirements.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Dkorzekwa/decilm hf code cleanup' accurately describes the primary objective of deleting unused DeciLM-related HuggingFace code across multiple files and modules.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py (2)

175-193: ⚠️ Potential issue | 🔴 Critical

model can be uninitialized before checkpoint save (runtime crash).

When args.save_models=True, args.skip_validation=True, and realizable_as_symlinks=True, model is never set in Lines 175-177, but is used at Line 192. This can raise UnboundLocalError on the first such solution.

💡 Proposed fix
-        if (args.save_models and not realizable_as_symlinks) or (not args.skip_validation):
+        if args.save_models or (not args.skip_validation):
             model = replacement_library.load_model(layer_replacements)
             model_config = model.config
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`
around lines 175 - 193, The code can call save_checkpoint(model, ...) when model
was never created; ensure model is initialized whenever args.save_models is
true. Fix by calling replacement_library.load_model(layer_replacements) (and
reading model.config) before any checkpoint logic when args.save_models is true
(regardless of args.skip_validation or realizable_as_symlinks), or alternatively
guard save_checkpoint so it only runs if model is defined; update the block
around replacement_library.load_model, model_config and save_checkpoint (and
where Converter.copy_checkpoint_files is used) so model is always available for
save_checkpoint.
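The failure mode flagged above can be reproduced in isolation. This standalone Python sketch (all names, including validate_and_save and the args fields, are illustrative stand-ins, not the PR's actual code) shows how the conditionally assigned variable raises UnboundLocalError, alongside the guarded version the review suggests:

```python
# Sketch of the conditional-assignment hazard described in the review comment.
# load_model / save_checkpoint logic is reduced to strings for illustration.
from types import SimpleNamespace


def validate_and_save(args, realizable_as_symlinks):
    # Mirrors the flawed condition: model is only bound when this branch runs.
    if (args.save_models and not realizable_as_symlinks) or not args.skip_validation:
        model = "loaded-model"  # stand-in for replacement_library.load_model(...)
    if args.save_models:
        # With save_models=True, skip_validation=True, realizable_as_symlinks=True,
        # the branch above never ran, so `model` is unbound here.
        return f"saved {model}"
    return "skipped"


def validate_and_save_fixed(args, realizable_as_symlinks):
    # Fix: load the model whenever it will be needed for saving.
    if args.save_models or not args.skip_validation:
        model = "loaded-model"
    if args.save_models:
        return f"saved {model}"
    return "skipped"
```

Calling validate_and_save with save_models=True, skip_validation=True, and realizable_as_symlinks=True raises UnboundLocalError, while validate_and_save_fixed completes normally.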

232-237: ⚠️ Potential issue | 🔴 Critical

Remove hardcoded trust_remote_code=True from tokenizer loading.

Lines 233 and 236 hardcode trust_remote_code=True, enabling execution of arbitrary remote model code without caller control. Per security guidelines, this must be caller-configurable and default to False.

🔒 Proposed fix
 def _load_tokenizer(args: DictConfig) -> PreTrainedTokenizerBase:
     tokenizer = None
+    trust_remote_code = bool(getattr(args, "trust_remote_code", False))
     if (tokenizer_name := getattr(args, "tokenizer_name", None)) is not None:
-        tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
+        tokenizer = AutoTokenizer.from_pretrained(
+            tokenizer_name, trust_remote_code=trust_remote_code
+        )
     elif args.teacher_dir is not None:
         try:
-            tokenizer = AutoTokenizer.from_pretrained(args.teacher_dir, trust_remote_code=True)
-        except:
+            tokenizer = AutoTokenizer.from_pretrained(
+                args.teacher_dir, trust_remote_code=trust_remote_code
+            )
+        except Exception:
             pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`
around lines 232 - 237, The tokenizer loading currently hardcodes
trust_remote_code=True in AutoTokenizer.from_pretrained calls (see
tokenizer_name and args.teacher_dir branches); make trust_remote_code a
caller-configurable boolean (e.g., add or use args.trust_remote_code with
default False) and pass that variable into both AutoTokenizer.from_pretrained
invocations instead of the hardcoded True so callers opt-in to remote code
execution; ensure the default remains False and both tokenizer loads
(tokenizer_name branch and teacher_dir branch) use the same
args.trust_remote_code flag.
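The opt-in pattern recommended above can be sketched without transformers installed. Here fake_from_pretrained is a hypothetical stand-in for AutoTokenizer.from_pretrained; only the getattr-based flag handling mirrors the proposed fix:

```python
# Sketch of the caller-configurable trust_remote_code pattern: the flag
# defaults to False and is only enabled when the caller opts in.
from types import SimpleNamespace


def fake_from_pretrained(name, trust_remote_code=False):
    # Illustrative stand-in for AutoTokenizer.from_pretrained; returns a dict
    # so the effective flag value is easy to inspect.
    return {"name": name, "trust_remote_code": trust_remote_code}


def load_tokenizer(args):
    # Default False unless args explicitly carries trust_remote_code=True.
    trust_remote_code = bool(getattr(args, "trust_remote_code", False))
    if (name := getattr(args, "tokenizer_name", None)) is not None:
        return fake_from_pretrained(name, trust_remote_code=trust_remote_code)
    return None
```

With no trust_remote_code attribute on args, the loader is called with trust_remote_code=False; callers must set the flag explicitly to enable remote code.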
🧹 Nitpick comments (3)
.pre-commit-config.yaml (1)

116-116: Remove redundant exclude entry.

Line 116 duplicates the same path already present on line 111, so it has no effect and makes the regex harder to maintain.

Proposed cleanup
-              examples/puzzletron/evaluation/lm_eval_anymodel.py|
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pre-commit-config.yaml at line 116, The .pre-commit-config.yaml contains a
duplicated exclude entry "examples/puzzletron/evaluation/lm_eval_anymodel.py"
(appears twice); remove the redundant occurrence so the exclude list only
contains a single entry for that path—locate the duplicate exclude string in the
exclude block and delete the second instance to keep the regex list minimal and
maintainable.
modelopt/torch/puzzletron/tools/post_init_sparse.py (1)

62-67: Consider using a Protocol type hint for better type safety.

The nn.Module type annotation is overly broad since do_sparsity relies on model.config.block_configs and model.model.layers attributes that aren't part of the nn.Module interface. This works via duck typing but provides no static type checking benefit.

This is acceptable for this cleanup PR, but consider defining a Protocol if this pattern is used elsewhere:

class SparsifiableModel(Protocol):
    config: Any  # with block_configs
    model: Any   # with layers
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/post_init_sparse.py` around lines 62 - 67,
The do_sparsity method currently types its parameter as nn.Module but uses
attributes like model.config.block_configs and model.model.layers that are not
part of nn.Module; define a lightweight Protocol (e.g., SparsifiableModel)
declaring the needed members (config with block_configs and model with layers)
and update the do_sparsity signature to accept that Protocol instead of
nn.Module so static type checkers can validate usage of
model.config.block_configs and model.model.layers.
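A minimal runnable version of the Protocol suggested above, assuming only the two attribute names named in the comment (config and model); the Dummy* classes and the do_sparsity body are illustrative, not the actual module code:

```python
# Structural typing sketch: a Protocol declares the attributes do_sparsity
# actually relies on, so static checkers can validate callers.
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SparsifiableModel(Protocol):
    config: Any  # expected to carry block_configs
    model: Any   # expected to carry layers


class DummyConfig:
    block_configs: list = []


class DummyInner:
    layers: list = []


class DummyModel:
    # Satisfies SparsifiableModel structurally, without inheriting from it.
    def __init__(self) -> None:
        self.config = DummyConfig()
        self.model = DummyInner()


def do_sparsity(model: SparsifiableModel) -> int:
    # Type checkers now know both attributes exist on `model`.
    return len(model.model.layers) + len(model.config.block_configs)
```

Because the Protocol is marked runtime_checkable, isinstance checks verify attribute presence at runtime as well, which can be handy in tests.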
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py (1)

60-62: Remove unused _CONFIG_FOR_DOC constant at line 62.

This constant was used for HuggingFace docstrings in the removed model wrapper classes (e.g., DeciLMForCausalLM). With those classes deleted, it is no longer referenced anywhere in the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py` around
lines 60 - 62, Remove the now-unused constant _CONFIG_FOR_DOC from
modeling_decilm.py (it was only used by the deleted HF wrapper classes like
DeciLMForCausalLM); locate the declaration "_CONFIG_FOR_DOC = \"DeciLMConfig\""
near the top of the file and delete that line so the module no longer defines an
unused symbol.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2ceb6c52-656d-46ee-a74d-9e46b24f9fe0

📥 Commits

Reviewing files that changed from the base of the PR and between 2b6572c and 65c4b9f.

📒 Files selected for processing (19)
  • .pre-commit-config.yaml
  • examples/puzzletron/nemo_export/convert_hf_to_nemo.py
  • examples/puzzletron/nemo_export/convert_nemo_to_hf.py
  • modelopt/torch/puzzletron/decilm/conversion_utils.py
  • modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__megatron_tokenizer.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__tokenizer.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/tokenization_decilm.py
  • modelopt/torch/puzzletron/export/MCore/llama_nemotron.py
  • modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py
  • modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py
  • modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py
  • modelopt/torch/puzzletron/replacement_library/replacement_library.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • modelopt/torch/puzzletron/tools/post_init_sparse.py
  • modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
  • modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py
💤 Files with no reviewable changes (14)
  • modelopt/torch/puzzletron/replacement_library/replacement_library.py
  • examples/puzzletron/nemo_export/convert_hf_to_nemo.py
  • examples/puzzletron/nemo_export/convert_nemo_to_hf.py
  • modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__megatron_tokenizer.py
  • modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py
  • modelopt/torch/puzzletron/decilm/conversion_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__tokenizer.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/tokenization_decilm.py
  • modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py
  • modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py
  • modelopt/torch/puzzletron/export/MCore/llama_nemotron.py


codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (2b6572c) to head (65c4b9f).
⚠️ Report is 1 commit behind head on feature/puzzletron.

Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/puzzletron    #1071      +/-   ##
======================================================
- Coverage               72.13%   72.12%   -0.02%     
======================================================
  Files                     209      209              
  Lines                   23628    23628              
======================================================
- Hits                    17045    17042       -3     
- Misses                   6583     6586       +3     


@danielkorzekwa danielkorzekwa merged commit 110316a into feature/puzzletron Mar 23, 2026
28 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/decilm_hf_code_cleanup branch March 23, 2026 13:58