
Dkorzekwa/decilm hf code cleanup 2 #1073

Merged
danielkorzekwa merged 116 commits into feature/puzzletron from
dkorzekwa/decilm_hf_code_cleanup_2
Mar 23, 2026

Conversation

@danielkorzekwa
Contributor

@danielkorzekwa danielkorzekwa commented Mar 19, 2026

What does this PR do?

Delete unused DeciLM code

Summary by CodeRabbit

  • Refactor
    • Removed DeciLM-specific components including decoder layers, attention implementations, and specialized cache utilities, streamlining the codebase
    • Updated replacement library to use generic model configurations instead of DeciLM-specific types, improving compatibility with diverse architectures
    • Cleaned up internal utilities for attention masking, flash attention compatibility, and rotary position embeddings

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 19, 2026 17:30
@coderabbitai
Contributor

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

The PR removes extensive DeciLM-specific model components (attention mechanisms, RoPE embeddings, Mamba mixer, decoder layers) across multiple utility modules and refactors the replacement library to adopt a generic AnyModel loading approach instead of DeciLM-specific layer replacement discovery and caching.

Changes

  • DeciLM Core Model Components (configuration_decilm.py, megatron_lm__mamba_mixer.py, modeling_decilm.py): Removed the fake import hook for HybridChunkedCache, deleted the entire MambaMixerMegatron implementation (527 lines), and stripped modeling_decilm.py of the HF/transformers-derived decoder stack, including DeciLMAttention, DeciLMFlashAttention2, rotary embedding classes, decoder layers, causal mask utilities, and the vanilla MLP; retained only DeciLMRMSNorm, DeciLMGatedMLP, DeciLMMoe, and LMHead.
  • Transformers Utility Modules (transformers_4_44_2__modeling_attn_mask_utils.py, transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py, transformers_4_44_2__pytorch_utils.py, transformers_4_51_3__modeling_llama4_attention.py): Deleted four utility modules: mask construction helpers (498 lines), FlashAttention backward-compatibility utilities (363 lines), the layernorm registry (32 lines), and Llama4 text attention with RoPE support (289 lines).
  • Inference Cache & Embeddings (variable_cache.py, vllm_yarn_utils.py): Removed the VariableCache class for per-layer KV-cache management (213 lines) and the YaRN scaling rotary embedding module with position embedding logic (210 lines).
  • Replacement Library Refactoring (replacement_library.py, replacement_utils.py): Refactored the replacement library from DeciLM-specific checkpoint loading with layer caching (the load_checkpoint() method, dtype/layer accessors, block getters) to a generic load_and_shard_model-based AnyModel approach; renamed the internal method _get_arbitrary_block_checkpoint_paths() to _get_arbitrary_non_block_checkpoint_paths(). Updated function signatures in replacement_utils.py to use PretrainedConfig instead of the DeciLM-specific DeciLMConfig.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 5.88%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check: ❓ Inconclusive. The title 'Dkorzekwa/decilm hf code cleanup 2' partially relates to the changeset by referencing a cleanup effort, but it is vague and generic, lacking specificity about what was actually removed or changed. Resolution: revise the title to be more specific about the primary change, such as 'Remove unused DeciLM HuggingFace utilities and replacement library code' or similar descriptive phrasing that clearly indicates the main cleanup scope.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
  • Security Anti-Patterns: ✅ Passed. The PR does not introduce critical security anti-patterns specified in SECURITY.md. No unsafe torch.load/numpy.load/trust_remote_code/eval/exec patterns are added.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Base automatically changed from dkorzekwa/decilm_hf_code_cleanup to feature/puzzletron March 23, 2026 13:58
@danielkorzekwa danielkorzekwa requested review from a team as code owners March 23, 2026 13:58
Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/torch/puzzletron/replacement_library/replacement_library.py (1)

159-163: ⚠️ Potential issue | 🟡 Minor

Missing return value when no weight paths are found.

_get_arbitrary_checkpoint_dir implicitly returns None if all layer_replacement["weight_paths"] are empty. The caller get_arbitrary_checkpoint_dir at line 156 would cache None, and subsequent uses (e.g., line 83, 98, 119) would fail when treating it as a Path. Consider raising an explicit error or returning a sentinel.

🐛 Proposed fix to raise explicit error
     def _get_arbitrary_checkpoint_dir(self) -> Path:
         for layer_replacement in self.replacement_library:
             weight_paths = layer_replacement["weight_paths"]
             if len(weight_paths) > 0:
                 return weights_path_to_checkpoint_dir(weight_paths[0])
+        raise ValueError("No checkpoint directory found: all layer replacements have empty weight_paths")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In modelopt/torch/puzzletron/replacement_library/replacement_library.py around
lines 159 - 163, The helper _get_arbitrary_checkpoint_dir currently falls
through and returns None when every layer_replacement["weight_paths"] is empty,
which causes get_arbitrary_checkpoint_dir to cache None and later code to treat
it as a Path; update _get_arbitrary_checkpoint_dir to explicitly raise a
descriptive exception (e.g., RuntimeError or ValueError) when no weight_paths
are found or alternatively return a clear sentinel Path, so callers like
get_arbitrary_checkpoint_dir and code that uses its result cannot accidentally
receive None; locate the function _get_arbitrary_checkpoint_dir in
replacement_library and add the explicit error path after iterating
layer_replacement["weight_paths"], referencing weights_path_to_checkpoint_dir
for valid returns.
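To illustrate why the silent fall-through matters, here is a minimal, self-contained sketch of the pattern the review describes: a helper that implicitly returns None, a caller that caches that None, and a later use that fails far from the real cause. All names here are hypothetical stand-ins, not the actual repository code.

```python
from pathlib import Path

class ReplacementLibraryDemo:
    """Hypothetical stand-in for the caching pattern the review describes."""

    def __init__(self, layer_replacements):
        self.layer_replacements = layer_replacements
        self._checkpoint_dir = None
        self._cached = False

    def _find_checkpoint_dir(self):
        # Falls through and implicitly returns None when every
        # "weight_paths" list is empty -- the bug in question.
        for replacement in self.layer_replacements:
            weight_paths = replacement["weight_paths"]
            if weight_paths:
                return Path(weight_paths[0]).parent
        # no explicit raise here, so Python returns None

    def checkpoint_dir(self):
        # Caches whatever _find_checkpoint_dir returned, including None,
        # so every later call keeps serving the bad value.
        if not self._cached:
            self._checkpoint_dir = self._find_checkpoint_dir()
            self._cached = True
        return self._checkpoint_dir

lib = ReplacementLibraryDemo([{"weight_paths": []}])
result = lib.checkpoint_dir()
print(result)  # → None, silently cached
# A later expression such as `result / "config.json"` raises TypeError
# far from the root cause, which is why the review suggests raising a
# descriptive error at the fall-through point instead.
```

This is why an explicit `raise ValueError(...)` at the end of the helper surfaces the misconfiguration immediately, at the point where the empty weight_paths are actually discovered.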
modelopt/torch/puzzletron/replacement_library/replacement_utils.py (1)

88-108: ⚠️ Potential issue | 🟡 Minor

Add a guard to check if block_configs exists before accessing it.

Line 94 directly accesses teacher_model_config.block_configs[block_idx] without verifying the attribute exists. While load_model_config() (which creates the config passed to this function) uses hasattr(config, "block_configs") before processing, it does not guarantee the attribute is present on all configs. The codebase shows this is a real concern — comments note that vision-language models can have nested configs without block_configs at the top level, and multiple files use defensive hasattr checks (e.g., utils/parsing.py, checkpoint_utils_hf.py). This will raise AttributeError if a config without block_configs is passed. Add a guard like other parts of the codebase do, or document the specific config type required and type-annotate accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In modelopt/torch/puzzletron/replacement_library/replacement_utils.py around
lines 88 - 108, The function is_replacement_identical_to_teacher accesses
teacher_model_config.block_configs directly and can raise AttributeError for
configs without that attribute; add a guard at the start of the branch that
verifies hasattr(teacher_model_config, "block_configs") (or use getattr with a
default) before indexing into teacher_model_config.block_configs[block_idx], and
if missing return False (or otherwise short-circuit) so the subsequent
comparisons involving BlockConfig (replacement_block_config, parallel_blocks,
parallel_blocks[0].attention/ffn) only run when block_configs exists and is
indexable.
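A minimal sketch of the defensive guard the review suggests, using getattr with a default to short-circuit when the config lacks block_configs. Function and class names here are illustrative only, not the actual repository code.

```python
def is_replacement_identical_to_teacher_demo(teacher_model_config,
                                             block_idx,
                                             replacement_block_config):
    """Illustrative guard: short-circuit when the config lacks block_configs."""
    block_configs = getattr(teacher_model_config, "block_configs", None)
    if block_configs is None:
        # e.g. nested vision-language configs without top-level block_configs
        return False
    teacher_block = block_configs[block_idx]
    return teacher_block == replacement_block_config

class _Cfg:
    """Hypothetical minimal config object."""

plain = _Cfg()  # no block_configs attribute: guard returns False, no AttributeError
assert is_replacement_identical_to_teacher_demo(plain, 0, object()) is False

with_blocks = _Cfg()
with_blocks.block_configs = ["attn_ffn"]  # guard passes, comparison runs
assert is_replacement_identical_to_teacher_demo(with_blocks, 0, "attn_ffn") is True
```

The getattr-with-default form is equivalent to the hasattr check used elsewhere in the codebase (e.g., utils/parsing.py, checkpoint_utils_hf.py per the review), but avoids a second attribute lookup.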
🧹 Nitpick comments (1)
modelopt/torch/puzzletron/replacement_library/replacement_library.py (1)

78-88: Update return type annotation from DeciLMConfig to PretrainedConfig.

The model_config property still declares -> DeciLMConfig but load_model_config can return any PretrainedConfig. This is inconsistent with the changes in replacement_utils.py where function signatures were updated to accept PretrainedConfig.

♻️ Proposed fix
     @property
-    def model_config(self) -> DeciLMConfig:
+    def model_config(self) -> PretrainedConfig:
         if self._model_config is None:
             trust_remote_code = self.descriptor.requires_trust_remote_code()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In modelopt/torch/puzzletron/replacement_library/replacement_library.py around
lines 78 - 88, The model_config property currently types its return as
DeciLMConfig but load_model_config can return any PretrainedConfig; update the
annotation on the property (model_config) from -> DeciLMConfig to ->
PretrainedConfig and also adjust any related attribute typing
(self._model_config) and imports so they use PretrainedConfig instead of
DeciLMConfig; ensure references in this class (e.g., model_config_overrides,
get_arbitrary_checkpoint_dir, descriptor.requires_trust_remote_code,
load_model_config) remain unchanged except for the type switch to
PretrainedConfig.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de36f048-aec3-4535-88ef-e97851883a16

📥 Commits

Reviewing files that changed from the base of the PR and between 110316a and d6ccd8f.

📒 Files selected for processing (13)
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__mamba_mixer.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_attn_mask_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_outputs.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__pytorch_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__cache_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__modeling_llama4_attention.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/variable_cache.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/vllm_yarn_utils.py
  • modelopt/torch/puzzletron/replacement_library/replacement_library.py
  • modelopt/torch/puzzletron/replacement_library/replacement_utils.py
💤 Files with no reviewable changes (8)
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__pytorch_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/vllm_yarn_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/variable_cache.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__mamba_mixer.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__modeling_llama4_attention.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_attn_mask_utils.py

@codecov

codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.13%. Comparing base (110316a) to head (d6ccd8f).
⚠️ Report is 1 commit behind head on feature/puzzletron.

Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/puzzletron    #1073      +/-   ##
======================================================
+ Coverage               72.12%   72.13%   +0.01%     
======================================================
  Files                     209      209              
  Lines                   23628    23628              
======================================================
+ Hits                    17042    17045       +3     
+ Misses                   6586     6583       -3     

☔ View full report in Codecov by Sentry.

@danielkorzekwa danielkorzekwa merged commit 4190275 into feature/puzzletron Mar 23, 2026
28 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/decilm_hf_code_cleanup_2 branch March 23, 2026 17:24