Dkorzekwa/decilm hf code cleanup 2 #1073
Conversation
- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
📝 Walkthrough

The PR removes extensive DeciLM-specific model components (attention mechanisms, RoPE embeddings, Mamba mixer, decoder layers) across multiple utility modules and refactors the replacement library to adopt a generic AnyModel loading approach instead of DeciLM-specific layer replacement discovery and caching.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

📝 Generate docstrings
🧪 Generate unit tests (beta)
…up_2 Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
modelopt/torch/puzzletron/replacement_library/replacement_library.py (1)
159-163: ⚠️ Potential issue | 🟡 Minor: Missing return value when no weight paths are found.

`_get_arbitrary_checkpoint_dir` implicitly returns `None` if all `layer_replacement["weight_paths"]` are empty. The caller `get_arbitrary_checkpoint_dir` at line 156 would cache `None`, and subsequent uses (e.g., lines 83, 98, 119) would fail when treating it as a `Path`. Consider raising an explicit error or returning a sentinel.

🐛 Proposed fix to raise explicit error

```diff
 def _get_arbitrary_checkpoint_dir(self) -> Path:
     for layer_replacement in self.replacement_library:
         weight_paths = layer_replacement["weight_paths"]
         if len(weight_paths) > 0:
             return weights_path_to_checkpoint_dir(weight_paths[0])
+    raise ValueError("No checkpoint directory found: all layer replacements have empty weight_paths")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/replacement_library/replacement_library.py` around lines 159-163: The helper _get_arbitrary_checkpoint_dir currently falls through and returns None when every layer_replacement["weight_paths"] is empty, which causes get_arbitrary_checkpoint_dir to cache None and later code to treat it as a Path; update _get_arbitrary_checkpoint_dir to explicitly raise a descriptive exception (e.g., RuntimeError or ValueError) when no weight_paths are found or alternatively return a clear sentinel Path, so callers like get_arbitrary_checkpoint_dir and code that uses its result cannot accidentally receive None; locate the function _get_arbitrary_checkpoint_dir in replacement_library and add the explicit error path after iterating layer_replacement["weight_paths"], referencing weights_path_to_checkpoint_dir for valid returns.

modelopt/torch/puzzletron/replacement_library/replacement_utils.py (1)
88-108: ⚠️ Potential issue | 🟡 Minor: Add a guard to check if `block_configs` exists before accessing it.

Line 94 directly accesses `teacher_model_config.block_configs[block_idx]` without verifying the attribute exists. While `load_model_config()` (which creates the config passed to this function) uses `hasattr(config, "block_configs")` before processing, it does not guarantee the attribute is present on all configs. The codebase shows this is a real concern: comments note that vision-language models can have nested configs without `block_configs` at the top level, and multiple files use defensive `hasattr` checks (e.g., `utils/parsing.py`, `checkpoint_utils_hf.py`). This will raise `AttributeError` if a config without `block_configs` is passed. Add a guard like other parts of the codebase do, or document the specific config type required and type-annotate accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/replacement_library/replacement_utils.py` around lines 88 - 108, The function is_replacement_identical_to_teacher accesses teacher_model_config.block_configs directly and can raise AttributeError for configs without that attribute; add a guard at the start of the branch that verifies hasattr(teacher_model_config, "block_configs") (or use getattr with a default) before indexing into teacher_model_config.block_configs[block_idx], and if missing return False (or otherwise short-circuit) so the subsequent comparisons involving BlockConfig (replacement_block_config, parallel_blocks, parallel_blocks[0].attention/ffn) only run when block_configs exists and is indexable.
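The guard the reviewer suggests can be sketched as below. Note that `FakeTeacherConfig` and the simplified `is_replacement_identical_to_teacher` are illustrative stand-ins under assumption, not the actual modelopt code:

```python
class FakeTeacherConfig:
    """Hypothetical stand-in for a HF PretrainedConfig that may lack block_configs."""

    def __init__(self, block_configs=None):
        if block_configs is not None:
            self.block_configs = block_configs


def is_replacement_identical_to_teacher(teacher_model_config, block_idx, replacement_block_config):
    # Guard: some configs (e.g., nested vision-language configs) have no
    # top-level block_configs attribute; short-circuit instead of raising.
    block_configs = getattr(teacher_model_config, "block_configs", None)
    if block_configs is None:
        return False
    return block_configs[block_idx] == replacement_block_config


print(is_replacement_identical_to_teacher(FakeTeacherConfig(), 0, "attn"))          # → False
print(is_replacement_identical_to_teacher(FakeTeacherConfig(["attn"]), 0, "attn"))  # → True
```

The `getattr(..., None)` form combines the `hasattr` check and the attribute access into one step, which is the defensive pattern the review says other files in the codebase already use.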
🧹 Nitpick comments (1)
modelopt/torch/puzzletron/replacement_library/replacement_library.py (1)
78-88: Update return type annotation from `DeciLMConfig` to `PretrainedConfig`.

The `model_config` property still declares `-> DeciLMConfig` but `load_model_config` can return any `PretrainedConfig`. This is inconsistent with the changes in `replacement_utils.py` where function signatures were updated to accept `PretrainedConfig`.

♻️ Proposed fix

```diff
 @property
-def model_config(self) -> DeciLMConfig:
+def model_config(self) -> PretrainedConfig:
     if self._model_config is None:
         trust_remote_code = self.descriptor.requires_trust_remote_code()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/replacement_library/replacement_library.py` around lines 78 - 88, The model_config property currently types its return as DeciLMConfig but load_model_config can return any PretrainedConfig; update the annotation on the property (model_config) from -> DeciLMConfig to -> PretrainedConfig and also adjust any related attribute typing (self._model_config) and imports so they use PretrainedConfig instead of DeciLMConfig; ensure references in this class (e.g., model_config_overrides, get_arbitrary_checkpoint_dir, descriptor.requires_trust_remote_code, load_model_config) remain unchanged except for the type switch to PretrainedConfig.
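The pattern behind this nitpick, typing a lazily-cached property against the base class so that any subclass the loader returns stays compatible, can be sketched with stand-in classes. `BaseConfig`, `DeciLMStyleConfig`, and `load_model_config` below are hypothetical, not the real transformers or modelopt types:

```python
class BaseConfig:
    """Hypothetical stand-in for transformers.PretrainedConfig."""


class DeciLMStyleConfig(BaseConfig):
    """Hypothetical stand-in for a model-specific config subclass."""


def load_model_config() -> BaseConfig:
    # In practice the loader may return DeciLMStyleConfig or any other subclass.
    return DeciLMStyleConfig()


class ReplacementLibrary:
    def __init__(self):
        self._model_config = None

    @property
    def model_config(self) -> BaseConfig:  # annotate with the base class, not the subclass
        if self._model_config is None:
            self._model_config = load_model_config()
        return self._model_config


lib = ReplacementLibrary()
print(isinstance(lib.model_config, BaseConfig))  # → True
print(lib.model_config is lib.model_config)      # cached: same object → True
```

Widening the annotation to the base class is purely a typing change; the runtime caching behavior is untouched, which matches the review's note that everything except the type should remain the same.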
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@modelopt/torch/puzzletron/replacement_library/replacement_library.py`:
- Around line 159-163: The helper _get_arbitrary_checkpoint_dir currently falls
through and returns None when every layer_replacement["weight_paths"] is empty,
which causes get_arbitrary_checkpoint_dir to cache None and later code to treat
it as a Path; update _get_arbitrary_checkpoint_dir to explicitly raise a
descriptive exception (e.g., RuntimeError or ValueError) when no weight_paths
are found or alternatively return a clear sentinel Path, so callers like
get_arbitrary_checkpoint_dir and code that uses its result cannot accidentally
receive None; locate the function _get_arbitrary_checkpoint_dir in
replacement_library and add the explicit error path after iterating
layer_replacement["weight_paths"], referencing weights_path_to_checkpoint_dir
for valid returns.
In `@modelopt/torch/puzzletron/replacement_library/replacement_utils.py`:
- Around line 88-108: The function is_replacement_identical_to_teacher accesses
teacher_model_config.block_configs directly and can raise AttributeError for
configs without that attribute; add a guard at the start of the branch that
verifies hasattr(teacher_model_config, "block_configs") (or use getattr with a
default) before indexing into teacher_model_config.block_configs[block_idx], and
if missing return False (or otherwise short-circuit) so the subsequent
comparisons involving BlockConfig (replacement_block_config, parallel_blocks,
parallel_blocks[0].attention/ffn) only run when block_configs exists and is
indexable.
---
Nitpick comments:
In `@modelopt/torch/puzzletron/replacement_library/replacement_library.py`:
- Around line 78-88: The model_config property currently types its return as
DeciLMConfig but load_model_config can return any PretrainedConfig; update the
annotation on the property (model_config) from -> DeciLMConfig to ->
PretrainedConfig and also adjust any related attribute typing
(self._model_config) and imports so they use PretrainedConfig instead of
DeciLMConfig; ensure references in this class (e.g., model_config_overrides,
get_arbitrary_checkpoint_dir, descriptor.requires_trust_remote_code,
load_model_config) remain unchanged except for the type switch to
PretrainedConfig.
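The implicit-`None` failure mode flagged for `_get_arbitrary_checkpoint_dir` can be reproduced in isolation. The sketch below uses illustrative names and plain `Path` arithmetic in place of the actual modelopt helpers:

```python
from pathlib import Path


def get_checkpoint_dir_buggy(replacements):
    # Falls through to an implicit `return None` when every entry has empty
    # weight_paths; a caller that caches the result and later treats it as a
    # Path will crash far from the real cause.
    for rep in replacements:
        weight_paths = rep["weight_paths"]
        if len(weight_paths) > 0:
            return Path(weight_paths[0]).parent


def get_checkpoint_dir_fixed(replacements):
    # Same loop, but fails fast with a descriptive error instead.
    for rep in replacements:
        weight_paths = rep["weight_paths"]
        if len(weight_paths) > 0:
            return Path(weight_paths[0]).parent
    raise ValueError("No checkpoint directory found: all layer replacements have empty weight_paths")


empty = [{"weight_paths": []}]
print(get_checkpoint_dir_buggy(empty))  # → None (silent)
try:
    get_checkpoint_dir_fixed(empty)
except ValueError as err:
    print(err)
```

Raising at the source keeps the error next to its cause instead of surfacing later as a confusing `AttributeError: 'NoneType' object has no attribute ...` at one of the cached-value call sites.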
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: de36f048-aec3-4535-88ef-e97851883a16
📒 Files selected for processing (13)
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__mamba_mixer.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_attn_mask_utils.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_outputs.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__pytorch_utils.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__cache_utils.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__modeling_llama4_attention.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/variable_cache.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/vllm_yarn_utils.py
- modelopt/torch/puzzletron/replacement_library/replacement_library.py
- modelopt/torch/puzzletron/replacement_library/replacement_utils.py
💤 Files with no reviewable changes (8)
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__pytorch_utils.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/vllm_yarn_utils.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/variable_cache.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__mamba_mixer.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__modeling_llama4_attention.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_attn_mask_utils.py
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@ Coverage Diff @@
##           feature/puzzletron    #1073      +/-   ##
======================================================
+ Coverage              72.12%    72.13%    +0.01%
======================================================
  Files                    209       209
  Lines                  23628     23628
======================================================
+ Hits                   17042     17045        +3
+ Misses                  6586      6583        -3
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Delete unused DeciLM code.
Summary by CodeRabbit