
Remove unused mip functions + fix multi-gpu test#660

Merged
kevalmorabia97 merged 2 commits into feature/compress from kmorabia/compress-cleanup-1
Dec 8, 2025

Conversation

@kevalmorabia97
Collaborator

@kevalmorabia97 kevalmorabia97 commented Dec 8, 2025

What does this PR do?

Type of change: Improvement

  • Fix tests on 2 GPUs: some places hard-coded the CPU device for distributed communications, which was causing this issue
  • Remove unused code in constrain_search_space.py
  • Remove is_multi_layer_puzzle: False case
  • Remove use_greedy_search: False case
  • Remove knapsack mip case
  • Remove unused num_solutions and minimal_diversity flags

Testing

  • GH CI/CD tests passing
  • Tested on a 2-GPU setup locally as well

Summary by CodeRabbit

Release Notes

  • Refactor

    • Optimized solver implementation with improved library integration.
    • Simplified model compression configuration by removing deprecated search options.
    • Consolidated optimization paths for streamlined processing.
  • Chores

    • Updated dependencies for improved compatibility.
  • Documentation

    • Clarified Model-Optimizer installation instructions in examples.


@kevalmorabia97 kevalmorabia97 requested review from a team as code owners December 8, 2025 06:06
@kevalmorabia97 kevalmorabia97 changed the base branch from main to feature/compress December 8, 2025 06:07
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner December 8, 2025 06:07
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/compress-cleanup-1 branch from 4fda94b to 1f1a327 on December 8, 2025 09:16
@kevalmorabia97
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Dec 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Contributor

coderabbitai bot commented Dec 8, 2025

Walkthrough

The pull request removes deprecated multi-layer puzzle execution modes and greedy/grouped-knapsack solver implementations. It consolidates optimization to a single MIP-based approach using PuLP instead of the mip library, removes related CLI flags and configuration options, and simplifies constraint search logic by eliminating multi-path branching.
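At its core, the consolidated solver picks exactly one replacement per layer so as to maximize a quality objective under a cost budget. The real code formulates this as a MIP in PuLP; below is a framework-free sketch of the same selection problem, with hypothetical cost/quality numbers and brute-force enumeration standing in for the solver:

```python
from itertools import product

def best_replacement_config(replacements, cost_budget):
    """Pick exactly one candidate per layer, maximizing total quality
    subject to a total-cost budget (exhaustive search for illustration)."""
    layers = sorted(replacements)
    best_score, best_choice = float("-inf"), None
    for combo in product(*(replacements[layer] for layer in layers)):
        cost = sum(c["cost"] for c in combo)
        score = sum(c["quality"] for c in combo)
        if cost <= cost_budget and score > best_score:
            best_score, best_choice = score, [c["name"] for c in combo]
    return best_choice, best_score

# Two layers, each with a "keep" and a "prune" candidate (made-up numbers).
replacements = {
    0: [{"name": "keep", "cost": 4.0, "quality": 1.0},
        {"name": "prune_ffn", "cost": 2.0, "quality": 0.7}],
    1: [{"name": "keep", "cost": 4.0, "quality": 1.0},
        {"name": "prune_ffn", "cost": 2.0, "quality": 0.6}],
}
choice, score = best_replacement_config(replacements, cost_budget=6.0)
print(choice, score)
```

With a budget of 6.0, pruning layer 0 and keeping layer 1 wins; a MIP solver reaches the same answer without enumerating all combinations.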

Changes

  • Build & Dependencies (.github/workflows/gpu_tests.yml, setup.py): Removed the GPU workflow step for installing libffi-dev; removed the mip library dependency, added immutabledict, and reordered compress dependencies.
  • Solver Implementation Removal (modelopt/torch/_compress/mip/greedy_search_with_multi_layer_replacements.py, modelopt/torch/_compress/mip/grouped_knapsack.py): Deleted both solver modules; removed run_greedy_search, multi_solution_grouped_knapsack, and associated helper functions and type aliases.
  • MIP Library Migration (modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py): Migrated from the mip library to PuLP throughout; replaced problem creation, variable definitions, constraints, and solver invocation with PuLP equivalents; added NaN/inf guards and time-limit handling.
  • Constraint Search Simplification (modelopt/torch/_compress/mip/constrain_search_space.py): Removed multi-path branching on is_replacement_gathered_metrics; consolidated to a single path using infer_teacher_replacement_config; deleted the is_replacement_gathered_metrics helper.
  • Puzzle Execution Consolidation (modelopt/torch/_compress/mip/run_puzzle.py): Removed deprecated CLI flags (--is_multi_layer_puzzle, --use_greedy_search) and conditional logic; consolidated to always use the multi-layer MIP path; simplified metric dumping and imports.
  • Runtime Class Update (modelopt/torch/_compress/mip/mip_and_realize_models.py): Updated the runtime class from NativeDDP_Runtime to NativeDdpRuntime.
  • Documentation & Examples (examples/compress/README.md, examples/pruning/README.md): Updated the environment installation instruction to editable mode; changed the Puzzle pruning reference to a relative repo link.
  • Configuration Updates (examples/compress/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml, tests/gpu/torch/_compress/resources/configs/*): Removed the use_greedy_search and is_multi_layer_puzzle configuration keys; minor whitespace and comment formatting adjustments.
  • Test Utilities & Comments (tests/gpu/torch/_compress/compress_test_utils.py, tests/gpu/torch/_compress/test_compress.py): Condensed a return type annotation; updated parameter typing for save_dummy_dataset; removed historical run-instruction comments.
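The NaN/inf guards noted for the PuLP migration amount to filtering non-finite coefficients before they reach the solver, since PuLP is stricter about them than the old mip library. A minimal stdlib sketch (the function name and data are illustrative, not taken from the PR):

```python
import math

def finite_constraint_terms(values):
    """Drop non-finite coefficients before handing terms to an LP/MIP
    solver; NaN or inf values would otherwise break problem construction."""
    return {name: v for name, v in values.items() if math.isfinite(v)}

terms = {"a": 1.5, "b": float("nan"), "c": float("inf"), "d": -2.0}
kept = finite_constraint_terms(terms)
print(kept)  # {'a': 1.5, 'd': -2.0}
```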

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • PuLP migration verification (mip_with_multi_layer_replacements.py): Ensure all mip library calls are correctly replaced with PuLP equivalents, including constraint handling and solver configuration.
  • Solver module deletions: Verify that removal of greedy_search_with_multi_layer_replacements.py and grouped_knapsack.py does not break imports or dependencies elsewhere in the codebase.
  • Path consolidation in run_puzzle.py: Confirm that all deprecated branches are properly removed and the unified multi-layer MIP path handles all prior scenarios correctly.
  • Configuration key removals: Validate that removal of use_greedy_search and is_multi_layer_puzzle from configs aligns with the consolidated code paths and no fallback logic remains.

Poem

🐰 Old puzzle paths now consolidated,
Greedy solvers and knapsacks vanquished clean,
PuLP replaces MIP, refactored and bright,
Simpler constraints, one true unified way—
hops off into cleaner code!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 33.33%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title accurately summarizes the primary changes: removing unused mip functions and fixing a multi-gpu test issue.
  • Description check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (2)

95-102: Consider accepting sub-optimal solutions when time limit is hit.

The current check only accepts LpStatusOptimal solutions. If the solver hits max_seconds_per_solution timeout, it may return LpStatusNotSolved even if a feasible (but not proven optimal) solution exists.

Consider also accepting pulp.LpStatusNotSolved when a feasible solution is found in the variables, or checking if any variable has a value set.

     # Check if solution is feasible
-    if problem.status != pulp.LpStatusOptimal:
+    if problem.status not in (pulp.LpStatusOptimal, pulp.LpStatusNotSolved):
         return []
+    # For NotSolved status, verify we have a solution by checking if any variable is set
+    if problem.status == pulp.LpStatusNotSolved:
+        has_solution = any(
+            replacement["is_chosen"].varValue is not None
+            for replacement in replacements.values()
+        )
+        if not has_solution:
+            return []

96-97: Consider suppressing solver output.

The CBC solver can be verbose by default. Consider adding msg=False to reduce log noise in production runs.

-    solver = pulp.PULP_CBC_CMD(timeLimit=max_seconds_per_solution)
+    solver = pulp.PULP_CBC_CMD(timeLimit=max_seconds_per_solution, msg=False)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1f63bc and c78810c98d414e1cfef1070e03e05548c950a1e1.

📒 Files selected for processing (15)
  • .github/workflows/gpu_tests.yml (0 hunks)
  • examples/compress/README.md (2 hunks)
  • examples/compress/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml (3 hunks)
  • examples/pruning/README.md (1 hunks)
  • modelopt/torch/_compress/mip/constrain_search_space.py (3 hunks)
  • modelopt/torch/_compress/mip/greedy_search_with_multi_layer_replacements.py (0 hunks)
  • modelopt/torch/_compress/mip/grouped_knapsack.py (0 hunks)
  • modelopt/torch/_compress/mip/mip_and_realize_models.py (1 hunks)
  • modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (4 hunks)
  • modelopt/torch/_compress/mip/run_puzzle.py (4 hunks)
  • setup.py (1 hunks)
  • tests/gpu/torch/_compress/compress_test_utils.py (3 hunks)
  • tests/gpu/torch/_compress/resources/configs/Llama-3_1-8B-attn-pruning.yaml (3 hunks)
  • tests/gpu/torch/_compress/resources/configs/Llama-3_1-8B-ffn-pruning.yaml (3 hunks)
  • tests/gpu/torch/_compress/test_compress.py (0 hunks)
💤 Files with no reviewable changes (4)
  • tests/gpu/torch/_compress/test_compress.py
  • .github/workflows/gpu_tests.yml
  • modelopt/torch/_compress/mip/grouped_knapsack.py
  • modelopt/torch/_compress/mip/greedy_search_with_multi_layer_replacements.py
🧰 Additional context used
🧬 Code graph analysis (3)
modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (1)
modelopt/torch/_compress/mip/utils.py (3)
  • consecutive_ngrams (57-75)
  • get_nested_key (39-54)
  • sort_replacements (27-36)
modelopt/torch/_compress/mip/mip_and_realize_models.py (1)
modelopt/torch/_compress/tools/runtime.py (1)
  • NativeDdpRuntime (203-393)
modelopt/torch/_compress/mip/run_puzzle.py (1)
modelopt/torch/_compress/utils/utils.py (1)
  • block_config_to_str (124-143)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
🔇 Additional comments (17)
examples/compress/README.md (1)

16-16: LGTM! Updated installation instruction.

The change from "TensorRT-Model-Optimizer" to "Model-Optimizer in editable mode" correctly reflects the package naming and installation approach.

examples/pruning/README.md (1)

26-26: LGTM! Improved documentation reference.

Using a relative link to ../compress/README.md instead of an external GitHub URL improves maintainability and ensures the link remains valid across branches and forks.

setup.py (2)

36-36: LGTM! PuLP is already present in required dependencies.

The pulp package is correctly included in the base required dependencies, which aligns with the PR objective of replacing the mip package with PuLP for optimization.


105-114: immutabledict dependency is necessary and actively used.

The immutabledict dependency is imported and used directly in the compression pipeline. It's required in modelopt/torch/_compress/subblock_stats/calc_subblock_stats.py for creating hashable, immutable configuration containers (subblock configs, frozen dicts) and in modelopt/torch/_compress/replacement_library/replacement_library.py for configuration overrides. The dependency is justified.

examples/compress/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml (1)

81-86: LGTM! Deprecated configuration options removed.

The removal of use_greedy_search and is_multi_layer_puzzle from the mip_constraints section aligns with the PR objective of removing deprecated multi-layer puzzle execution modes and greedy solver implementations.

tests/gpu/torch/_compress/resources/configs/Llama-3_1-8B-attn-pruning.yaml (1)

81-84: LGTM! Consistent removal of deprecated flags.

The removal of the deprecated use_greedy_search and is_multi_layer_puzzle configuration options is consistent with the changes in other configuration files across the PR.

tests/gpu/torch/_compress/resources/configs/Llama-3_1-8B-ffn-pruning.yaml (1)

81-84: LGTM! Deprecated configuration flags removed consistently.

The removal of use_greedy_search and is_multi_layer_puzzle is consistent across all configuration files in this PR, demonstrating a thorough cleanup of the deprecated multi-layer puzzle execution modes.

tests/gpu/torch/_compress/compress_test_utils.py (2)

27-72: Return type annotation for setup_test_model_and_data is consistent with implementation

The function always returns three Path instances (puzzle_dir, llama_checkpoint_path, dataset_path), so the condensed annotation tuple[Path, Path, Path] accurately reflects the runtime behavior and improves readability without changing semantics.


131-169: save_dummy_dataset now correctly accepts Path and casts to str for save_to_disk

Allowing dataset_path: Path | str matches the actual call site (a Path from tmp_path / "dummy_dataset"), and data_dict.save_to_disk(str(dataset_path)) ensures compatibility with DatasetDict.save_to_disk, which expects a string path.

One thing to double‑check: the Path | str union syntax requires Python 3.10+ (or postponed evaluation of annotations). If this repo still supports Python 3.9 or earlier, you’ll need to switch to Union[Path, str] (or add from __future__ import annotations) instead.
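For reference, a small sketch of the compatibility pattern discussed above, using a hypothetical helper name (not the actual save_dummy_dataset signature). The future import makes the `Path | str` annotation a deferred string, so the definition also parses on Python 3.8/3.9:

```python
from __future__ import annotations  # defers annotation evaluation, so `Path | str` parses pre-3.10

from pathlib import Path

def save_dataset(dataset_path: Path | str) -> str:
    """Accept either a Path or a str and normalize to str before
    calling an API that expects a string path."""
    return str(dataset_path)

print(save_dataset(Path("dummy_dataset")))
```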

modelopt/torch/_compress/mip/mip_and_realize_models.py (1)

77-83: LGTM - Class rename follows Python naming conventions.

The change from NativeDDP_Runtime to NativeDdpRuntime aligns with PEP8 CamelCase conventions for class names. The constructor arguments (dtype and torch_distributed_timeout) are correctly passed as before.

modelopt/torch/_compress/mip/run_puzzle.py (3)

242-249: LGTM - Simplified configuration handling.

The deepcopy(args) preserves immutability when overriding constraints and subblock_stats_args per run. The simplified _dump_gathered_metrics call aligns with the unified code path.


337-342: LGTM - Simplified metrics dumping.

The removal of the is_multi_layer_puzzle parameter and unified iteration over replacements aligns with the single code path approach.


454-456: LGTM - Unified metrics gathering path.

Always using gather_multi_layer_puzle_metrics simplifies the codebase by removing the branching between single-layer and multi-layer puzzle modes.

modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (4)

55-57: LGTM - Correct PuLP problem initialization.

The problem is correctly created with dynamic sense based on bigger_is_better. PuLP's LpMaximize/LpMinimize is the proper way to set optimization direction.


62-74: LGTM - Binary variable creation and constraint accumulation.

The PuLP binary variables are correctly created with cat=pulp.LpBinary. The accumulation of objective and constraint terms follows the expected pattern for building LP expressions.


86-90: Good defensive coding for PuLP's constraint handling.

The math.isfinite() guards prevent PuLP errors from non-finite constraint values. This is necessary because PuLP is stricter than the mip library in this regard.


109-120: LGTM - Correct binary variable value extraction.

Using >= 0.99 threshold to check binary variable selection is appropriate since LP solvers may return values like 0.9999999 due to floating-point precision. The subsequent validation logic ensures solution integrity.
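A toy illustration of that thresholding step, with made-up variable names and values standing in for PuLP varValue results:

```python
def chosen_replacements(var_values, threshold=0.99):
    """Extract 'selected' binary decisions from solver values that may be
    slightly below 1.0 due to floating-point precision (e.g. 0.9999999)."""
    return sorted(name for name, v in var_values.items()
                  if v is not None and v >= threshold)

values = {"layer0.prune": 0.9999999, "layer1.keep": 1.0,
          "layer1.prune": 1.2e-7, "layer2.keep": None}
print(chosen_replacements(values))  # ['layer0.prune', 'layer1.keep']
```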

@kevalmorabia97 kevalmorabia97 changed the title Replace mip package with pulp and remove unused mt._compress.mip functions Replace mip package with pulp and remove unused mip functions Dec 8, 2025
@kevalmorabia97 kevalmorabia97 removed the request for review from a team December 8, 2025 11:01
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/compress-cleanup-1 branch from c78810c to f2fc34b on December 8, 2025 11:16
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/compress-cleanup-1 branch from f2fc34b to b1051ff on December 8, 2025 11:39
@kevalmorabia97 kevalmorabia97 changed the title Replace mip package with pulp and remove unused mip functions Remove unused mip functions + fix multi-gpu test Dec 8, 2025
# --image gitlab-master.nvidia.com/deci/puzzletron:modelopt_main \
# --workdir $MODELOPT SRC DIRECTORY --interactive --gpu 1
#
# export PYTHONPATH=$PYTHONPATH:.:/workspace/puzzletron/v1
Contributor


I would keep it (it is handy):

export PYTHONPATH=$PYTHONPATH:.
pytest -s -v ./tests/gpu/torch/_compress/test_compress.py::test_compress -o addopts=""

Collaborator Author


We don't need anything now. This test can be run just like any other modelopt test using pytest. No other setup needed.

Contributor


Skipping -o addopts="" does not work when running this test from a Docker container, but it's OK to remove it.

Collaborator Author


It might be because of missing test dependencies. If you install nvidia-modelopt[all,dev-test], then it should work fine.

src: Optional[int] = None,
group: Optional[torch.distributed.ProcessGroup] = None,
) -> Any:
obj_size_tensor = torch.LongTensor(1, device="cpu")
Contributor


Just a comment: the test_compress() test was working fine on 2 GPUs when run via torchrun. Any idea why?

Collaborator Author


I don't know about that. I saw an issue with device placement, so I just added this change (using Cursor).



def launch_mip_and_realize_model(cfg: DictConfig, runtime: IRuntime):
# Determine device for distributed operations (NCCL requires CUDA tensors)
Contributor


Do we need the same in other puzzletron steps? Maybe move this function to some utilities?

Collaborator Author


The puzzle test was working fine with just this change. Perhaps other places don't need this.
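As background to this thread: the underlying decision is which device to allocate tensors on for collectives, since NCCL requires CUDA tensors while Gloo works with CPU tensors. A hypothetical, framework-free sketch of that choice (pick_comm_device is not a function from the PR):

```python
def pick_comm_device(backend: str, local_rank: int = 0) -> str:
    """Return the device string to allocate tensors on for distributed
    collectives: NCCL requires CUDA tensors; Gloo works with CPU tensors."""
    if backend.lower() == "nccl":
        return f"cuda:{local_rank}"
    return "cpu"

print(pick_comm_device("nccl", 1))  # cuda:1
print(pick_comm_device("gloo"))     # cpu
```

In real code this would feed something like `torch.LongTensor(1).to(device)` before a broadcast, instead of hard-coding device="cpu".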


This reducer affects only the attention layers: FFNs are allowed their entire search space.
"""
is_multi_layer_puzzle = is_replacement_gathered_metrics(gathered_metrics)
Contributor


Why is the is_replacement_gathered_metrics function used here instead of a config variable (is_multi_layer_puzzle)? Are both the same? Any risk they could be different? Asking to be sure we can safely assume that is_replacement_gathered_metrics() is also always true.

Collaborator Author


Actually this function is not even used anywhere. Let me just remove the function altogether.

Contributor


The output of this function was used; there was if-else code below this line (before it was removed in this MR):

is_multi_layer_puzzle = is_replacement_gathered_metrics(gathered_metrics)

I just wonder why is_multi_layer_puzzle is computed with this function (is_replacement_gathered_metrics) and not using args.is_multi_layer_puzzle.

I am approving this MR as I think the risk is low if we remove it.

Collaborator Author


We can look into it if we see a need to copy over this feature.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/compress-cleanup-1 branch from 8a8cdb3 to d456ff6 on December 8, 2025 13:08
@kevalmorabia97 kevalmorabia97 merged commit a99f503 into feature/compress Dec 8, 2025
21 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/compress-cleanup-1 branch December 8, 2025 15:29