Remove unused mip functions + fix multi-gpu test #660

kevalmorabia97 merged 2 commits into feature/compress
Conversation
Force-pushed 4fda94b to 1f1a327

@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough

The pull request removes deprecated multi-layer puzzle execution modes and the greedy/grouped-knapsack solver implementations. It consolidates optimization to a single MIP-based approach using PuLP instead of the `mip` package.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
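To make the consolidation concrete, the MIP here solves a selection problem of roughly this shape; below is a toy brute-force version of it (all names and numbers are illustrative, not taken from the repo):

```python
from itertools import combinations

# Toy version of the selection problem the MIP solves: pick block
# replacements maximizing a quality metric under a memory budget.
replacements = {
    "attn_prune": (3.0, 2.0),  # (quality gain, memory cost)
    "ffn_prune": (4.0, 3.0),
    "no_op": (5.0, 4.0),
}
budget = 5.0

best_set, best_gain = (), 0.0
for r in range(len(replacements) + 1):
    for combo in combinations(replacements, r):
        gain = sum(replacements[c][0] for c in combo)
        cost = sum(replacements[c][1] for c in combo)
        if cost <= budget and gain > best_gain:
            best_set, best_gain = combo, gain
```

Brute force is exponential in the number of replacements, which is why a MIP solver is used for the real search space.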
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (2)
modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (2)
95-102: Consider accepting sub-optimal solutions when time limit is hit.

The current check only accepts `LpStatusOptimal` solutions. If the solver hits the `max_seconds_per_solution` timeout, it may return `LpStatusNotSolved` even if a feasible (but not proven optimal) solution exists. Consider also accepting `pulp.LpStatusNotSolved` when a feasible solution is found in the variables, or checking if any variable has a value set.

```diff
     # Check if solution is feasible
-    if problem.status != pulp.LpStatusOptimal:
+    if problem.status not in (pulp.LpStatusOptimal, pulp.LpStatusNotSolved):
         return []
+    # For NotSolved status, verify we have a solution by checking if any variable is set
+    if problem.status == pulp.LpStatusNotSolved:
+        has_solution = any(
+            replacement["is_chosen"].varValue is not None
+            for replacement in replacements.values()
+        )
+        if not has_solution:
+            return []
```
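A minimal sketch of the status-handling concern, assuming PuLP with its bundled CBC solver is available (the toy data is made up). With a generous time limit CBC proves optimality; with a very tight one it may report `LpStatusNotSolved` while still holding a feasible incumbent, which is why checking `varValue` is a useful fallback:

```python
import pulp

# Tiny knapsack-style problem standing in for the real replacement search.
prob = pulp.LpProblem("demo", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat=pulp.LpBinary) for i in range(3)]
values, weights = [5, 4, 3], [4, 3, 2]
prob += pulp.lpSum(v * xi for v, xi in zip(values, x))
prob += pulp.lpSum(w * xi for w, xi in zip(weights, x)) <= 5

# Generous time limit: CBC proves optimality on a problem this small.
prob.solve(pulp.PULP_CBC_CMD(timeLimit=10, msg=False))

# Variables with a set varValue near 1.0 are the chosen items.
chosen = sorted(i for i, xi in enumerate(x) if xi.varValue >= 0.99)
```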
96-97: Consider suppressing solver output.

The CBC solver can be verbose by default. Consider adding `msg=False` to reduce log noise in production runs.

```diff
-    solver = pulp.PULP_CBC_CMD(timeLimit=max_seconds_per_solution)
+    solver = pulp.PULP_CBC_CMD(timeLimit=max_seconds_per_solution, msg=False)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between a1f63bc and c78810c98d414e1cfef1070e03e05548c950a1e1.
📒 Files selected for processing (15)
- .github/workflows/gpu_tests.yml (0 hunks)
- examples/compress/README.md (2 hunks)
- examples/compress/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml (3 hunks)
- examples/pruning/README.md (1 hunks)
- modelopt/torch/_compress/mip/constrain_search_space.py (3 hunks)
- modelopt/torch/_compress/mip/greedy_search_with_multi_layer_replacements.py (0 hunks)
- modelopt/torch/_compress/mip/grouped_knapsack.py (0 hunks)
- modelopt/torch/_compress/mip/mip_and_realize_models.py (1 hunks)
- modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (4 hunks)
- modelopt/torch/_compress/mip/run_puzzle.py (4 hunks)
- setup.py (1 hunks)
- tests/gpu/torch/_compress/compress_test_utils.py (3 hunks)
- tests/gpu/torch/_compress/resources/configs/Llama-3_1-8B-attn-pruning.yaml (3 hunks)
- tests/gpu/torch/_compress/resources/configs/Llama-3_1-8B-ffn-pruning.yaml (3 hunks)
- tests/gpu/torch/_compress/test_compress.py (0 hunks)
💤 Files with no reviewable changes (4)
- tests/gpu/torch/_compress/test_compress.py
- .github/workflows/gpu_tests.yml
- modelopt/torch/_compress/mip/grouped_knapsack.py
- modelopt/torch/_compress/mip/greedy_search_with_multi_layer_replacements.py
🧰 Additional context used
🧬 Code graph analysis (3)
modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (1)
modelopt/torch/_compress/mip/utils.py (3)
consecutive_ngrams (57-75), get_nested_key (39-54), sort_replacements (27-36)
modelopt/torch/_compress/mip/mip_and_realize_models.py (1)
modelopt/torch/_compress/tools/runtime.py (1)
NativeDdpRuntime(203-393)
modelopt/torch/_compress/mip/run_puzzle.py (1)
modelopt/torch/_compress/utils/utils.py (1)
block_config_to_str(124-143)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
🔇 Additional comments (17)
examples/compress/README.md (1)
16-16: LGTM! Updated installation instruction.

The change from "TensorRT-Model-Optimizer" to "Model-Optimizer in editable mode" correctly reflects the package naming and installation approach.
examples/pruning/README.md (1)
26-26: LGTM! Improved documentation reference.

Using a relative link to `../compress/README.md` instead of an external GitHub URL improves maintainability and ensures the link remains valid across branches and forks.

setup.py (2)
36-36: LGTM! PuLP is already present in required dependencies.

The `pulp` package is correctly included in the base required dependencies, which aligns with the PR objective of replacing the `mip` package with PuLP for optimization.
105-114: `immutabledict` dependency is necessary and actively used.

The `immutabledict` dependency is imported and used directly in the compression pipeline. It's required in `modelopt/torch/_compress/subblock_stats/calc_subblock_stats.py` for creating hashable, immutable configuration containers (subblock configs, frozen dicts) and in `modelopt/torch/_compress/replacement_library/replacement_library.py` for configuration overrides. The dependency is justified.

examples/compress/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml (1)
81-86: LGTM! Deprecated configuration options removed.

The removal of `use_greedy_search` and `is_multi_layer_puzzle` from the `mip_constraints` section aligns with the PR objective of removing deprecated multi-layer puzzle execution modes and greedy solver implementations.
81-84: LGTM! Consistent removal of deprecated flags.

The removal of the deprecated `use_greedy_search` and `is_multi_layer_puzzle` configuration options is consistent with the changes in other configuration files across the PR.
81-84: LGTM! Deprecated configuration flags removed consistently.

The removal of `use_greedy_search` and `is_multi_layer_puzzle` is consistent across all configuration files in this PR, demonstrating a thorough cleanup of the deprecated multi-layer puzzle execution modes.
27-72: Return type annotation for `setup_test_model_and_data` is consistent with implementation

The function always returns three `Path` instances (`puzzle_dir`, `llama_checkpoint_path`, `dataset_path`), so the condensed annotation `tuple[Path, Path, Path]` accurately reflects the runtime behavior and improves readability without changing semantics.
131-169: `save_dummy_dataset` now correctly accepts `Path` and casts to `str` for `save_to_disk`

Allowing `dataset_path: Path | str` matches the actual call site (a `Path` from `tmp_path / "dummy_dataset"`), and `data_dict.save_to_disk(str(dataset_path))` ensures compatibility with `DatasetDict.save_to_disk`, which expects a string path.

One thing to double-check: the `Path | str` union syntax requires Python 3.10+ (or postponed evaluation of annotations). If this repo still supports Python 3.9 or earlier, you'll need to switch to `Union[Path, str]` (or add `from __future__ import annotations`) instead.
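If older Pythons must stay supported, the `from __future__` route keeps the `X | Y` spelling. A minimal sketch (the helper name mirrors the one discussed above, but the body is hypothetical):

```python
from __future__ import annotations  # annotations become lazy strings, so `Path | str` parses pre-3.10

from pathlib import Path


def save_dummy_dataset(dataset_path: Path | str) -> str:
    # Hypothetical stand-in for the real helper: normalize to str,
    # since save_to_disk expects a string path.
    return str(dataset_path)
```

Because the annotation is never evaluated at runtime, this works even on interpreters where `Path | str` would otherwise raise a `TypeError`.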
77-83: LGTM - Class rename follows Python naming conventions.

The change from `NativeDDP_Runtime` to `NativeDdpRuntime` aligns with PEP8 CamelCase conventions for class names. The constructor arguments (`dtype` and `torch_distributed_timeout`) are correctly passed as before.
242-249: LGTM - Simplified configuration handling.

The `deepcopy(args)` preserves immutability when overriding constraints and subblock_stats_args per run. The simplified `_dump_gathered_metrics` call aligns with the unified code path.
337-342: LGTM - Simplified metrics dumping.

The removal of the `is_multi_layer_puzzle` parameter and unified iteration over replacements aligns with the single code path approach.
454-456: LGTM - Unified metrics gathering path.

Always using `gather_multi_layer_puzle_metrics` simplifies the codebase by removing the branching between single-layer and multi-layer puzzle modes.
55-57: LGTM - Correct PuLP problem initialization.

The problem is correctly created with dynamic sense based on `bigger_is_better`. PuLP's `LpMaximize`/`LpMinimize` is the proper way to set optimization direction.
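The sense-selection pattern looks roughly like this (a sketch assuming PuLP is installed; `bigger_is_better` is the flag name used in the PR, the rest is illustrative):

```python
import pulp


def make_problem(bigger_is_better: bool) -> pulp.LpProblem:
    # Maximize the metric when bigger is better, otherwise minimize it.
    sense = pulp.LpMaximize if bigger_is_better else pulp.LpMinimize
    return pulp.LpProblem("puzzle_mip", sense)
```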
62-74: LGTM - Binary variable creation and constraint accumulation.

The PuLP binary variables are correctly created with `cat=pulp.LpBinary`. The accumulation of objective and constraint terms follows the expected pattern for building LP expressions.
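An illustrative version of that accumulation pattern, assuming PuLP with its bundled CBC solver; the dict layout and key names (`score`, `cost`, `is_chosen`) are assumptions here, not the repo's actual schema:

```python
import pulp

prob = pulp.LpProblem("select", pulp.LpMaximize)
replacements = {"a": {"score": 2.0, "cost": 1.0}, "b": {"score": 3.0, "cost": 4.0}}
objective_terms, cost_terms = [], []
for name, stats in replacements.items():
    # One binary decision variable per candidate replacement.
    var = pulp.LpVariable(f"is_chosen_{name}", cat=pulp.LpBinary)
    stats["is_chosen"] = var
    objective_terms.append(stats["score"] * var)
    cost_terms.append(stats["cost"] * var)
prob += pulp.lpSum(objective_terms)    # objective: total score of chosen items
prob += pulp.lpSum(cost_terms) <= 4.0  # constraint: stay within the budget
prob.solve(pulp.PULP_CBC_CMD(msg=False, timeLimit=10))
```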
86-90: Good defensive coding for PuLP's constraint handling.

The `math.isfinite()` guards prevent PuLP errors from non-finite constraint values. This is necessary because PuLP is stricter than the `mip` library in this regard.
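The guard pattern amounts to filtering coefficients before they ever reach a constraint expression (toy data below, not from the repo):

```python
import math

# Drop non-finite stats before building LP terms; PuLP rejects inf/nan
# coefficients where the old `mip` package was more lenient.
stats = {"a": 1.5, "b": float("inf"), "c": float("nan"), "d": 0.25}
usable = {k: v for k, v in stats.items() if math.isfinite(v)}
```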
109-120: LGTM - Correct binary variable value extraction.

Using a `>= 0.99` threshold to check binary variable selection is appropriate since LP solvers may return values like 0.9999999 due to floating-point precision. The subsequent validation logic ensures solution integrity.
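The extraction step reduces to a thresholded filter over solver output (the values below are made up to mimic floating-point solver noise):

```python
# Solvers may return 0.9999999 instead of 1.0 for a chosen binary
# variable, so compare against a tolerance rather than exact equality.
solution_values = {"layer_0": 0.9999999, "layer_1": 1e-7, "layer_2": 1.0}
chosen = [name for name, value in solution_values.items() if value >= 0.99]
```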
Force-pushed c78810c to f2fc34b

modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (outdated comment, resolved)
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Force-pushed f2fc34b to b1051ff
```
# --image gitlab-master.nvidia.com/deci/puzzletron:modelopt_main \
# --workdir $MODELOPT SRC DIRECTORY --interactive --gpu 1
#
# export PYTHONPATH=$PYTHONPATH:.:/workspace/puzzletron/v1
```
I would keep it (it is handy):

```
export PYTHONPATH=$PYTHONPATH:.
pytest -s -v ./tests/gpu/torch/_compress/test_compress.py::test_compress -o addopts=""
```
We don't need anything now. This test can be run just like any other modelopt test using pytest. No other setup needed.
Skipping `-o addopts=""` does not work when running this test from a docker container, but OK to remove it.
It might be because of missing test dependencies. If you install `nvidia-modelopt[all,dev-test]` then it should work fine.
```python
    src: Optional[int] = None,
    group: Optional[torch.distributed.ProcessGroup] = None,
) -> Any:
    obj_size_tensor = torch.LongTensor(1, device="cpu")
```
Just a comment: the test_compress() test was working fine on 2 GPUs when running from torchrun, any idea why?
I don't know about that. I saw an issue with device placement so just added this change (using Cursor).
```python
def launch_mip_and_realize_model(cfg: DictConfig, runtime: IRuntime):
    # Determine device for distributed operations (NCCL requires CUDA tensors)
```
Do we need the same in other puzzletron steps? Maybe move this function to some utilities?
The puzzle test was working fine with just this change. Perhaps other places don't need this.
```python
    This reducer affects only the attention layers: FFNs are allowed their entire search space.
    """
    is_multi_layer_puzzle = is_replacement_gathered_metrics(gathered_metrics)
```
Why was the is_replacement_gathered_metrics function used here instead of a config variable, is_multi_layer_puzzle? Are both the same? Any risk they could be different? Asking to be sure we can safely assume that is_replacement_gathered_metrics() is also always true.
Actually this function is not even used anywhere. Let me just remove the function altogether.
The output of this function was used; there was if-else code below this line (before this MR removed it):

`is_multi_layer_puzzle = is_replacement_gathered_metrics(gathered_metrics)`

I just wonder why is_multi_layer_puzzle was computed with this function (is_replacement_gathered_metrics) and not with args.is_multi_layer_puzzle. I am approving this MR as I think the risk is low if we remove it.
We can look into it if we see a need to copy over this feature.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Force-pushed 8a8cdb3 to d456ff6
What does this PR do?
Type of change: Improvement
- Removed the `is_multi_layer_puzzle: False` case
- Removed the `use_greedy_search: False` case
- Removed the `num_solutions` and `minimal_diversity` flags

Testing
Summary by CodeRabbit
Release Notes
Refactor
Chores
Documentation