Remove unused mip functions + fix multi-gpu test #660

kevalmorabia97 merged 2 commits into feature/compress
Conversation
Force-pushed 4fda94b to 1f1a327

@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough

The pull request removes deprecated multi-layer puzzle execution modes and the greedy/grouped-knapsack solver implementations. It consolidates optimization to a single MIP-based approach using PuLP instead of the `mip` package.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
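To make the consolidation concrete, the MIP here solves a selection problem of roughly this shape; below is a toy brute-force version of it (all names and numbers are illustrative, not taken from the repo):

```python
from itertools import combinations

# Toy version of the selection problem the MIP solves: pick block
# replacements maximizing a quality metric under a memory budget.
replacements = {
    "attn_prune": (3.0, 2.0),  # (quality gain, memory cost)
    "ffn_prune": (4.0, 3.0),
    "no_op": (5.0, 4.0),
}
budget = 5.0

best_set, best_gain = (), 0.0
for r in range(len(replacements) + 1):
    for combo in combinations(replacements, r):
        gain = sum(replacements[c][0] for c in combo)
        cost = sum(replacements[c][1] for c in combo)
        if cost <= budget and gain > best_gain:
            best_set, best_gain = combo, gain
```

Brute force is exponential in the number of replacements, which is why a MIP solver is used for the real search space.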
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (2)
modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (2)
95-102: Consider accepting sub-optimal solutions when time limit is hit.

The current check only accepts `LpStatusOptimal` solutions. If the solver hits the `max_seconds_per_solution` timeout, it may return `LpStatusNotSolved` even if a feasible (but not proven optimal) solution exists. Consider also accepting `pulp.LpStatusNotSolved` when a feasible solution is found in the variables, or checking if any variable has a value set.

```diff
     # Check if solution is feasible
-    if problem.status != pulp.LpStatusOptimal:
+    if problem.status not in (pulp.LpStatusOptimal, pulp.LpStatusNotSolved):
         return []
+    # For NotSolved status, verify we have a solution by checking if any variable is set
+    if problem.status == pulp.LpStatusNotSolved:
+        has_solution = any(
+            replacement["is_chosen"].varValue is not None
+            for replacement in replacements.values()
+        )
+        if not has_solution:
+            return []
```
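A minimal sketch of the status-handling concern, assuming PuLP with its bundled CBC solver is available (the toy data is made up). With a generous time limit CBC proves optimality; with a very tight one it may report `LpStatusNotSolved` while still holding a feasible incumbent, which is why checking `varValue` is a useful fallback:

```python
import pulp

# Tiny knapsack-style problem standing in for the real replacement search.
prob = pulp.LpProblem("demo", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat=pulp.LpBinary) for i in range(3)]
values, weights = [5, 4, 3], [4, 3, 2]
prob += pulp.lpSum(v * xi for v, xi in zip(values, x))
prob += pulp.lpSum(w * xi for w, xi in zip(weights, x)) <= 5

# Generous time limit: CBC proves optimality on a problem this small.
prob.solve(pulp.PULP_CBC_CMD(timeLimit=10, msg=False))

# Variables with a set varValue near 1.0 are the chosen items.
chosen = sorted(i for i, xi in enumerate(x) if xi.varValue >= 0.99)
```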
96-97: Consider suppressing solver output.

The CBC solver can be verbose by default. Consider adding `msg=False` to reduce log noise in production runs.

```diff
-    solver = pulp.PULP_CBC_CMD(timeLimit=max_seconds_per_solution)
+    solver = pulp.PULP_CBC_CMD(timeLimit=max_seconds_per_solution, msg=False)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between a1f63bc and c78810c98d414e1cfef1070e03e05548c950a1e1.
📒 Files selected for processing (15)
- .github/workflows/gpu_tests.yml (0 hunks)
- examples/compress/README.md (2 hunks)
- examples/compress/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml (3 hunks)
- examples/pruning/README.md (1 hunks)
- modelopt/torch/_compress/mip/constrain_search_space.py (3 hunks)
- modelopt/torch/_compress/mip/greedy_search_with_multi_layer_replacements.py (0 hunks)
- modelopt/torch/_compress/mip/grouped_knapsack.py (0 hunks)
- modelopt/torch/_compress/mip/mip_and_realize_models.py (1 hunks)
- modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (4 hunks)
- modelopt/torch/_compress/mip/run_puzzle.py (4 hunks)
- setup.py (1 hunks)
- tests/gpu/torch/_compress/compress_test_utils.py (3 hunks)
- tests/gpu/torch/_compress/resources/configs/Llama-3_1-8B-attn-pruning.yaml (3 hunks)
- tests/gpu/torch/_compress/resources/configs/Llama-3_1-8B-ffn-pruning.yaml (3 hunks)
- tests/gpu/torch/_compress/test_compress.py (0 hunks)
💤 Files with no reviewable changes (4)
- tests/gpu/torch/_compress/test_compress.py
- .github/workflows/gpu_tests.yml
- modelopt/torch/_compress/mip/grouped_knapsack.py
- modelopt/torch/_compress/mip/greedy_search_with_multi_layer_replacements.py
🧰 Additional context used
🧬 Code graph analysis (3)
modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (1)
modelopt/torch/_compress/mip/utils.py (3)
consecutive_ngrams (57-75), get_nested_key (39-54), sort_replacements (27-36)
modelopt/torch/_compress/mip/mip_and_realize_models.py (1)
modelopt/torch/_compress/tools/runtime.py (1)
NativeDdpRuntime(203-393)
modelopt/torch/_compress/mip/run_puzzle.py (1)
modelopt/torch/_compress/utils/utils.py (1)
block_config_to_str(124-143)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
🔇 Additional comments (17)
examples/compress/README.md (1)
16-16: LGTM! Updated installation instruction.

The change from "TensorRT-Model-Optimizer" to "Model-Optimizer in editable mode" correctly reflects the package naming and installation approach.
examples/pruning/README.md (1)
26-26: LGTM! Improved documentation reference.

Using a relative link to `../compress/README.md` instead of an external GitHub URL improves maintainability and ensures the link remains valid across branches and forks.

setup.py (2)
36-36: LGTM! PuLP is already present in required dependencies.

The `pulp` package is correctly included in the base required dependencies, which aligns with the PR objective of replacing the `mip` package with PuLP for optimization.
105-114: `immutabledict` dependency is necessary and actively used.

The `immutabledict` dependency is imported and used directly in the compression pipeline. It's required in `modelopt/torch/_compress/subblock_stats/calc_subblock_stats.py` for creating hashable, immutable configuration containers (subblock configs, frozen dicts) and in `modelopt/torch/_compress/replacement_library/replacement_library.py` for configuration overrides. The dependency is justified.

examples/compress/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml (1)
81-86: LGTM! Deprecated configuration options removed.

The removal of `use_greedy_search` and `is_multi_layer_puzzle` from the `mip_constraints` section aligns with the PR objective of removing deprecated multi-layer puzzle execution modes and greedy solver implementations.
81-84: LGTM! Consistent removal of deprecated flags.

The removal of the deprecated `use_greedy_search` and `is_multi_layer_puzzle` configuration options is consistent with the changes in other configuration files across the PR.
81-84: LGTM! Deprecated configuration flags removed consistently.

The removal of `use_greedy_search` and `is_multi_layer_puzzle` is consistent across all configuration files in this PR, demonstrating a thorough cleanup of the deprecated multi-layer puzzle execution modes.
27-72: Return type annotation for `setup_test_model_and_data` is consistent with implementation

The function always returns three `Path` instances (`puzzle_dir`, `llama_checkpoint_path`, `dataset_path`), so the condensed annotation `tuple[Path, Path, Path]` accurately reflects the runtime behavior and improves readability without changing semantics.
131-169: `save_dummy_dataset` now correctly accepts `Path` and casts to `str` for `save_to_disk`

Allowing `dataset_path: Path | str` matches the actual call site (a `Path` from `tmp_path / "dummy_dataset"`), and `data_dict.save_to_disk(str(dataset_path))` ensures compatibility with `DatasetDict.save_to_disk`, which expects a string path.

One thing to double-check: the `Path | str` union syntax requires Python 3.10+ (or postponed evaluation of annotations). If this repo still supports Python 3.9 or earlier, you'll need to switch to `Union[Path, str]` (or add `from __future__ import annotations`) instead.
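If older Pythons must stay supported, the `from __future__` route keeps the `X | Y` spelling. A minimal sketch (the helper name mirrors the one discussed above, but the body is hypothetical):

```python
from __future__ import annotations  # annotations become lazy strings, so `Path | str` parses pre-3.10

from pathlib import Path


def save_dummy_dataset(dataset_path: Path | str) -> str:
    # Hypothetical stand-in for the real helper: normalize to str,
    # since save_to_disk expects a string path.
    return str(dataset_path)
```

Because the annotation is never evaluated at runtime, this works even on interpreters where `Path | str` would otherwise raise a `TypeError`.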
77-83: LGTM - Class rename follows Python naming conventions.

The change from `NativeDDP_Runtime` to `NativeDdpRuntime` aligns with PEP8 CamelCase conventions for class names. The constructor arguments (`dtype` and `torch_distributed_timeout`) are correctly passed as before.
242-249: LGTM - Simplified configuration handling.

The `deepcopy(args)` preserves immutability when overriding constraints and subblock_stats_args per run. The simplified `_dump_gathered_metrics` call aligns with the unified code path.
337-342: LGTM - Simplified metrics dumping.

The removal of the `is_multi_layer_puzzle` parameter and unified iteration over replacements aligns with the single code path approach.
454-456: LGTM - Unified metrics gathering path.

Always using `gather_multi_layer_puzle_metrics` simplifies the codebase by removing the branching between single-layer and multi-layer puzzle modes.
55-57: LGTM - Correct PuLP problem initialization.

The problem is correctly created with dynamic sense based on `bigger_is_better`. PuLP's `LpMaximize`/`LpMinimize` is the proper way to set optimization direction.
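The sense-selection pattern looks roughly like this (a sketch assuming PuLP is installed; `bigger_is_better` is the flag name used in the PR, the rest is illustrative):

```python
import pulp


def make_problem(bigger_is_better: bool) -> pulp.LpProblem:
    # Maximize the metric when bigger is better, otherwise minimize it.
    sense = pulp.LpMaximize if bigger_is_better else pulp.LpMinimize
    return pulp.LpProblem("puzzle_mip", sense)
```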
62-74: LGTM - Binary variable creation and constraint accumulation.

The PuLP binary variables are correctly created with `cat=pulp.LpBinary`. The accumulation of objective and constraint terms follows the expected pattern for building LP expressions.
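An illustrative version of that accumulation pattern, assuming PuLP with its bundled CBC solver; the dict layout and key names (`score`, `cost`, `is_chosen`) are assumptions here, not the repo's actual schema:

```python
import pulp

prob = pulp.LpProblem("select", pulp.LpMaximize)
replacements = {"a": {"score": 2.0, "cost": 1.0}, "b": {"score": 3.0, "cost": 4.0}}
objective_terms, cost_terms = [], []
for name, stats in replacements.items():
    # One binary decision variable per candidate replacement.
    var = pulp.LpVariable(f"is_chosen_{name}", cat=pulp.LpBinary)
    stats["is_chosen"] = var
    objective_terms.append(stats["score"] * var)
    cost_terms.append(stats["cost"] * var)
prob += pulp.lpSum(objective_terms)    # objective: total score of chosen items
prob += pulp.lpSum(cost_terms) <= 4.0  # constraint: stay within the budget
prob.solve(pulp.PULP_CBC_CMD(msg=False, timeLimit=10))
```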
86-90: Good defensive coding for PuLP's constraint handling.

The `math.isfinite()` guards prevent PuLP errors from non-finite constraint values. This is necessary because PuLP is stricter than the `mip` library in this regard.
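The guard pattern amounts to filtering coefficients before they ever reach a constraint expression (toy data below, not from the repo):

```python
import math

# Drop non-finite stats before building LP terms; PuLP rejects inf/nan
# coefficients where the old `mip` package was more lenient.
stats = {"a": 1.5, "b": float("inf"), "c": float("nan"), "d": 0.25}
usable = {k: v for k, v in stats.items() if math.isfinite(v)}
```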
109-120: LGTM - Correct binary variable value extraction.

Using a `>= 0.99` threshold to check binary variable selection is appropriate since LP solvers may return values like 0.9999999 due to floating-point precision. The subsequent validation logic ensures solution integrity.
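The extraction step reduces to a thresholded filter over solver output (the values below are made up to mimic floating-point solver noise):

```python
# Solvers may return 0.9999999 instead of 1.0 for a chosen binary
# variable, so compare against a tolerance rather than exact equality.
solution_values = {"layer_0": 0.9999999, "layer_1": 1e-7, "layer_2": 1.0}
chosen = [name for name, value in solution_values.items() if value >= 0.99]
```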
Force-pushed c78810c to f2fc34b

modelopt/torch/_compress/mip/mip_with_multi_layer_replacements.py (outdated comment, resolved)
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Force-pushed f2fc34b to b1051ff
```
# --image gitlab-master.nvidia.com/deci/puzzletron:modelopt_main \
# --workdir $MODELOPT SRC DIRECTORY --interactive --gpu 1
#
# export PYTHONPATH=$PYTHONPATH:.:/workspace/puzzletron/v1
```
I would keep it (it is handy):

```
export PYTHONPATH=$PYTHONPATH:.
pytest -s -v ./tests/gpu/torch/_compress/test_compress.py::test_compress -o addopts=""
```
We don't need anything now. This test can be run just like any other modelopt test using pytest. No other setup needed.
Skipping `-o addopts=""` does not work when running this test from a docker container, but OK to remove it.
It might be because of missing test dependencies. If you install `nvidia-modelopt[all,dev-test]` then it should work fine.
```python
    src: Optional[int] = None,
    group: Optional[torch.distributed.ProcessGroup] = None,
) -> Any:
    obj_size_tensor = torch.LongTensor(1, device="cpu")
```
Just a comment: the test_compress() test was working fine on 2 GPUs when running from torchrun, any idea why?
I don't know about that. I saw an issue with device placement so just added this change (using Cursor).
```python
def launch_mip_and_realize_model(cfg: DictConfig, runtime: IRuntime):
    # Determine device for distributed operations (NCCL requires CUDA tensors)
```
Do we need the same in other puzzletron steps? Maybe move this function to some utilities?
The puzzle test was working fine with just this change. Perhaps other places don't need this.
```python
    This reducer affects only the attention layers: FFNs are allowed their entire search space.
    """
    is_multi_layer_puzzle = is_replacement_gathered_metrics(gathered_metrics)
```
Why was the is_replacement_gathered_metrics function used here instead of a config variable, is_multi_layer_puzzle? Are both the same? Any risk they could be different? Asking to be sure we can safely assume that is_replacement_gathered_metrics() is also always true.
Actually this function is not even used anywhere. Let me just remove the function altogether.
The output of this function was used; there was if-else code below this line (before this MR removed it):

`is_multi_layer_puzzle = is_replacement_gathered_metrics(gathered_metrics)`

I just wonder why is_multi_layer_puzzle was computed with this function (is_replacement_gathered_metrics) and not with args.is_multi_layer_puzzle. I am approving this MR as I think the risk is low if we remove it.
We can look into it if we see a need to copy over this feature.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Force-pushed 8a8cdb3 to d456ff6
What does this PR do?
Type of change: Improvement
- Removed the `is_multi_layer_puzzle: False` case
- Removed the `use_greedy_search: False` case
- Removed the `num_solutions` and `minimal_diversity` flags

Testing
Summary by CodeRabbit
Release Notes
Refactor
Chores
Documentation