
Add subblock stats to the compress algorithm#623

Merged
danielkorzekwa merged 60 commits into feature/compress from dkorzekwa/subblock_stats
Dec 1, 2025

Conversation

@danielkorzekwa
Contributor

What does this PR do?

Add subblock stats to the compress algorithm.

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@danielkorzekwa danielkorzekwa requested a review from a team as a code owner December 1, 2025 15:52
@codecov

codecov bot commented Dec 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.37%. Comparing base (97fe7f0) to head (6cb644a).
⚠️ Report is 1 commit behind head on feature/compress.

Additional details and impacted files
@@                Coverage Diff                @@
##           feature/compress     #623   +/-   ##
=================================================
  Coverage             74.37%   74.37%           
=================================================
  Files                   182      182           
  Lines                 18219    18219           
=================================================
  Hits                  13550    13550           
  Misses                 4669     4669           


raise_unknown_subblock_config_error(subblock_config)


def calculate_subblock_params(
Collaborator


I think the param count is a bit hacky and could be simplified by running one forward pass on a sample input and counting the params. We already have a simple utility for this in modelopt (param_num_from_forward): https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/modelopt/torch/utils/network.py#L129, which is generic and works for any HF model (MoE or dense).

We can run both functions and compare the numbers as well

Contributor Author


Good candidate for a shared component. Added as a high-priority internal NVIDIA issue: issues/74.

)


def calculate_subblock_memory(
Collaborator

@kevalmorabia97 kevalmorabia97 Dec 1, 2025


Quick question: is memory = (active params * param dtype memory) + (kv cache params * kv cache dtype memory)? Or is it a bit more complicated than that? I see a lot of logic for different types of layers, but there isn't any docstring, so I'm not sure what the main reason is for the custom per-layer logic.

Contributor Author


Sepehr raised similar questions and concerns. Added to: issues/74
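
For reference, the simplified formula from the question above can be written out as a minimal sketch. This is not the PR's `calculate_subblock_memory` implementation, which contains per-layer logic that this thread does not document. The function names, the `DTYPE_BYTES` table, and the example shapes are hypothetical, and the KV cache count assumes a standard attention layer that stores one key and one value vector per token per KV head.

```python
# Bytes per element for a few common dtypes (illustrative, not exhaustive).
DTYPE_BYTES = {"float32": 4, "bfloat16": 2, "float16": 2, "fp8": 1}


def kv_cache_elements(batch_size: int, seq_len: int, num_kv_heads: int, head_dim: int) -> int:
    """Number of cached elements for one attention layer (assumed layout).

    The factor of 2 accounts for storing both keys and values.
    """
    return 2 * batch_size * seq_len * num_kv_heads * head_dim


def estimate_memory_bytes(
    num_params: int, param_dtype: str, kv_elements: int, kv_dtype: str
) -> int:
    """Simplified estimate: params * param dtype size + KV elements * KV dtype size."""
    return num_params * DTYPE_BYTES[param_dtype] + kv_elements * DTYPE_BYTES[kv_dtype]


# Hypothetical subblock: 1M parameters, 8 KV heads of dimension 128,
# batch size 1, sequence length 1024, everything stored in bfloat16.
kv_elems = kv_cache_elements(batch_size=1, seq_len=1024, num_kv_heads=8, head_dim=128)
total_bytes = estimate_memory_bytes(1_000_000, "bfloat16", kv_elems, "bfloat16")
print(kv_elems, total_bytes)  # 2097152 6194304
```

Layers without a KV cache (for example, an FFN-only subblock) would contribute only the first term under this estimate, which may be one reason an implementation ends up with per-layer branches.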

Comment on lines +94 to +96
from puzzle_tools.subblock_stats.runtime_stats.calc_runtime_stats import (
    calc_runtime_ms_for_subblocks,
)
Collaborator


Is this for TRT-LLM stats?

Contributor Author


Yes, this is used only if runtime_stats is enabled via a config param.
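
An import that is needed only when a feature is enabled is often written as a deferred import guarded by the flag, so the optional dependency is never loaded otherwise. The following is a minimal sketch of that pattern, assuming a boolean flag. `load_optional_callable` and its arguments are hypothetical names, not part of the PR, and the PR's actual gating code is not shown in this conversation.

```python
import importlib
from typing import Callable, Optional


def load_optional_callable(
    enabled: bool, module_name: str, attr_name: str
) -> Optional[Callable]:
    """Import `module_name` and return `attr_name` from it only when `enabled` is True.

    When disabled, nothing is imported, so a missing optional dependency
    does not raise ImportError.
    """
    if not enabled:
        return None
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in usage with a standard-library module so the sketch runs anywhere.
# In the PR's setting, the target would be the calc_runtime_stats module and
# its calc_runtime_ms_for_subblocks function.
sqrt_fn = load_optional_callable(True, "math", "sqrt")
disabled = load_optional_callable(False, "module_that_is_not_installed", "anything")
```

With the flag off, the second call returns `None` without attempting the import, so callers need to check for `None` before using the result.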

Comment on lines +199 to +203
# TODO: fix
# from puzzle_tools.calc_subblock_runtime import measure_non_block_runtime_ms
# non_block_runtime_ms, embedding_runtime_ms, lm_head_runtime_ms = \
# measure_non_block_runtime_ms(batch_size, prefill_seq_len, generation_seq_len, n_embd, vocab_size,
# benchmark_iterations, use_cuda_graph)
Collaborator


Will this be added in a follow-up PR?

Contributor Author


Once scoring/mip are in, we can prioritize what to do next. I added an internal issue for subblock runtime stats: issues/75

Comment on lines +271 to +277
# ==== START === Setup for attach-helper ====
# import sys
# import os
# sys.path.insert(0, os.environ["ATTACH_HELPER_INSTALLATION_PATH"])
# from attach_helper import debugging_setup
# debugging_setup() # You can optionally pass a name to identify the job (e.g. `debugging_setup(name="my_script")`)
# ==== END === Setup for attach-helper ====
Collaborator


What is this for?

Contributor Author


Likely for debugging; I removed it.

@danielkorzekwa danielkorzekwa merged commit 954103e into feature/compress Dec 1, 2025
21 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/subblock_stats branch December 1, 2025 20:59
3 participants