[Minor] fix: do not requantize the scales in FP8 scale sweep calibration#825

Closed
Fridah-nv wants to merge 2 commits into main from fridah/fix-fp8-sweep

Conversation

Fridah-nv (Contributor) commented Jan 28, 2026

What does this PR do?

Type of change: ?

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features

    • Added conditional pre-quantization optimization for static block quantization when scale sweep is enabled.
    • Enhanced calibration flow with configurable scale quantization behavior based on optimization settings.
  • Improvements

    • Updated quantization calibration documentation to clarify scale behavior and optimization interactions.


@Fridah-nv Fridah-nv requested a review from realAsma January 28, 2026 22:12
@Fridah-nv Fridah-nv self-assigned this Jan 28, 2026
@Fridah-nv Fridah-nv requested a review from a team as a code owner January 28, 2026 22:12
coderabbitai bot commented Jan 28, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

These changes implement an optimized FP8 quantization path for NVFP4 static per-block quantization. When fp8_scale_sweep is enabled, the quantizer skips dynamic FP8 scale quantization by setting a flag that propagates through the calibrator and into the fake quantization path, using pre-computed scales instead.

Changes

  • FP8 scale sweep configuration (modelopt/torch/quantization/model_calib.py):
    Added conditional logic to enable pre-quantization optimization for NVFP4 static per-block quantization when fp8_scale_sweep is enabled; sets the skip_fp8_scale_quant flag on the quantizer before MSE calibrator construction. Updated the docstring to clarify the FP8 scale value count (128 → 126 valid values).
  • Fake quantization flag handling (modelopt/torch/quantization/nn/modules/tensor_quantizer.py):
    Introduced a skip_scale_quant local variable derived from block_sizes["skip_fp8_scale_quant"] in the 2-bit static block quantization path; passes this flag to static_blockwise_fp4_fake_quant instead of always using False.
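
The flag propagation described above can be sketched as follows. This is an illustrative reconstruction, not the actual modelopt API: the names SimpleQuantizer and the stand-in static_blockwise_fp4_fake_quant below are hypothetical, and integer rounding is used only as a placeholder for real FP8 scale quantization.

```python
def static_blockwise_fp4_fake_quant(x, scales, skip_scale_quant=False):
    """Stand-in for the fake-quant kernel: optionally requantizes the scales.

    Before the fix, callers always passed skip_scale_quant=False, so scales
    produced by the FP8 scale sweep were quantized a second time.
    """
    if not skip_scale_quant:
        # Placeholder for FP8 scale quantization (real code casts to E4M3).
        scales = [float(round(s)) for s in scales]
    return [xi * s for xi, s in zip(x, scales)]


class SimpleQuantizer:
    """Hypothetical quantizer showing how the flag rides on block_sizes."""

    def __init__(self, fp8_scale_sweep=False):
        self.block_sizes = {"type": "static", "scale_bits": (4, 3)}
        if fp8_scale_sweep:
            # The fix: sweep-calibrated scales are already valid FP8 values,
            # so mark them to be used as-is instead of requantized.
            self.block_sizes["skip_fp8_scale_quant"] = True

    def forward(self, x, scales):
        skip = self.block_sizes.get("skip_fp8_scale_quant", False)
        return static_blockwise_fp4_fake_quant(x, scales, skip_scale_quant=skip)
```

With the sweep enabled, a calibrated scale such as 1.5 passes through unchanged; without it, the placeholder requantization rounds it to 2.0, which is the double-quantization error this PR removes.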

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate, so the docstring coverage check was skipped.
  • Title Check: ✅ Passed. The title accurately describes the main change: preventing scale requantization in FP8 scale sweep calibration, which is the core objective evident in both modified files.




codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.44%. Comparing base (2a46753) to head (aee1bd7).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #825   +/-   ##
=======================================
  Coverage   73.44%   73.44%           
=======================================
  Files         194      194           
  Lines       20034    20034           
=======================================
  Hits        14714    14714           
  Misses       5320     5320           

☔ View full report in Codecov by Sentry.

realAsma (Contributor) commented:
@Fridah-nv can you please add [Minor] tag in the PR title?

@Fridah-nv Fridah-nv changed the title from "fix: do not requantize the scales in FP8 scale sweep calibration" to "[Minor] fix: do not requantize the scales in FP8 scale sweep calibration" Jan 28, 2026
@Fridah-nv Fridah-nv force-pushed the fridah/fix-fp8-sweep branch from bdad690 to 1caa24f on January 29, 2026 20:17
copy-pr-bot bot commented Jan 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv Fridah-nv force-pushed the fridah/fix-fp8-sweep branch from 1caa24f to aee1bd7 on February 3, 2026 00:44
@Fridah-nv Fridah-nv enabled auto-merge (squash) February 3, 2026 02:01
Fridah-nv (Contributor, Author) commented:
Fix applied in #849

@Fridah-nv Fridah-nv closed this Feb 7, 2026
auto-merge was automatically disabled February 7, 2026 01:03

Pull request was closed

