Conversation
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
…ntizer, NVFP4MSECalibrator (#849)

**Type of change:** ?

**Overview:** ?

```python
```

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No

Release notes by coderabbit.ai:

* **New Features**
  * Added NVFP4StaticQuantizer for improved 4-bit quantization with enhanced precision control
  * Introduced NVFP4MSECalibrator with flexible candidate generation for calibration optimization
* **Improvements**
  * Optimized GPU kernels for Hopper+ graphics cards with better performance
  * Extended Triton support to broader GPU compatibility
  * Enhanced backward compatibility for restoring previously quantized models
* **Tests**
  * Added comprehensive test coverage for new quantizers and calibration methods

Signed-off-by: realAsma <akuriparambi@nvidia.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
…FP4QTensor Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
📝 Walkthrough

This PR refactors the GPTQ quantization algorithm by renaming the prior `gptq_lite` mode to `gptq`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Model as Model
    participant SeqCal as Sequential Calibrate
    participant Helper as GPTQHelper
    participant HessianPatch as Forward Patch
    participant WeightUpdate as Weight Updater
    SeqCal->>SeqCal: For each layer
    SeqCal->>Helper: Create GPTQHelper instance
    Helper->>HessianPatch: Patch layer forward
    SeqCal->>Model: Forward pass (collect Hessians)
    Model->>HessianPatch: Forward called
    HessianPatch->>Helper: Store activation tensors
    Helper->>Helper: Accumulate Hessian
    SeqCal->>Helper: Prepare inverse Hessian
    Helper->>Helper: Handle dead neurons + damping
    Helper->>WeightUpdate: For each weight block
    WeightUpdate->>WeightUpdate: Compute GPTQ delta
    WeightUpdate->>WeightUpdate: Quantize & propagate error
    WeightUpdate->>Model: Update weight
    Helper->>HessianPatch: Unpatch forward
    SeqCal->>SeqCal: Next layer
```
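The "Quantize & propagate error" step in the diagram is the core of the GPTQ weight update. The toy sketch below illustrates that inner loop with hypothetical names (`h_inv_cho` for the upper-triangular Cholesky factor of the inverse Hessian, `quantize` for any elementwise fake-quant); it is not the ModelOpt implementation.

```python
import torch

def gptq_update_columns(w: torch.Tensor, h_inv_cho: torch.Tensor, quantize) -> torch.Tensor:
    """Quantize columns left-to-right, pushing each column's quantization
    error onto the not-yet-quantized columns via the inverse-Hessian factor."""
    w = w.clone()
    for i in range(w.shape[1]):
        d = h_inv_cho[i, i]
        q = quantize(w[:, i])          # quantize the current column
        err = (w[:, i] - q) / d        # normalized quantization error
        # compensate the remaining columns so the total output error shrinks
        w[:, i + 1:] -= err.unsqueeze(1) * h_inv_cho[i, i + 1:].unsqueeze(0)
        w[:, i] = q
    return w
```

With an identity Hessian the compensation term vanishes and the loop reduces to plain round-to-nearest, which makes a convenient sanity check.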
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
Important: Pre-merge checks failed. Please resolve all errors before merging; addressing warnings is optional.

❌ Failed checks (1 error, 1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 3
🧹 Nitpick comments (2)
examples/gpt-oss/configs/sft_full.yaml (1)

Line 25: Unrelated change: disabling external reporting.

The change from `report_to: [trackio]` to `report_to: [none]` disables external tracking/reporting. This appears unrelated to the main PR objective of fixing the deprecated `torch_dtype` parameter. If this change was intentional, consider documenting the reason in the PR description or commit message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/gpt-oss/configs/sft_full.yaml` at line 25, the PR unintentionally changed the logging/telemetry setting by replacing `report_to: [trackio]` with `report_to: [none]`; revert this unrelated change (restore `report_to: [trackio]`) unless telemetry was intentionally disabled. If intentional, add a short note to the PR description or commit message explaining why external reporting was turned off and reference the `report_to` setting so reviewers know it was deliberate.

modelopt/torch/quantization/model_calib.py (1)

Lines 1795-1809: Consider memory optimization for large weight matrices.

Line 1795 clones the entire weight for each block iteration (`wblk = self.weight.clone()`), and line 1801 quantizes the full matrix for each column. While this matches standard GPTQ, it can be memory-intensive for large models. The current implementation is functionally correct. If memory becomes a concern for very large layers, consider lazy quantization or partial matrix operations in future iterations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/model_calib.py` around lines 1795-1809, the code clones and quantizes the entire `self.weight` inside the block loop (`wblk = self.weight.clone(); qdq = quantizer(wblk)`), which is memory-heavy; instead, operate on slices: clone only the block slice (e.g., `wblk = self.weight[:, block_start:block_end].clone()`), call `quantizer` on that slice, compute `qdq` and `err` for the local columns using `h_inv_cho_blk` and `errs`, and update `self.weight` only for the affected columns and the trailing `addmm_` using `h_inv[block_start:block_end, block_end:]` — keep the same variable names (`weight`, `wblk`, `quantizer`, `h_inv_cho_blk`, `errs`, `block_start`, `block_end`, `n_cols_blk`) so the logic is identical but memory use is reduced.
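A minimal sketch of the slice-based alternative the comment suggests, simplified to elementwise fake-quant only (no Hessian error propagation); `update_blocks_slicewise` and `quantizer` are illustrative names, not the ModelOpt API:

```python
import torch

def update_blocks_slicewise(weight: torch.Tensor, quantizer, block_size: int) -> torch.Tensor:
    """Fake-quantize `weight` one column block at a time, cloning only the
    current (rows, block) slice instead of the full matrix each iteration."""
    n_cols = weight.shape[1]
    for block_start in range(0, n_cols, block_size):
        block_end = min(block_start + block_size, n_cols)
        wblk = weight[:, block_start:block_end].clone()  # slice clone, not full clone
        weight[:, block_start:block_end] = quantizer(wblk)
    return weight
```

For an elementwise quantizer this matches quantizing the full matrix at once, while peak extra memory drops from O(rows × cols) to O(rows × block_size).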
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e1724464-f96a-4dbd-b311-4f2cec4b6a06
📒 Files selected for processing (8)
- examples/gpt-oss/configs/sft_full.yaml
- examples/gpt-oss/sft.py
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/mode.py
- modelopt/torch/quantization/model_calib.py
- modelopt/torch/quantization/model_quant.py
- modelopt/torch/quantization/utils/core_utils.py
- tests/gpu/torch/quantization/test_gptq.py
💤 Files with no reviewable changes (1)
- modelopt/torch/quantization/model_quant.py
```diff
     "trust_remote_code": model_args.trust_remote_code,
     "attn_implementation": model_args.attn_implementation,
-    "torch_dtype": getattr(model_args, "dtype", "float32"),
+    "torch_dtype": getattr(model_args, "dtype", "bfloat16"),
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Description: Find all YAML config files that still use the deprecated torch_dtype key
# Search for torch_dtype in YAML config files
rg -n 'torch_dtype:' --glob 'examples/gpt-oss/configs/*.yaml'
```

Repository: NVIDIA/Model-Optimizer
Length of output: 128

🏁 Script executed:

```shell
# Check the content of sft_lora.yaml and sft_full.yaml to see the full picture
find examples/gpt-oss/configs -type f -name "*.yaml" | head -20
```

Repository: NVIDIA/Model-Optimizer
Length of output: 179

🏁 Script executed:

```shell
# Get the first 10 lines of the YAML config files to see structure
for file in $(find examples/gpt-oss/configs -type f -name "*.yaml"); do
  echo "=== $file ==="
  head -10 "$file"
done
```

Repository: NVIDIA/Model-Optimizer
Length of output: 884

🏁 Script executed:

```shell
# Search for all occurrences of "dtype" (both torch_dtype and dtype) in YAML files
rg -n '(torch_)?dtype:' --glob 'examples/gpt-oss/configs/*.yaml'
```

Repository: NVIDIA/Model-Optimizer
Length of output: 185
Update examples/gpt-oss/configs/sft_lora.yaml to use the correct config key.

The Python code at line 75 reads `model_args.dtype` (with a fallback to `"bfloat16"`). However, `examples/gpt-oss/configs/sft_lora.yaml` provides `torch_dtype: bfloat16` instead of `dtype: bfloat16`. This means the config value is silently ignored and the hardcoded default is used instead.

Change line 4 in sft_lora.yaml from `torch_dtype: bfloat16` to `dtype: bfloat16` to match sft_full.yaml and ensure the configuration is properly applied.
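The mismatch is easy to reproduce with a stand-in for the parsed arguments; `SimpleNamespace` here merely illustrates how `getattr` behaves (the real objects are argument dataclasses, and the attribute values below are illustrative):

```python
from types import SimpleNamespace

# A config that only sets the deprecated key: getattr(model_args, "dtype", ...)
# never sees it, so the hardcoded default wins.
args_lora = SimpleNamespace(torch_dtype="float16")   # wrong key name, value ignored
args_full = SimpleNamespace(dtype="bfloat16")        # key the code actually reads

print(getattr(args_lora, "dtype", "bfloat16"))  # falls back to the default
print(getattr(args_full, "dtype", "float32"))   # picks up the config value
```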
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/gpt-oss/sft.py` at line 75, The config key mismatch causes
model_args.dtype to remain unset because examples/gpt-oss/configs/sft_lora.yaml
uses torch_dtype instead of dtype; update that YAML to use the key name expected
by the code. Edit examples/gpt-oss/configs/sft_lora.yaml and change the entry
torch_dtype: bfloat16 to dtype: bfloat16 (matching sft_full.yaml) so the value
is picked up by the code path that reads model_args.dtype in sft.py.
```python
    Note: input must be non-empty (batch_size > 0); a zero-sized input causes division by zero.
    """
    batch_size = input.shape[0]
    # Flatten to 2D (total_tokens, features) first, so batch_size counts tokens
    input_flat = input.reshape(-1, input.shape[-1]).t().float()
    batch_size = input_flat.shape[1]

    # Incremental averaging: scale down old hessian
    hessian *= n_samples / (n_samples + batch_size)
    n_samples += batch_size

    # Compute outer product: H += (2/n_samples) * X @ X^T
    # where X is the flattened input reshaped to (features, batch*seq)
    input_flat = input.reshape(-1, input.shape[-1]).t().float()
    scaled_input = math.sqrt(2 / n_samples) * input_flat
    hessian.add_((scaled_input @ scaled_input.t()).to(hessian.device))

    return hessian, n_samples
```
Handle zero-sized inputs to prevent division by zero.

The comment at line 1533 documents the precondition but doesn't enforce it. If `input` has zero elements along the batch/sequence dimensions, `batch_size` will be 0, causing division by zero at line 1540.

Consider adding an early return:

```diff
+    if input.numel() == 0:
+        return hessian, n_samples
     # Flatten to 2D (total_tokens, features) first, so batch_size counts tokens
     input_flat = input.reshape(-1, input.shape[-1]).t().float()
     batch_size = input_flat.shape[1]
```

This is especially relevant for MoE models where some experts may receive no tokens during calibration.
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
     Note: input must be non-empty (batch_size > 0); a zero-sized input causes division by zero.
     """
-    batch_size = input.shape[0]
+    if input.numel() == 0:
+        return hessian, n_samples
     # Flatten to 2D (total_tokens, features) first, so batch_size counts tokens
     input_flat = input.reshape(-1, input.shape[-1]).t().float()
     batch_size = input_flat.shape[1]
     # Incremental averaging: scale down old hessian
     hessian *= n_samples / (n_samples + batch_size)
     n_samples += batch_size
     # Compute outer product: H += (2/n_samples) * X @ X^T
-    # where X is the flattened input reshaped to (features, batch*seq)
-    input_flat = input.reshape(-1, input.shape[-1]).t().float()
     scaled_input = math.sqrt(2 / n_samples) * input_flat
     hessian.add_((scaled_input @ scaled_input.t()).to(hessian.device))
     return hessian, n_samples
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/model_calib.py` around lines 1532 - 1547, The
code currently assumes non-empty input and can divide by zero when
batch_size==0; add an early guard in the function that computes/updates the
hessian (the block using input_flat, batch_size, hessian, n_samples) that, if
batch_size is 0 (i.e., input.reshape yields zero tokens), immediately returns
the unchanged hessian and n_samples; place this check after computing input_flat
and batch_size and before adjusting hessian/n_samples and the scaled_input
calculations to avoid the division by zero and unnecessary work.
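The incremental-averaging scheme, with the guard applied first, can be exercised end to end in isolation. The sketch below uses illustrative names and is not the ModelOpt function itself; the invariant is that after any number of batches, `hessian == (2 / n_samples) * X @ X.T` over all tokens seen so far.

```python
import math
import torch

def accumulate_hessian(hessian, n_samples, inp):
    """Running estimate of H = (2/N) * X @ X^T, updated one batch at a time.
    `inp` has shape (..., features); tokens are all leading dims flattened."""
    if inp.numel() == 0:            # guard: e.g. an MoE expert that got no tokens
        return hessian, n_samples
    x = inp.reshape(-1, inp.shape[-1]).t().float()   # (features, tokens)
    batch_size = x.shape[1]
    hessian *= n_samples / (n_samples + batch_size)  # rescale the old average
    n_samples += batch_size
    scaled = math.sqrt(2 / n_samples) * x
    hessian.add_(scaled @ scaled.t())
    return hessian, n_samples
```

Because each update rescales the old estimate by n/(n+b) before adding the new outer product, the result matches a single pass over the concatenated activations.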
```diff
+gptq_handles = {name: GPTQHelper(m, name, offload_to_cpu=True) for name, m in quantized_layers}
+for handle in gptq_handles.values():
+    handle.setup()
-            calib_func(layer, _layer_forward_loop, **calib_kwargs)
+print_rank_0(f"Computing Hessians for {len(gptq_handles)} linear layers...")
-        del layer_inputs
-        torch.cuda.empty_cache()
-    finally:
-        input_getter._unpatch_all_layers()
+with disabled_weight_quantizers(model):
+    forward_loop(model)
+
+for handle in gptq_handles.values():
+    handle.cleanup()
+
+print_rank_0("Updating weights using GPTQ algorithm...")
+for handle in gptq_handles.values():
+    handle.update_weights(block_size, percdamp)
+    handle.free()
```
Add exception safety for forward method cleanup.
If an exception occurs during Hessian collection (lines 1841-1842) or weight updates (lines 1848-1850), the patched forward methods won't be cleaned up, leaving the model in an inconsistent state.
🛡️ Suggested fix with try-finally

```diff
 gptq_handles = {name: GPTQHelper(m, name, offload_to_cpu=True) for name, m in quantized_layers}
 for handle in gptq_handles.values():
     handle.setup()

 print_rank_0(f"Computing Hessians for {len(gptq_handles)} linear layers...")
-with disabled_weight_quantizers(model):
-    forward_loop(model)
-
-for handle in gptq_handles.values():
-    handle.cleanup()
+try:
+    with disabled_weight_quantizers(model):
+        forward_loop(model)
+finally:
+    for handle in gptq_handles.values():
+        handle.cleanup()

 print_rank_0("Updating weights using GPTQ algorithm...")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/model_calib.py` around lines 1835 - 1850, Wrap
the Hessian computation and weight-update phases in try/finally blocks to ensure
patched forward methods are always cleaned up and resources freed: after
creating gptq_handles and calling handle.setup(), run forward_loop(model) inside
a try block and call each handle.cleanup() in the finally; similarly, when
running the update_weights loop, ensure any exception still triggers
handle.free() (and optionally cleanup) in a finally. Reference GPTQHelper.setup,
GPTQHelper.cleanup, GPTQHelper.update_weights, GPTQHelper.free, gptq_handles,
disabled_weight_quantizers, and forward_loop to locate and modify the code.
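The cleanup discipline the reviewer asks for is the standard patch/restore pattern. Below is a minimal stand-alone sketch; `ForwardPatcher` and `calibrate` are hypothetical stand-ins for GPTQHelper's setup/cleanup lifecycle, not the ModelOpt API:

```python
class ForwardPatcher:
    """Swap a module's forward in setup(); cleanup() must always restore it."""
    def __init__(self, module):
        self.module = module
        self._orig_forward = None

    def setup(self):
        self._orig_forward = self.module.forward
        orig = self._orig_forward
        def patched(*args, **kwargs):
            # ...collect activation statistics here...
            return orig(*args, **kwargs)
        self.module.forward = patched

    def cleanup(self):
        if self._orig_forward is not None:
            self.module.forward = self._orig_forward
            self._orig_forward = None

def calibrate(handles, forward_loop, model):
    for h in handles:
        h.setup()
    try:
        forward_loop(model)        # may raise mid-calibration
    finally:
        for h in handles:
            h.cleanup()            # forwards restored on every exit path
```

Even if the forward loop raises, the `finally` block guarantees every patched forward is restored, so the model is never left in an inconsistent state.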
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1130      +/-   ##
==========================================
- Coverage   70.18%   70.15%   -0.04%
==========================================
  Files         228      230       +2
  Lines       25952    26053     +101
==========================================
+ Hits        18215    18278      +63
- Misses       7737     7775      +38
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Bug fix
`torch_dtype` is deprecated and must be changed in the config yaml file to be detected. In addition, the default value is set to `bfloat16` instead of `float32`, as TRTLLM does not support float32 deployment.

Usage

```python
# Add a code snippet demonstrating how to use this
```

Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: ✅ / ❌ / N/A

Additional Information
Summary by CodeRabbit
Improvements
Configuration

- Renamed quantization mode from `gptq_lite` to `gptq` with simplified parameter requirements.

Tests