Add Agent PTQ skill for model quantization#1107

Open
mxinO wants to merge 17 commits into main from mxin/agent-ptq

Conversation

@mxinO
Contributor

@mxinO mxinO commented Mar 24, 2026

What does this PR do?

Type of change: New feature

Add a PTQ skill that supports
Non-launcher path:

  1. Supported and unsupported transformers models. If the model is supported, the agent runs the PTQ script directly; if it is unsupported (source code needed), it patches the relevant modules and writes a custom script.
  2. Remote execution. Configure your cluster in clusters.yaml (or give the info to the agent directly), then say "quantize Qwen3-8B on xxx". The agent uses SSH ControlMaster to keep a persistent SSH session, so the remote server is not flooded with connections and connection overhead is reduced significantly.
  3. Slurm support. The agent automatically detects whether the current machine (or a remote machine) is a Slurm login node and, if so, writes a Slurm script for the PTQ job.
  4. Dequantization of FP8 checkpoints to BF16 when an FP8 model (e.g., DeepSeek) is given.
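The persistent-session behavior in item 2 relies on standard OpenSSH multiplexing. A minimal sketch of the equivalent `~/.ssh/config` entry (host alias, hostname, and user are hypothetical, not taken from this PR):

```
Host mycluster
    HostName login.cluster.example.com
    User myuser
    # Reuse a single TCP connection for all sessions to this host
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    # Keep the master connection alive for 10 minutes after the last session
    ControlPersist 10m
```

With this in place, repeated `ssh mycluster '<cmd>'` calls reuse the master connection instead of paying the handshake cost each time.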

Launcher path:
We currently use the launcher only for the (supported model + non-bare-metal) case.

What is not supported yet:

  1. Models not available in transformers

Usage

  1. Quantize Qwen3-0.6B to NVFP4
  2. Quantize Qwen3-0.6B using cluster xxx

Testing

| Test | Path | Result |
|---|---|---|
| Qwen3-0.6B (listed) | 4B — Launcher + SLURM | Checkpoint on remote cluster |
| SmolLM-135M (unlisted) | 4C → hf_ptq.py | Works; 112M checkpoint |

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ❌
    Will add tests later.
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • Documentation

    • Added comprehensive guides for remote GPU cluster execution via SLURM or Docker environments, including configuration, SSH setup, and troubleshooting.
    • Added post-training quantization (PTQ) workflow documentation with model support matrix, format selection, and execution strategies.
    • Added workspace management guide for organizing model artifacts and outputs across quantization tasks.
    • Added SLURM job submission patterns and container runtime guidance for remote PTQ execution.
  • Chores

    • Added example configuration file for remote cluster settings.
    • Added utility script for remote execution operations.
    • Added test specifications for PTQ skill validation.

@copy-pr-bot

copy-pr-bot bot commented Mar 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@github-actions
Contributor

github-actions bot commented Mar 24, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1107/

Built to branch gh-pages at 2026-04-01 08:00 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@coderabbitai
Contributor

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

The PR introduces comprehensive remote execution infrastructure and post-training quantization (PTQ) workflows for ModelOpt. It adds a cluster configuration schema, shared environment detection and remote execution utilities written in Bash, detailed documentation for setting up local and remote environments, workspace management patterns, SLURM job handling, and a complete PTQ skill definition with test specifications and reference guides for launcher usage and unsupported model handling.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Configuration & Cluster Setup | `.claude/clusters.yaml.example` | Introduces cluster configuration schema defining connection details (login node, user, SSH key), workspace paths, GPU type hints, and optional SLURM defaults with a `default_cluster` directive. |
| Remote Execution Infrastructure | `.claude/skills/common/remote-execution.md`, `.claude/skills/common/remote_exec.sh` | Adds remote execution framework including documentation of SSH/SLURM/Docker workflows, persistent session management, base64-encoded command execution with retry logic, rsync-based file sync, and SLURM job submission/monitoring utilities. |
| Common Shared Documentation | `.claude/skills/common/environment-setup.md`, `.claude/skills/common/slurm-setup.md`, `.claude/skills/common/workspace-management.md` | Provides standardized guides for environment detection (local vs. remote; SLURM vs. Docker vs. bare metal), generic SLURM job patterns, and workspace directory organization by model name with multi-user support. |
| PTQ Skill Definition | `.claude/skills/ptq/SKILL.md`, `.claude/skills/ptq/tests.json` | Defines end-to-end PTQ workflow including environment setup, model verification, format selection, three execution paths (manual, launcher-based, unlisted-model handling), and assessment cases with behavioral expectations. |
| PTQ Reference Guides | `.claude/skills/ptq/references/launcher-guide.md`, `.claude/skills/ptq/references/unsupported-models.md`, `.claude/skills/ptq/references/slurm-setup-ptq.md` | Provides launcher integration instructions, comprehensive patterns for quantizing unsupported models (including checkpoint inspection, FP8 detection, custom module patching), and PTQ-specific SLURM container/GPU sizing guidance. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.82%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'Add Agent PTQ skill for model quantization' clearly and concisely summarizes the main change: introducing a new Agent PTQ skill. It is directly related to the substantial changeset of new documentation, configuration examples, and test specifications for post-training quantization. |
| Security Anti-Patterns | ✅ Passed | PR adds only non-Python files to the .claude directory (YAML, Markdown, JSON, Bash scripts). No Python code modifications in the modelopt package or examples. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |



@codecov

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.18%. Comparing base (b1f9f01) to head (d8f1764).
⚠️ Report is 14 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1107      +/-   ##
==========================================
+ Coverage   70.16%   70.18%   +0.02%     
==========================================
  Files         229      230       +1     
  Lines       26008    26080      +72     
==========================================
+ Hits        18248    18304      +56     
- Misses       7760     7776      +16     

☔ View full report in Codecov by Sentry.

Contributor

Copilot AI left a comment


Pull request overview

Adds a new Claude skill (ptq) to guide/execute NVIDIA ModelOpt post-training quantization workflows, including remote execution and SLURM-oriented operational guidance.

Changes:

  • Introduces the ptq skill playbook and reference guides for unsupported models, SLURM setup, and remote execution.
  • Adds a reusable remote_exec.sh library for SSH ControlMaster sessions, remote command execution, rsync syncing, and SLURM helpers.
  • Documents multi-user workspace management and provides a sample clusters.yaml configuration.
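The base64 trick described for `remote_exec.sh` avoids quoting headaches when shipping a command over SSH: encode locally, then decode and execute on the far side. A minimal local sketch of the decode-and-run half (the `ssh` hop is omitted; the command itself is illustrative):

```shell
# Encode an arbitrary command so quotes, dollar signs, and newlines survive transport
cmd='echo "result: $((6 * 7))"'
encoded=$(printf '%s' "$cmd" | base64)

# On the remote end, this is effectively what runs after the ssh hop:
echo "$encoded" | base64 -d | bash
# prints: result: 42
```

The encoded string is a single shell-safe token, so it can be interpolated into an `ssh` invocation without a second layer of escaping.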

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file
| File | Description |
|---|---|
| `.claude/skills/ptq/SKILL.md` | PTQ decision flow and key rules for supported vs. unsupported models, plus environment selection (local/remote/SLURM). |
| `.claude/skills/ptq/references/unsupported-models.md` | Patterns and investigation steps for quantizing unsupported architectures. |
| `.claude/skills/ptq/references/slurm-setup.md` | SLURM container/job-script templates and operational guidance (smoke tests, monitoring). |
| `.claude/skills/ptq/references/remote-execution.md` | Remote execution workflow and cluster configuration guidance. |
| `.claude/skills/common/remote_exec.sh` | Bash utility library for persistent SSH, remote run/sync, environment detection, and SLURM helpers. |
| `.claude/skills/common/workspace-management.md` | Workspace creation/reuse guidance for multi-user (Slack bot) environments. |
| `.claude/clusters.yaml.example` | Example cluster configuration template. |


mxinO added 5 commits March 25, 2026 06:11
Add a Claude Code skill that guides the agent through post-training
quantization (PTQ) of LLMs/VLMs using ModelOpt. Includes:

- PTQ skill with decision tree for supported/unsupported models
- Launcher integration for SLURM/Docker execution
- Common environment setup (shared across future skills)
- Remote execution and SLURM setup references (manual fallback)
- Unsupported model monkey-patching guide
- Workspace management for multi-user environments

Signed-off-by: Meng Xin <mxin@nvidia.com>
Workspace-by-model-name is useful for single users too, not just
multi-user Slack bot. Rewrite to cover both: ./workspaces/ for
single-user, $MODELOPT_WORKSPACE_ROOT for multi-user.

Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
mxinO added 4 commits March 25, 2026 08:55
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Contributor

@Edwardf0t1 Edwardf0t1 left a comment


Thanks @mxinO for refactoring the ptq skills based on our discussions, and I think the PR is in good shape.

  • Let's consider adding a tests.json with some basic cases (e.g., supported model on SLURM, unsupported VLM, format selection on Hopper vs Blackwell, smoke test failure handling)
  • We can also add negative triggers to the frontmatter description, especially when deployment and evaluation skills are added. cc @kaix-nv

Contributor


I think we can consider adding a references/supported-models.md as well. Currently it's inline in SKILL.md, and the hf_ptq.py workflow and flag reference are just a few lines in SKILL.md rather than a reference. Acceptable for now since it's short, but if it grows, we can extract it.


# ModelOpt Post-Training Quantization

Produce a quantized checkpoint from a pretrained model. **Read `examples/llm_ptq/README.md` first** — it has the support matrix, CLI flags, and accuracy guidance.
Contributor


I think this works but means the agent loads the entire README rather than a focused GPU → format matrix. Consider extracting later, e.g., adding a references/format-selection.md.

Contributor Author


I think it's better to refer to the README; otherwise, every time new models are added, we would need to update the skill. We can talk about this later, but for now let's keep it referring to the README.

Contributor


SG - it would be better to point to a specific section if we know what we want Claude to read specifically.

mxinO added 5 commits March 26, 2026 03:58
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
@mxinO mxinO marked this pull request as ready for review March 26, 2026 08:56
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (2)
.claude/skills/common/remote-execution.md (1)

112-112: Use sbatch --parsable for deterministic job ID extraction.

The grep-based approach can misparse output if SLURM includes multiple numbers. Use --parsable with cut -d';' -f1 for reliable, machine-readable extraction that handles both single-cluster and federated SLURM environments.

Proposed doc fix
-JOBID=$(remote_run "sbatch /remote/path/scripts/job_slurm.sh" | grep -o '[0-9]\+' | tail -1)
+JOBID=$(remote_run "sbatch --parsable /remote/path/scripts/job_slurm.sh" | cut -d';' -f1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/remote-execution.md at line 112, The current JOBID
extraction using remote_run "sbatch /remote/path/scripts/job_slurm.sh" | grep
... is brittle; change the sbatch invocation to use --parsable and extract the
first field to get a deterministic job id (e.g., run remote_run "sbatch
--parsable /remote/path/scripts/job_slurm.sh" and pipe to cut -d';' -f1) so
JOBID reliably contains only the numeric job id even in federated SLURM setups;
update the assignment of JOBID and any references to JOBID accordingly.
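To illustrate why `--parsable` is the robust choice: its output is either `jobid` or `jobid;cluster`, so a single `cut` handles both. A sketch with simulated sbatch output (no real SLURM needed; the job ID is made up):

```shell
# Simulated `sbatch --parsable` output on a federated cluster
sbatch_output="4242;cluster-a"

# One field split covers both the plain and the federated form
JOBID=$(printf '%s' "$sbatch_output" | cut -d';' -f1)
echo "$JOBID"
# prints: 4242
```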
.claude/skills/common/slurm-setup.md (1)

58-59: Use sbatch --parsable for stable job ID capture.

Parsing the 4th whitespace token is brittle and depends on human-readable output format, which can break with localized SLURM installations or site-specific wrappers. The --parsable flag is the SLURM-recommended approach for scripting, outputting structured format (job ID or job ID;cluster name).

Proposed doc fix
-JOBID=$(sbatch <script>.sh | awk '{print $4}')
+JOBID=$(sbatch --parsable <script>.sh | cut -d';' -f1)
 echo "Submitted job $JOBID"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/slurm-setup.md around lines 58 - 59, The job
submission snippet uses brittle whitespace parsing to extract JOBID; replace the
sbatch invocation with the parsable mode and assign its output directly to JOBID
(e.g., use sbatch --parsable) and, if you need only the numeric ID, split on ';'
or take the part before any cluster suffix; update the lines that set and echo
the JOBID variable (JOBID=$(sbatch ...) and echo "Submitted job $JOBID") to use
this robust approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/common/environment-setup.md:
- Around line 18-31: The doc assumes .claude is in $PWD and mis-specifies SLURM
detection; update the recipe to instruct callers to resolve the
repository/project root (the same ancestor-walk used by your implementation)
before referencing .claude/clusters.yaml and invoking remote_load_cluster and
remote_check_ssh, and clarify that remote_detect_env should detect SLURM by
checking for sbatch (not both srun and sbatch) so PATH/partial-login-node cases
don't pick the wrong branch; also state that remote_load_cluster must be passed
the resolved cluster config path (not a relative .claude path) so subdirectory
invocation behaves identically to the implementation.

In @.claude/skills/common/remote_exec.sh:
- Around line 217-245: The code currently exports REMOTE_WORKSPACE without
validating it, which can result in accidental root (/) or empty paths used by
remote_sync_to/remote_sync_from/remote_ensure_workspace/remote_docker_run; after
parsing REMOTE_WORKSPACE (the REMOTE_WORKSPACE assignment) add a validation: if
REMOTE_WORKSPACE is empty or equals "/" (or otherwise looks invalid) then fail
fast (echo a clear error referencing the cluster name and return 1) so the
config is rejected rather than exported; ensure this validation happens before
the export of REMOTE_WORKSPACE and before printing "Loaded cluster..." so
consumers like remote_sync_to, remote_sync_from, remote_ensure_workspace and
remote_docker_run never receive an invalid workspace.
- Around line 66-82: The _parse_yaml_value function currently injects the
dot.path into an inlined Python string (keys = '$path'), which enables code
injection; change it to accept the file and path via Python argv (use sys.argv
for file and lookup keys) and print the resolved scalar, avoiding any string
interpolation of YAML inputs. Likewise remove uses of eval that build commands
from cluster_name, REMOTE_HOST, REMOTE_SSH_KEY and sync path variables: replace
all eval-based command construction with explicit arrays/argument lists for ssh
and rsync (construct subprocess/exec argument arrays or shell-safe arrays in
bash) and pass variables as quoted argv elements rather than interpolating them
into command strings so no untrusted config can execute locally. Ensure
references to the functions/places using eval (the sections building SSH
connection strings and rsync invocations) are updated to use safe argument
arrays and direct variable passing.
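The fail-fast workspace validation requested above could look like the following; the function name and error wording are illustrative, not taken from `remote_exec.sh`:

```shell
# Reject empty or root workspaces before REMOTE_WORKSPACE is exported
validate_workspace() {
    local ws="$1" cluster="$2"
    if [[ -z "$ws" || "$ws" == "/" ]]; then
        echo "error: cluster '$cluster' has an invalid workspace: '$ws'" >&2
        return 1
    fi
}

validate_workspace "/home/user/ws" "my-cluster" && echo "accepted"
validate_workspace "/" "my-cluster" || echo "rejected"
```

Running this prints "accepted" then "rejected"; rsync against a workspace of `/` would otherwise sync into the remote root.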

In @.claude/skills/common/workspace-management.md:
- Line 45: Replace the awkward phrase "Task needs output from a previous step
(e.g., eval needs the PTQ checkpoint)" with clearer wording: "Task requires
output from a previous step (e.g., eval requires the PTQ checkpoint)"; update
the string in the document where that exact text appears (search for "Task needs
output from a previous step") so both instances in the example use "requires"
for readability and consistency.

In @.claude/skills/ptq/references/launcher-guide.md:
- Around line 46-50: Clarify that the find command targeting
tools/launcher/local_experiments and matching */exported_model/*/config.json is
specific to local Docker runs (local-launcher) by adding a short note that this
lookup only applies to local Docker mode and to scope the host-path lookup
accordingly; also add a separate brief note describing where artifacts live for
remote SLURM runs (e.g., central experiment/artifact storage or node-specific
scratch paths) and suggest the user consult SLURM job logs or the remote
experiment directory rather than the local tools/launcher/local_experiments
path.

In @.claude/skills/ptq/SKILL.md:
- Line 12: The reference paths in the sentence "Read
`skills/common/environment-setup.md` and
`skills/common/workspace-management.md`." should be normalized to match
companion docs by prefixing with ".claude/"; update those two references to
".claude/skills/common/environment-setup.md" and
".claude/skills/common/workspace-management.md" (and scan the rest of
.claude/skills/ptq/SKILL.md for any other `skills/...` references and change
them to the ".claude/skills/..." style to keep consistency).

---

Nitpick comments:
In @.claude/skills/common/remote-execution.md:
- Line 112: The current JOBID extraction using remote_run "sbatch
/remote/path/scripts/job_slurm.sh" | grep ... is brittle; change the sbatch
invocation to use --parsable and extract the first field to get a deterministic
job id (e.g., run remote_run "sbatch --parsable
/remote/path/scripts/job_slurm.sh" and pipe to cut -d';' -f1) so JOBID reliably
contains only the numeric job id even in federated SLURM setups; update the
assignment of JOBID and any references to JOBID accordingly.

In @.claude/skills/common/slurm-setup.md:
- Around line 58-59: The job submission snippet uses brittle whitespace parsing
to extract JOBID; replace the sbatch invocation with the parsable mode and
assign its output directly to JOBID (e.g., use sbatch --parsable) and, if you
need only the numeric ID, split on ';' or take the part before any cluster
suffix; update the lines that set and echo the JOBID variable (JOBID=$(sbatch
...) and echo "Submitted job $JOBID") to use this robust approach.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 60724936-1b1c-4a85-a31d-a30327a23eed

📥 Commits

Reviewing files that changed from the base of the PR and between b1f9f01 and 87988e8.

📒 Files selected for processing (10)
  • .claude/clusters.yaml.example
  • .claude/skills/common/environment-setup.md
  • .claude/skills/common/remote-execution.md
  • .claude/skills/common/remote_exec.sh
  • .claude/skills/common/slurm-setup.md
  • .claude/skills/common/workspace-management.md
  • .claude/skills/ptq/SKILL.md
  • .claude/skills/ptq/references/launcher-guide.md
  • .claude/skills/ptq/references/slurm-setup.md
  • .claude/skills/ptq/references/unsupported-models.md

Contributor

@Edwardf0t1 Edwardf0t1 left a comment


LGTM overall 👍 Left minor comments in lines.

Additional comments:

  • We can add a tests.json for the ptq skills in a follow-up PR.
  • I would consider further extracting the execution and monitoring parts into a separate run-and-monitor skills when we land deployment and evaluation skills to avoid duplicating logics.

@Edwardf0t1 Edwardf0t1 requested review from jingyu-ml and juhim98 March 30, 2026 18:45
mxinO added 2 commits April 1, 2026 07:09
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
.claude/skills/common/workspace-management.md (1)

45-45: ⚠️ Potential issue | 🟡 Minor

Use clearer wording for the reuse criterion.

Line 45 reads awkwardly; use “Task requires output from a previous step (e.g., eval requires the PTQ checkpoint)” for clarity and consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/workspace-management.md at line 45, Replace the
awkward phrase "Task needs output from a previous step (e.g., eval needs the PTQ
checkpoint)" with the clearer wording "Task requires output from a previous step
(e.g., eval requires the PTQ checkpoint)" in the workspace-management text; find
the sentence that mentions "Task needs output" and update both occurrences of
"needs" to "requires" to ensure consistency and clearer phrasing.
🧹 Nitpick comments (2)
.claude/skills/ptq/references/unsupported-models.md (2)

78-84: Make the parameter-name inspection snippet robust for non-sharded checkpoints.

The current snippet assumes model.safetensors.index.json always exists. That breaks for single-file checkpoints.

Proposed doc snippet update
-import json
-idx = json.load(open('<ckpt_path>/model.safetensors.index.json'))
-import re
-names = set(re.sub(r'\.\d+\.', '.N.', k) for k in idx['weight_map'])
-for n in sorted(names): print(n)
+import json, re, os
+from safetensors import safe_open
+
+index_path = "<ckpt_path>/model.safetensors.index.json"
+names = set()
+if os.path.exists(index_path):
+    idx = json.load(open(index_path))
+    names = set(idx["weight_map"].keys())
+else:
+    # fallback for single-file safetensors
+    with safe_open("<ckpt_path>/model.safetensors", framework="pt") as f:
+        names = set(f.keys())
+
+normalized = set(re.sub(r"\.\d+\.", ".N.", k) for k in names)
+for n in sorted(normalized):
+    print(n)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ptq/references/unsupported-models.md around lines 78 - 84,
The snippet assumes model.safetensors.index.json always exists; change it to
first check for that file (use os.path.exists) and if present load idx =
json.load(... ) and use idx['weight_map'], otherwise open the single-file
checkpoint and extract parameter names (e.g., via safetensors.safe_open or
safetensors.torch.load_file to list keys) and then apply the same regex
normalization (re.sub(r'\.\d+\.', '.N.', k)) to produce names; update variable
usage (idx, names) so both branches yield the same set for printing.

9-9: Fix the internal doc path to avoid a broken reference.

Line 9 points to skills/common/workspace-management.md, but this doc tree is under .claude/skills/.... Please align the path format used elsewhere in this PR so this reference resolves reliably.

Proposed doc fix
-**Download first.** Follow `skills/common/workspace-management.md` to set up local and remote workspaces, sync ModelOpt source, and download the model on the target machine. This avoids downloading twice and gives access to README, custom modeling code, and tokenizer config.
+**Download first.** Follow `.claude/skills/common/workspace-management.md` to set up local and remote workspaces, sync ModelOpt source, and download the model on the target machine. This avoids downloading twice and provides access to README, custom modeling code, and tokenizer config.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ptq/references/unsupported-models.md at line 9, Update the
broken internal doc link in the sentence that currently references
"skills/common/workspace-management.md" so it uses the same
.claude/skills-relative path format used elsewhere in this PR; replace that
target with ".claude/skills/common/workspace-management.md" (the same canonical
path style as other references in this doc tree) to ensure the link resolves
reliably.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/ptq/references/unsupported-models.md:
- Around line 235-243: Add explicit shape guards before the FP8 block dequant
reshape: verify that w.shape[-2] is divisible by s.shape[-2] and w.shape[-1] is
divisible by s.shape[-1] (the block_m/block_n computation for variables w, s,
block_m, block_n) and raise a clear ValueError (or AssertionError) that includes
the offending shapes if not divisible so we fail fast instead of performing an
invalid reshape; apply this check in the code path where scale.dim() == 3 before
computing block_m/block_n and reshaping/scaling param.data.

---

Duplicate comments:
In @.claude/skills/common/workspace-management.md:
- Line 45: Replace the awkward phrase "Task needs output from a previous step
(e.g., eval needs the PTQ checkpoint)" with the clearer wording "Task requires
output from a previous step (e.g., eval requires the PTQ checkpoint)" in the
workspace-management text; find the sentence that mentions "Task needs output"
and update both occurrences of "needs" to "requires" to ensure consistency and
clearer phrasing.

---

Nitpick comments:
In @.claude/skills/ptq/references/unsupported-models.md:
- Around line 78-84: The snippet assumes model.safetensors.index.json always
exists; change it to first check for that file (use os.path.exists) and if
present load idx = json.load(... ) and use idx['weight_map'], otherwise open the
single-file checkpoint and extract parameter names (e.g., via
safetensors.safe_open or safetensors.torch.load_file to list keys) and then
apply the same regex normalization (re.sub(r'\.\d+\.', '.N.', k)) to produce
names; update variable usage (idx, names) so both branches yield the same set
for printing.
- Line 9: Update the broken internal doc link in the sentence that currently
references "skills/common/workspace-management.md" so it uses the same
.claude/skills-relative path format used elsewhere in this PR; replace that
target with ".claude/skills/common/workspace-management.md" (the same canonical
path style as other references in this doc tree) to ensure the link resolves
reliably.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9107923f-5183-44df-8d92-439acfcc8e31

📥 Commits

Reviewing files that changed from the base of the PR and between 87988e8 and fe8516c.

📒 Files selected for processing (5)
  • .claude/skills/common/environment-setup.md
  • .claude/skills/common/workspace-management.md
  • .claude/skills/ptq/SKILL.md
  • .claude/skills/ptq/references/unsupported-models.md
  • .claude/skills/ptq/tests.json
✅ Files skipped from review due to trivial changes (3)
  • .claude/skills/ptq/tests.json
  • .claude/skills/common/environment-setup.md
  • .claude/skills/ptq/SKILL.md

Signed-off-by: Meng Xin <mxin@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
.claude/skills/common/remote_exec.sh (1)

105-127: ⚠️ Potential issue | 🔴 Critical

Replace remaining eval command construction (still injectable).

Line 155, Line 261, Line 348, Line 381, and Line 393 still execute interpolated command strings via eval. Because host/key/proxy/path values come from config or function args, this keeps a local command-injection path open.

🔧 Safer pattern (argv arrays, no eval)
-_ssh_base_opts() {
-    local opts="-o BatchMode=yes -o ConnectTimeout=15 -o StrictHostKeyChecking=accept-new"
-    local ctl_path
-    ctl_path="$(_ssh_control_path)"
-    opts+=" -o ControlPath='${ctl_path}'"
-    opts+=" -o ControlMaster=auto"
-    if [[ -n "${REMOTE_SSH_KEY:-}" ]]; then
-        opts+=" -i $REMOTE_SSH_KEY"
-    fi
-    if [[ -n "${REMOTE_SSH_PROXY:-}" ]]; then
-        opts+=" -o ProxyCommand='${REMOTE_SSH_PROXY}'"
-    fi
-    echo "$opts"
-}
-
-_ssh_base_cmd() {
-    echo "ssh $(_ssh_base_opts) ${REMOTE_USER}@${REMOTE_HOST}"
-}
+_ssh_base_argv() {
+    SSH_BASE_ARGV=(
+        -o BatchMode=yes
+        -o ConnectTimeout=15
+        -o StrictHostKeyChecking=accept-new
+        -o "ControlPath=$(_ssh_control_path)"
+        -o ControlMaster=auto
+    )
+    [[ -n "${REMOTE_SSH_KEY:-}" ]] && SSH_BASE_ARGV+=(-i "$REMOTE_SSH_KEY")
+    [[ -n "${REMOTE_SSH_PROXY:-}" ]] && SSH_BASE_ARGV+=(-o "ProxyCommand=${REMOTE_SSH_PROXY}")
+    SSH_BASE_ARGV+=("${REMOTE_USER}@${REMOTE_HOST}")
+}
@@
-    eval "ssh $opts -f -N ${REMOTE_USER}@${REMOTE_HOST}" 2>&1
+    ssh "${start_argv[@]}" -f -N 2>&1
@@
-    if result=$(eval "$(_ssh_base_cmd)" '"echo SSH_OK"' 2>&1); then
+    _ssh_base_argv
+    if result=$(ssh "${SSH_BASE_ARGV[@]}" "echo SSH_OK" 2>&1); then
@@
-        result=$(eval "$(_ssh_base_cmd)" "'echo $encoded | base64 -d | bash'" 2>&1) && rc=$? || rc=$?
+        _ssh_base_argv
+        result=$(ssh "${SSH_BASE_ARGV[@]}" "echo $encoded | base64 -d | bash" 2>&1) && rc=$? || rc=$?
@@
-    eval "$rsync_cmd"
+    rsync "${rsync_argv[@]}"
@@
-    eval "rsync -avz --progress -e \"ssh $(_ssh_base_opts)\" '${remote_src}/' '${local_path}/'"
+    rsync "${rsync_argv[@]}"
#!/bin/bash
# Verify no eval-based execution remains in this script.
rg -n '\beval\b|_ssh_base_cmd\b|rsync_cmd=' .claude/skills/common/remote_exec.sh

Expected: no eval usage for ssh/rsync execution paths.

Also applies to: 155-155, 261-261, 348-348, 371-381, 393-393
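
The argv-array pattern recommended above can be sketched end to end. The `REMOTE_*` values and the `build_ssh_argv` helper below are illustrative assumptions, not the skill's real config; the hostile `;` in the key path shows why eval-style interpolation is unsafe:

```shell
#!/usr/bin/env bash
# Sketch of the no-eval pattern: build the ssh invocation as an argv array.
set -euo pipefail

build_ssh_argv() {
    # Each option is its own array element; nothing is re-parsed by a shell.
    SSH_ARGV=(ssh -o BatchMode=yes -o ConnectTimeout=15 -o StrictHostKeyChecking=accept-new)
    if [[ -n "${REMOTE_SSH_KEY:-}" ]]; then
        SSH_ARGV+=(-i "$REMOTE_SSH_KEY")
    fi
    SSH_ARGV+=("${REMOTE_USER}@${REMOTE_HOST}")
}

REMOTE_USER=alice
REMOTE_HOST=login.example.com
REMOTE_SSH_KEY='/tmp/id_ed25519; rm -rf /'   # hostile config value stays one argv word

build_ssh_argv
printf '%s\n' "${SSH_ARGV[@]}"
```

Executing with `ssh "${SSH_ARGV[@]}"` passes each element as exactly one word, so metacharacters coming from config never reach a shell parser, unlike `eval "$cmd"`.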

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/remote_exec.sh around lines 105 - 127, The script
still builds and executes interpolated command strings via eval (notably around
_ssh_base_cmd/_ssh_base_opts and places that set rsync_cmd), leaving
command-injection risk; fix by eliminating eval and switching to POSIX/Bash argv
arrays: change callers that currently do eval "$(_ssh_base_cmd) ...", instead
build an array like local -a ssh_cmd=(ssh -o BatchMode=yes -o ConnectTimeout=15
-o StrictHostKeyChecking=accept-new) then conditionally append args with quoted
variables (e.g. [[ -n $REMOTE_SSH_KEY ]] && ssh_cmd+=(-i "$REMOTE_SSH_KEY"); [[
-n $REMOTE_SSH_PROXY ]] && ssh_cmd+=(-o "ProxyCommand=$REMOTE_SSH_PROXY");
ssh_cmd+=("${REMOTE_USER}@${REMOTE_HOST}")), and execute with "${ssh_cmd[@]}"
(similarly convert any rsync_cmd string to an array and call "${rsync_cmd[@]}");
update or add an array-returning helper (or rename _ssh_base_opts to
produce/apply options as array pieces) and remove all eval usages so no
interpolated strings are executed.
🧹 Nitpick comments (1)
.claude/skills/ptq/references/unsupported-models.md (1)

52-53: Pin transformers install by commit SHA for reproducibility.

The current doc line installs from a moving main checkout, which makes troubleshooting non-reproducible. Recommend documenting install by commit SHA:

-    - **Found** → install from that clone: `pip install /tmp/transformers-main --quiet`, then re-run `AutoConfig.from_pretrained()`.
+    - **Found** → install from that clone at a pinned commit, then re-run `AutoConfig.from_pretrained()`:
+      `cd /tmp/transformers-main && git rev-parse HEAD`
+      `pip install /tmp/transformers-main --quiet`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ptq/references/unsupported-models.md around lines 52 - 53,
The doc currently tells users to install from a moving main checkout ("pip
install /tmp/transformers-main --quiet") which harms reproducibility; update the
instruction to pin the installed transformers to a specific commit SHA by
checking out the repo to that SHA before installing, and update the surrounding
guidance (the sentence referencing AutoConfig.from_pretrained() and the
follow-up question about <ArchName>) to instruct using the pinned-commit install
for reproducible debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 251ab2f2-b085-471c-93b7-5fd5c7045659

📥 Commits

Reviewing files that changed from the base of the PR and between fe8516c and d8f1764.

📒 Files selected for processing (7)
  • .claude/skills/common/remote-execution.md
  • .claude/skills/common/remote_exec.sh
  • .claude/skills/common/workspace-management.md
  • .claude/skills/ptq/SKILL.md
  • .claude/skills/ptq/references/launcher-guide.md
  • .claude/skills/ptq/references/slurm-setup-ptq.md
  • .claude/skills/ptq/references/unsupported-models.md
✅ Files skipped from review due to trivial changes (5)
  • .claude/skills/ptq/references/slurm-setup-ptq.md
  • .claude/skills/common/workspace-management.md
  • .claude/skills/ptq/SKILL.md
  • .claude/skills/common/remote-execution.md
  • .claude/skills/ptq/references/launcher-guide.md

Comment on lines +217 to +239
    REMOTE_WORKSPACE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.workspace")"
    REMOTE_GPU_TYPE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.gpu_type")"
    REMOTE_CONTAINER_IMAGE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.container_image")"
    REMOTE_ENV_TYPE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.env_type")"

    # SLURM-specific
    REMOTE_SLURM_ACCOUNT="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.slurm.default_account")"
    REMOTE_SLURM_PARTITION="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.slurm.default_partition")"

    # Expand ~ in ssh_key
    if [[ "${REMOTE_SSH_KEY:-}" == "~/"* ]]; then
        REMOTE_SSH_KEY="${HOME}/${REMOTE_SSH_KEY#\~/}"
    fi

    # Validate required fields
    if [[ -z "$REMOTE_HOST" ]]; then
        echo "ERROR: Cluster '$cluster_name' has no login_node defined." >&2
        return 1
    fi
    if [[ -z "${REMOTE_WORKSPACE:-}" || "$REMOTE_WORKSPACE" == "/" ]]; then
        echo "ERROR: Cluster '$cluster_name' must define a non-root workspace." >&2
        return 1
    fi

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find workspace values in example/docs that may still use '~'
rg -n 'workspace:\s*~/?' .claude README.md docs 2>/dev/null

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

# First, get file size and structure
wc -l .claude/skills/common/remote_exec.sh

Repository: NVIDIA/Model-Optimizer

Length of output: 106


🏁 Script executed:

# Read the section in question (lines 217-239) and surrounding context
sed -n '215,245p' .claude/skills/common/remote_exec.sh | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 1816


🏁 Script executed:

# Read lines around 341-343 to see how REMOTE_WORKSPACE is used
sed -n '335,350p' .claude/skills/common/remote_exec.sh | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 881


🏁 Script executed:

# Read lines around 369-370 to see how REMOTE_WORKSPACE is used
sed -n '360,380p' .claude/skills/common/remote_exec.sh | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 932


Add validation to reject ~ in workspace values, or expand them like ssh_key.

REMOTE_WORKSPACE parsed from config can contain ~/..., but unlike REMOTE_SSH_KEY (lines 227–228), it has no tilde expansion. When used in remote_run() (line 348, double-quoted) and remote_sync_to() (line 379, then wrapped in single quotes at line 388), the tilde is passed literally and won't expand remotely, causing cd, rsync, and mounts to fail.

Either expand ~ in workspace values to $HOME form (matching SSH_KEY pattern), or reject them with validation:

Suggested validation
     if [[ -z "${REMOTE_WORKSPACE:-}" || "$REMOTE_WORKSPACE" == "/" ]]; then
         echo "ERROR: Cluster '$cluster_name' must define a non-root workspace." >&2
         return 1
     fi
+    if [[ "$REMOTE_WORKSPACE" == "~" || "$REMOTE_WORKSPACE" == "~/"* ]]; then
+        echo "ERROR: Cluster '$cluster_name' workspace must not use '~'. Use an absolute path or \$HOME/... form." >&2
+        return 1
+    fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     REMOTE_WORKSPACE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.workspace")"
     REMOTE_GPU_TYPE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.gpu_type")"
     REMOTE_CONTAINER_IMAGE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.container_image")"
     REMOTE_ENV_TYPE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.env_type")"

     # SLURM-specific
     REMOTE_SLURM_ACCOUNT="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.slurm.default_account")"
     REMOTE_SLURM_PARTITION="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.slurm.default_partition")"

     # Expand ~ in ssh_key
     if [[ "${REMOTE_SSH_KEY:-}" == "~/"* ]]; then
         REMOTE_SSH_KEY="${HOME}/${REMOTE_SSH_KEY#\~/}"
     fi

     # Validate required fields
     if [[ -z "$REMOTE_HOST" ]]; then
         echo "ERROR: Cluster '$cluster_name' has no login_node defined." >&2
         return 1
     fi
     if [[ -z "${REMOTE_WORKSPACE:-}" || "$REMOTE_WORKSPACE" == "/" ]]; then
         echo "ERROR: Cluster '$cluster_name' must define a non-root workspace." >&2
         return 1
     fi
+    if [[ "$REMOTE_WORKSPACE" == "~" || "$REMOTE_WORKSPACE" == "~/"* ]]; then
+        echo "ERROR: Cluster '$cluster_name' workspace must not use '~'. Use an absolute path or \$HOME/... form." >&2
+        return 1
+    fi
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 227-227: Tilde does not expand in quotes. Use $HOME.

(SC2088)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/remote_exec.sh around lines 217 - 239,
REMOTE_WORKSPACE may contain a literal "~/" which is not expanded and breaks
remote_run(), remote_sync_to(), rsync and mounts; add the same tilde handling
used for REMOTE_SSH_KEY to expand "~/" to "${HOME}/..." right after
REMOTE_WORKSPACE is assigned (or alternatively reject values beginning with "~/"
with an error), and ensure references in remote_run and remote_sync_to continue
to use the expanded REMOTE_WORKSPACE value; update validation that currently
checks REMOTE_WORKSPACE to run after the expansion so the non-root and non-empty
checks operate on the expanded path.
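
The expand-rather-than-reject option can be sketched as a small helper, mirroring the existing `REMOTE_SSH_KEY` handling. The function name and the sample paths are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Sketch: expand a leading "~" in a config-supplied path to "$HOME",
# so values like "~/scratch" work in remote_run()/rsync instead of
# being passed as a literal tilde inside quotes.
set -euo pipefail

expand_tilde() {
    local p="$1"
    if [[ "$p" == "~" ]]; then
        p="$HOME"
    elif [[ "$p" == "~/"* ]]; then
        p="${HOME}/${p#\~/}"
    fi
    printf '%s\n' "$p"
}

expand_tilde '~/scratch/ptq'    # becomes "$HOME/scratch/ptq"
expand_tilde '/raid/workspace'  # absolute paths pass through unchanged
```

Running the expansion before the non-root/non-empty validation means those checks operate on the final path, as the review suggests.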

Comment on lines +14 to +15
2. **Check for `modeling_*.py` or `tokenization_*.py`** — custom code shipped with the model. If found, **always use `--trust_remote_code`** with `hf_ptq.py`, and `trust_remote_code=True` in any custom scripts. Without it, `AutoConfig`, `AutoTokenizer`, and `AutoModel` will fail to resolve custom classes.


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n .claude/skills/ptq/references/unsupported-models.md | head -40

Repository: NVIDIA/Model-Optimizer

Length of output: 2380


🏁 Script executed:

wc -l .claude/skills/ptq/references/unsupported-models.md

Repository: NVIDIA/Model-Optimizer

Length of output: 121


🏁 Script executed:

grep -n -i "security\|isolation\|revision\|trust_remote_code\|sandbox" .claude/skills/ptq/references/unsupported-models.md

Repository: NVIDIA/Model-Optimizer

Length of output: 662


🏁 Script executed:

cat -n .claude/skills/ptq/references/unsupported-models.md | sed -n '1,30p'

Repository: NVIDIA/Model-Optimizer

Length of output: 1946


🏁 Script executed:

# Check the workspace-management.md reference mentioned in line 9 to see if security guidance is there
cat -n .claude/skills/common/workspace-management.md 2>/dev/null | head -50

Repository: NVIDIA/Model-Optimizer

Length of output: 1515


🏁 Script executed:

# Search for security-related content in the broader context
grep -r -n -i "trust_remote_code\|sandbox\|isolation" .claude/skills/ 2>/dev/null | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 1431


Add security guidance for --trust_remote_code usage.

Line 14 mandates --trust_remote_code but omits any warning that this executes arbitrary Python from model repositories. Add guidance to run only in isolated environments and pin a trusted model revision/commit.

Suggested patch
2. **Check for `modeling_*.py` or `tokenization_*.py`** — custom code shipped with the model. If found, **always use `--trust_remote_code`** with `hf_ptq.py`, and `trust_remote_code=True` in any custom scripts. Without it, `AutoConfig`, `AutoTokenizer`, and `AutoModel` will fail to resolve custom classes.
+   **Security note:** `trust_remote_code` executes model-repo Python. Use only in isolated environments and pin a trusted `revision`/commit.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 2. **Check for `modeling_*.py` or `tokenization_*.py`** — custom code shipped with the model. If found, **always use `--trust_remote_code`** with `hf_ptq.py`, and `trust_remote_code=True` in any custom scripts. Without it, `AutoConfig`, `AutoTokenizer`, and `AutoModel` will fail to resolve custom classes.
+   **Security note:** `trust_remote_code` executes model-repo Python. Use only in isolated environments and pin a trusted `revision`/commit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ptq/references/unsupported-models.md around lines 14 - 15,
Update the guidance around using --trust_remote_code (referenced in the sentence
about modeling_*.py and tokenization_*.py and the tools hf_ptq.py, AutoConfig,
AutoTokenizer, AutoModel and trust_remote_code=True) to include a security
warning: state that --trust_remote_code executes arbitrary Python from the model
repo, recommend running it only in isolated or ephemeral environments (e.g.,
containers or VMs), advise pinning to a specific trusted model revision/commit
and verifying the remote code before use, and suggest running with least
privileges and network restrictions where possible.
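
The pin-before-trust guidance can be sketched as a wrapper check run before any `--trust_remote_code` invocation. The function, message text, and sample SHA are illustrative assumptions, not part of `hf_ptq.py`:

```shell
#!/usr/bin/env bash
# Sketch: refuse to enable trust_remote_code unless a revision/commit is pinned.
set -euo pipefail

require_pinned_revision() {
    local revision="${1:-}"
    if [[ -z "$revision" ]]; then
        echo "ERROR: pin a model revision/commit before using --trust_remote_code" >&2
        return 1
    fi
    echo "revision $revision pinned; trust_remote_code may proceed"
}

require_pinned_revision 1a2b3c4   # illustrative commit SHA
```

A launcher script could call this gate first and only then append `--trust_remote_code` to the quantization command.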

4 participants