GPTQ Lite implementation #555
```diff
@@ -234,6 +234,7 @@
     "algorithm": "max",
 }

 INT4_AWQ_CFG = {
     "quant_cfg": {
         "*weight_quantizer": {
```
```diff
@@ -1189,6 +1190,44 @@ class SVDQuantConfig(QuantizeAlgorithmConfig):
     )


+class GPTQLiteConfig(QuantizeAlgorithmConfig):
+    """The config for GPTQ lite.
+
+    GPTQ lite is a variant of GPTQ that does not exactly follow the official GPTQ
+    implementation. GPTQ lite does not perform sequential quantization of layers,
+    so the updated activations are not used to process the next layer.
+
+    The default values are taken from the official GPTQ implementation:
+    https://github.com/IST-DASLab/FP-Quant/blob/d2e3092f968262c4de5fb050e1aef568a280dadd/src/quantization/gptq.py#L35
+
+    Note: This feature is currently experimental and may not yield the expected
+    accuracy improvements.
+    """
+
+    method: Literal["gptq_lite"] = ModeloptField("gptq_lite")
+    percdamp: float | None = ModeloptField(
+        default=0.01,
+        gt=0.0,
+        le=1.0,
+        title="Percentage damping factor.",
+        description="The percentage of the average Hessian diagonal used for damping.",
+    )
```

> **Collaborator:** If you have a reference from the original paper about what these are, could you also share the link?

> **Contributor:** Could you also add some instructions here, so users know the impact of increasing/decreasing this parameter?
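To illustrate what `percdamp` controls: GPTQ adds a fraction of the mean Hessian diagonal to every diagonal entry so that the (often rank-deficient) calibration Hessian becomes positive definite and its Cholesky factorization is numerically stable. The following is a minimal NumPy sketch of that damping step, not the modelopt implementation; the helper name `damp_hessian` is hypothetical.

```python
import numpy as np

def damp_hessian(H: np.ndarray, percdamp: float = 0.01) -> np.ndarray:
    """Add percdamp * mean(diag(H)) to the diagonal of H.

    This is the standard GPTQ damping trick: it keeps the Cholesky
    factorization used by the weight update numerically stable.
    """
    damp = percdamp * np.mean(np.diag(H))
    return H + damp * np.eye(H.shape[0])

# With fewer calibration samples (4) than columns (8), H = X^T X is
# rank-deficient and therefore not positive definite on its own.
X = np.random.default_rng(0).standard_normal((4, 8))
H = X.T @ X
H_damped = damp_hessian(H, percdamp=0.01)
np.linalg.cholesky(H_damped)  # succeeds once the diagonal is damped
```

A larger `percdamp` makes the factorization more robust but biases the update toward plain round-to-nearest; a smaller value follows the Hessian more closely but risks numerical failure on ill-conditioned calibration data.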
```diff
+    block_size: int | None = ModeloptField(
+        default=128,
+        title="Block size for GPTQ weight update.",
+        description="""The block size for GPTQ weight update, which must be a multiple of the
+            group_size used in the quantization.""",
+    )
```

> **Contributor** (on lines +1217 to +1222): This should be a multiple of the `block_size` used in quantization. We should explain it in the description as well.
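For context on what `block_size` does: GPTQ quantizes weight columns one at a time, propagating each column's quantization error within the current block, and only pushes the accumulated block error into the remaining columns once per block, which is cheaper than updating the full matrix after every column. Below is a compact NumPy sketch of that blocked update in the style of Frantar et al.'s GPTQ; it is not the modelopt implementation, and `round_to_grid` is a hypothetical stand-in for the real quantizer.

```python
import numpy as np

def round_to_grid(w, scale):
    # Hypothetical quantizer: symmetric 4-bit rounding onto a uniform grid.
    return np.clip(np.round(w / scale), -8, 7) * scale

def gptq_quantize(W, H, block_size=2, percdamp=0.01, scale=0.1):
    """GPTQ-style columnwise quantization with blocked error propagation."""
    W = W.astype(float).copy()
    n = W.shape[1]
    Q = np.zeros_like(W)
    # Damp the Hessian, then take the upper Cholesky factor of its inverse.
    H = H + percdamp * np.mean(np.diag(H)) * np.eye(n)
    Hinv = np.linalg.cholesky(np.linalg.inv(H)).T
    for i1 in range(0, n, block_size):
        i2 = min(i1 + block_size, n)
        Err = np.zeros((W.shape[0], i2 - i1))
        for j in range(i1, i2):
            q = round_to_grid(W[:, j], scale)
            Q[:, j] = q
            e = (W[:, j] - q) / Hinv[j, j]
            # Propagate this column's error within the current block only.
            W[:, j:i2] -= np.outer(e, Hinv[j, j:i2])
            Err[:, j - i1] = e
        # Lazily push the accumulated block error into the remaining columns.
        W[:, i2:] -= Err @ Hinv[i1:i2, i2:]
    return Q
```

The final result is independent of `block_size` in exact arithmetic; the parameter trades update frequency (small blocks) against batched matrix-multiply efficiency (large blocks).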
```diff
+    hessian_state_path: str | None = ModeloptField(
+        default=None,
+        title="Path to the Hessian state file.",
+        description="""The path to the Hessian state file. If the Hessian file exists, we load
+            from it instead of recomputing the Hessians.""",
+    )
```
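The caching that `hessian_state_path` enables can be sketched as follows: the Hessian for each layer is accumulated from calibration activations as `H = sum_b X_b^T X_b`, which is expensive, so reusing it across runs is worthwhile. This NumPy sketch uses a `.npz` file and hypothetical helper names; the real modelopt state format and APIs may differ.

```python
import os
import numpy as np

def accumulate_hessian(batches, dim):
    """Accumulate H = sum over batches of X^T X from calibration data."""
    H = np.zeros((dim, dim))
    for X in batches:
        H += X.T @ X
    return H

def load_or_compute_hessians(path, batches, dim):
    # Hypothetical caching helper: if the state file already exists,
    # load the Hessians from disk instead of recomputing them.
    if path is not None and os.path.exists(path):
        with np.load(path) as f:
            return {k: f[k] for k in f.files}
    hessians = {"layer0": accumulate_hessian(batches, dim)}
    if path is not None:
        np.savez(path, **hessians)
    return hessians
```

On the second run the calibration pass is skipped entirely, which is the point of exposing the path in the config.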
```diff
 QuantizeQuantCfgType = dict[
     str | Callable,
     QuantizerAttributeConfig
```
> Can you estimate how much effort is needed if we need to add this constraint? I am thinking if we can have a quick test to see what's the accuracy impact.

> This will be addressed in a follow-up PR.