feat: add target modules to quantizers#452
Conversation
begumcig
left a comment
Already looks 99% ready to go to me! Added some nitpicky comments 🙈🙈. Amazing job Gaspar 🧡🧡🧡🧡
pruna_logger.warning(
    "You are using torchao with torch.compile. "
    "Please set `smash_config['torch_compile_mode']='max-autotune-no-cudagraphs'` for best results; "
    "otherwise you may encounter undesirable outcomes."
what are the undesirable outcomes here?
I don't know; the content of this function was already in the file, I just isolated the warnings from the apply logic. Maybe @johannaSommer knows more about this undesirable outcome?
# save the rest of the model, if it is a janus like model,
# and add a config file to the quantized model path.
if hasattr(model, "model") and hasattr(model.model, "language_model"):
    model.model.save_pretrained = original_save_pretrained
shall we wrap this in a try/finally block so that even if saving fails we restore the original save_pretrained? It might be confusing for users if they try to save the model again.
That's a super good point. I think a context manager is a clearer and cleaner way to factor this code out, for clarity in this already complicated saving function.
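For illustration, the try/finally restoration discussed above can be packaged as a small context manager. This is a hypothetical sketch of the pattern, not the actual Pruna implementation; `patched_attr` and the `Dummy` class are made-up names for the example.

```python
from contextlib import contextmanager

@contextmanager
def patched_attr(obj, name, replacement):
    # Temporarily replace obj.<name>; the finally clause guarantees the
    # original attribute is restored even if the body raises mid-save.
    original = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield obj
    finally:
        setattr(obj, name, original)

class Dummy:
    def save_pretrained(self):
        return "original"

model = Dummy()
try:
    with patched_attr(model, "save_pretrained", lambda: "patched"):
        raise RuntimeError("saving failed")
except RuntimeError:
    pass
# Even after the failure, model.save_pretrained is the original method again.
```

With this shape, a second save attempt after a failure sees the unpatched method, avoiding the user confusion raised in the review.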
llcnt
left a comment
Thanks for the PR and big congrats for the amazing job on hqq saving/loading ;)
hqq with target modules on an LLM pipeline is not working currently; I will let you fix this before approving the PR!
begumcig
left a comment
Amazing Gaspar! I just left one comment about the device attribute; maybe we should talk about it, wdyt? Feel free to ignore it if I am making no sense! Everything looks great overall! Thank you!!
if safe_is_instance(model, Pipeline):
    move_to_device(model.model, device, raise_error, device_map)
    if device != "accelerate":
        model.device = torch.device(device)
Could this possibly cause problems if the model doesn't have a device attr, or if it's a property that cannot be set directly?
In this if block we can assume the model is a transformers Pipeline, which relies on getting and setting its device attribute (I checked transformers 4.42, 4.57 and 5.0.0rc1, and they all set self.device = ... in their init). If you want I can add a try/except, but it may be better to explicitly fail if there is unknown behavior so we can handle it. WDYT?
Sounds great to me! Thank you so much for checking <3
* feat: add target modules to torchao
* feat: add target modules to awq
* feat: add target modules to hqq
* feat: add and use target backbone for default target modules
* feat: extend hqq-diffusers save and load to handle target module
* feat: add monkey patching context
Description
This PR adds target-module capabilities to torchao, awq, hqq, and hqq-diffusers.
All apply methods were updated to use the map_targeted_nn_roots structure with a custom quantization function, similar to what was done in llm-int8, diffusers-int8, and quanto.
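To illustrate the idea, a helper in the spirit of map_targeted_nn_roots applies a quantization function only to the requested submodules. This is a hypothetical sketch under assumed names (`apply_to_targets` is invented for the example); the actual Pruna signature and traversal logic may differ.

```python
from types import SimpleNamespace

def apply_to_targets(model, target_modules, quantize_fn):
    # Walk the named target roots and replace each with its quantized
    # counterpart, leaving every other submodule untouched.
    for name in target_modules:
        submodule = getattr(model, name, None)
        if submodule is not None:
            setattr(model, name, quantize_fn(submodule))
    return model

# Toy stand-in for a pipeline with two submodules.
model = SimpleNamespace(transformer="t", text_encoder="e")
apply_to_targets(model, ["transformer"], lambda m: f"quantized({m})")
# model.transformer is now quantized; model.text_encoder is untouched.
```

Each quantizer then only needs to supply its own `quantize_fn`, which is what makes the shared structure reusable across torchao, awq, and hqq.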
A utility function target_backbone was added to provide a global default for target modules: it automatically targets the transformer, unet, or language model. The save and load functions for hqq were updated to maintain compatibility with this new feature and to improve pipeline saving and loading.
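A minimal sketch of what such a default-target utility could look like, assuming the common attribute names used by diffusers and transformers pipelines; the real implementation may inspect the model differently.

```python
def target_backbone(model):
    # Return the first backbone found among the usual pipeline layouts:
    # a diffusion transformer, a UNet, or a language model. Fall back to
    # the model itself if none of these attributes is present.
    for attr in ("transformer", "unet", "language_model"):
        sub = getattr(model, attr, None)
        if sub is not None:
            return sub
    return model
```

This gives every quantizer a sensible default so users only set target modules explicitly when they want finer control.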
(Documentation was updated because this adds a new hyperparameter to the quantizers.)
Related Issue
Fixes #386
Type of Change
How Has This Been Tested?
Ran each algorithm in a notebook to make sure the model can run after quantization, that the targeted Linear layers are quantized, and that the excluded Linear layers are not.
Some tests don't pass, but they were already failing before this PR.
Checklist
Additional Notes