docs: update LLM tutorial to optimize and evaluate large language models #126
Conversation
sharpenb
left a comment
Thanks for the tutorial! I have mostly the same comments as on the image generation PR #127 (e.g. tutorial card, section org....). Beyond those, here are some others:
- We could add some other efficiency metrics, including memory and latency (like in the AI courses).
- We should explain why we use a small LLM, and explain that the tutorial works with other models as well.
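The memory and latency metrics asked for above can be sketched without any framework support. Below is a rough illustration, where `generate` is a hypothetical callable standing in for the tutorial's inference step and Python-heap memory (via `tracemalloc`) is only a proxy for the GPU memory a real run would report:

```python
import time
import tracemalloc


def run_with_metrics(generate, prompt):
    """Time one generation call and record peak Python-heap memory.

    `generate` is any callable that takes a prompt and returns a list of
    tokens; it is a hypothetical stand-in for the tutorial's inference
    function, not Pruna's actual API.
    """
    tracemalloc.start()
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "latency_s": elapsed,
        "throughput_tokens_per_s": len(tokens) / elapsed,
        "peak_memory_mb": peak / 1e6,  # heap proxy; a GPU run would query the device instead
    }
```

On a real model the memory number would come from the accelerator (e.g. a device memory query) rather than the Python heap, but the shape of the measurement is the same.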
Force-pushed from bb70acf to d5a1d9f
sharpenb
left a comment
Thanks for the update :) I have mostly some comments shared with the image generation PR. Some further comments:
- Could we add the minimal device config to the image generation tutorial as well?
- Could we add memory and throughput metrics? These are important when quantizing LLMs.
- It is not clear that `inference_args` needs to be updated.
@gtregoat @sharpenb @SaboniAmine could you review this PR? I think it would be nice to merge it before the release.
sharpenb
left a comment
Thanks for the update! A last update would be to indicate the units for throughput (samples/s) and latency (s/sample), and to include Base Model, Compressed Model, and Relative Difference in the row/column names to be complete. We could actually also add CO2 emissions, which is in the evaluation agent. Could we also use tokens/second and seconds/token instead, and ideally time to first token? :)
Hi @sharpenb, I updated the metrics in a way that doesn't require me to boot up a VM and rerun the notebook another time. Hope that is enough for now; for the ASR or video tutorial we can add some additional metrics.
sharpenb
left a comment
Thanks for the update! The tutorial is very nice. It would be super nice if we could add time-to-first-token and per-token time instead of total time, but this should be good for now.
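For reference, time-to-first-token and per-token time can be measured over any streaming token source. In this sketch, `token_stream` is a hypothetical iterable standing in for a streaming generate call, not the tutorial's actual API:

```python
import time


def stream_timings(token_stream):
    """Measure time-to-first-token and per-token time from an iterable
    of tokens. `token_stream` is a hypothetical stand-in for a
    streaming generation call."""
    start = time.perf_counter()
    arrivals = []
    tokens = []
    for tok in token_stream:
        arrivals.append(time.perf_counter())  # timestamp each token as it arrives
        tokens.append(tok)
    total = arrivals[-1] - start
    return {
        "time_to_first_token_s": arrivals[0] - start,
        "seconds_per_token": total / len(tokens),
        "tokens_per_second": len(tokens) / total,
    }
```

With a `transformers`-style streamer the same bookkeeping would wrap the iterator the streamer exposes; only the timestamping logic above is essential.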
nifleisch
left a comment
Cool tutorial with impressive results! I found a few minor errors that you might want to fix before merging:
- "Additionally, we lower the n_iterations and n_warmup_iterations to ensure that we monitor the performance of the model whenever it is running smoothly." While this explains well why we want warm-up steps in general, it does not justify why we reduce `n_warmup_iterations` here.
- In the text you mention that we limit the dataset with `datamodule.limit_datasets(10)`, but in the code block below we do `datamodule.limit_datasets(100)`.
- Similarly, the text mentions that we use `hqq_weight_bits=4`, while the code uses `hqq_weight_bits=8`.
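On the warm-up point above, a minimal timing loop shows why warm-up iterations are usually run and discarded: they absorb one-off costs such as compilation or cache population. The parameter names mirror the tutorial's `n_iterations`/`n_warmup_iterations`, but the loop itself is only an illustrative sketch, not Pruna's benchmarking code:

```python
import time
from statistics import mean


def benchmark(fn, n_iterations=10, n_warmup_iterations=3):
    """Run `fn` repeatedly and return the mean latency of the measured
    runs, discarding the first `n_warmup_iterations` so that one-off
    costs (compilation, cache warm-up) do not skew the result."""
    timings = []
    for i in range(n_warmup_iterations + n_iterations):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if i >= n_warmup_iterations:  # keep only the post-warm-up runs
            timings.append(elapsed)
    return mean(timings)
```

This also makes the review comment concrete: lowering `n_warmup_iterations` saves time but risks letting warm-up effects leak into the measured runs, which is why the reduction deserves a justification in the text.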
I asked Sara to do the final approval as you are not logged in.
- Renamed the tutorial from "Making your LLMs 4x smaller" to "Optimize and Evaluate Large Language Models" for clarity. - Enhanced the tutorial with a comprehensive workflow for optimizing and evaluating large language models using the `hqq` quantization and `torch_compile` compilation. - Updated the tutorial index to include the new section on optimizing large language models. - Improved code snippets and markdown explanations throughout the tutorial for better user guidance and understanding.
…luation metrics * Changed cell types and execution counts for clarity and consistency. * Added new code cells for calculating and displaying percentage differences between original and optimized models. * Updated markdown sections to reflect the results of the evaluation and the performance comparison of models.
…tments * Added a Colab badge for easy access to the tutorial. * Updated execution count to null for clarity in code cells. * Removed unnecessary output warnings from the code cell for a cleaner presentation.
… cells * Eliminated unnecessary markdown and code cells to enhance clarity and focus in the tutorial. * Consolidated library imports for improved organization and readability.
* Updated the evaluation section to focus on essential metrics, specifically `elapsed_time` and `perplexity`, for a more streamlined explanation. * Removed references to additional metrics to enhance clarity and focus on key performance indicators.
* Added a new section emphasizing that smaller models are suitable for demonstrating the optimization process. * Included a line break for improved readability in the introductory explanation about loading the model and tokenizer.
* Reformatted the tutorial's component details into a markdown table for better readability. * Adjusted execution counts in code cells for consistency. * Updated output messages to provide clearer feedback during execution. * Revised the evaluation section to streamline the explanation of metrics used.
…ed evaluation metrics * Adjusted execution counts in code cells for consistency. * Removed unnecessary output warnings to streamline the tutorial. * Enhanced the evaluation section to clarify the metrics used, focusing on `elapsed_time` and `perplexity`. * Updated the tutorial to reflect changes in the `SmashConfig` object and model evaluation process.
…djustments * Revised evaluation metrics to include latency and energy consumption alongside perplexity for a comprehensive performance assessment. * Updated code cells to reflect changes in execution counts and outputs, ensuring clarity and consistency. * Improved markdown explanations to better guide users through the optimization and evaluation processes. * Adjusted model loading and inference steps to utilize the `pipeline` function for streamlined usage.
…mprovements * Revised execution counts in code cells for consistency and clarity. * Updated model descriptions and markdown explanations to improve user understanding. * Enhanced the evaluation section with a markdown table summarizing performance metrics, including throughput and energy consumption. * Adjusted inference arguments and model handling to ensure correct evaluation processes. * Improved overall structure and readability of the tutorial.
…ing execution count * Set execution count to null for clarity in code cells. * Removed extensive output warnings to streamline the tutorial presentation. * Enhanced overall readability by focusing on essential content.
* Removed the mention of the `transformers` and `datasets` libraries from the tutorial to streamline content. * Maintained focus on essential details regarding model optimization and evaluation processes.
…ements * Updated section headings to be numbered for better organization and flow. * Removed redundant markdown and code snippets to streamline content. * Enhanced clarity in explanations regarding model loading, configuration, and evaluation processes. * Adjusted markdown formatting for improved readability and user guidance.
* Revised markdown table headers to reflect "Base Model" and "Compressed Model" for improved clarity. * Enhanced metric descriptions by including units for energy consumption, throughput, and total time. * Adjusted code to ensure consistent formatting and readability in the evaluation section.
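A small sketch of how such a comparison table could be rendered with explicit units and a Relative Difference column. The metric names and values below are illustrative placeholders, not measured results from the tutorial:

```python
def comparison_table(base: dict, compressed: dict) -> str:
    """Render base vs. compressed metrics as a markdown table with a
    signed Relative Difference column. Assumes both dicts share the
    same metric keys, with units spelled out in the key names."""
    rows = [
        "| Metric | Base Model | Compressed Model | Relative Difference |",
        "|---|---|---|---|",
    ]
    for metric in base:
        diff = (compressed[metric] - base[metric]) / base[metric] * 100
        rows.append(
            f"| {metric} | {base[metric]:.2f} | {compressed[metric]:.2f} | {diff:+.1f}% |"
        )
    return "\n".join(rows)


# Illustrative numbers only: a compressed model that doubles throughput.
print(comparison_table(
    {"throughput (tokens/s)": 40.0, "latency (s/token)": 0.025},
    {"throughput (tokens/s)": 80.0, "latency (s/token)": 0.0125},
))
```

Keeping the units inside the metric name, as suggested in the review, means the table stays self-describing wherever it is pasted.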
* Adjusted code snippets for consistent use of double quotes in metric retrieval. * Enhanced readability by adding spacing in import statements and removing unnecessary break statements. * Maintained focus on clarity and structure within the evaluation section.
* Changed `hqq_weight_bits` in `SmashConfig` from 8 to 4 for improved model optimization. * Increased dataset limit in `EvaluationAgent` from 10 to 100 to enhance evaluation comprehensiveness. * Ensured clarity in the evaluation process description while maintaining focus on performance monitoring.
Force-pushed from f681344 to f8e0d35
`SmashConfig` object.

Description
Related Issue
Fixes #121
Type of Change
How Has This Been Tested?
Checklist
Additional Notes