
docs: update LLM tutorial to optimize and evaluate large language models #126

Merged
davidberenstein1957 merged 16 commits into main from docs/121-doc-update-end2end-llm-tutorial
Jun 24, 2025

Conversation

@davidberenstein1957
Member

  • Added a new section for optimizing and evaluating large language models using the SmashConfig object.
  • Updated the tutorial to include installation instructions, model loading, and inference steps.
  • Introduced evaluation metrics such as elapsed time and perplexity for assessing model performance.
  • Removed outdated references and improved clarity throughout the tutorial.
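
For context, perplexity is the exponential of the average negative log-likelihood per token. A minimal illustration of the metric itself (not the tutorial's code, which relies on Pruna's evaluation tooling):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model assigning every token probability 0.25 has NLL ln(4) per token,
# so its perplexity is exactly 4.
print(perplexity([math.log(4)] * 10))  # → 4.0 (up to float rounding)
```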

Description

Related Issue

Fixes #121

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

@davidberenstein1957 davidberenstein1957 linked an issue May 16, 2025 that may be closed by this pull request
Comment thread docs/tutorials/index.rst Outdated
Member

@sharpenb sharpenb left a comment

Thanks for the tutorial! I have mostly the same comments as on the image generation PR #127 (e.g. tutorial card, section organization, ...). Beyond those shared comments, here are some others:

  • We could add some other efficiency metrics, including memory and latency (like in the AI courses).
  • We should explain why we use a small LLM, and note that the workflow also works with other models.
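
As a rough illustration of what measuring latency and memory involves (a generic Python sketch, not Pruna's EvaluationAgent; on GPU one would typically use `torch.cuda.max_memory_allocated` instead of `tracemalloc`):

```python
import time
import tracemalloc

def measure(fn, n_warmup=2, n_iters=5):
    """Mean latency and peak Python heap usage of a callable."""
    for _ in range(n_warmup):              # warmup so caches/JIT settle
        fn()
    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed / n_iters, peak         # (seconds per call, peak bytes)

latency, peak_bytes = measure(lambda: sum(range(100_000)))
print(f"latency={latency:.6f}s peak={peak_bytes}B")
```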

@davidberenstein1957 davidberenstein1957 force-pushed the docs/121-doc-update-end2end-llm-tutorial branch 2 times, most recently from bb70acf to d5a1d9f Compare June 5, 2025 11:51
Member

@sharpenb sharpenb left a comment

Thanks for the update :) I have mostly the same comments as on the image generation PR. Some further comments:

  • Could we add the minimal device config to the image generation tutorial as well?
  • Could we add memory and throughput metrics? These are important when quantizing LLMs.
  • It is not clear that inference_args needs to be updated.

@davidberenstein1957 davidberenstein1957 requested review from SaboniAmine, gtregoat and sharpenb and removed request for begumcig and nifleisch June 6, 2025 15:56
@davidberenstein1957
Member Author

@gtregoat @sharpenb @SaboniAmine could you review this PR? I think it would be nice to merge it before the release.

Member

@sharpenb sharpenb left a comment

Thanks for the update! A last change would be to indicate the units for throughput (samples/s) and latency (s/sample), and to use "Base Model", "Compressed Model", and "Relative Difference" in the row/column names to be complete. We could also add CO2 emissions, which is available in the evaluation agent. Could we also use tokens/second and seconds/token instead, and ideally time to first token? :)
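
Converting raw timing numbers into these units, plus a relative-difference column, is straightforward; a minimal sketch (the function names are hypothetical, not the tutorial's code):

```python
def llm_speed_metrics(n_tokens, elapsed_s, first_token_s):
    """Express one generation run in the units requested above."""
    return {
        "throughput (tokens/s)": n_tokens / elapsed_s,
        "latency (s/token)": elapsed_s / n_tokens,
        "time to first token (s)": first_token_s,
    }

def relative_difference(base, compressed):
    """Percentage change of the compressed model vs. the base model."""
    return {k: 100 * (compressed[k] - base[k]) / base[k] for k in base}

base = llm_speed_metrics(n_tokens=256, elapsed_s=8.0, first_token_s=0.50)
smashed = llm_speed_metrics(n_tokens=256, elapsed_s=2.0, first_token_s=0.20)
for name, pct in relative_difference(base, smashed).items():
    print(f"{name}: {pct:+.1f}%")
```

With these example numbers, throughput rises by 300% while per-token latency and time to first token drop by 75% and 60%.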

@davidberenstein1957
Member Author

davidberenstein1957 commented Jun 17, 2025

Thanks for the update! A last change would be to indicate the units for throughput (samples/s) and latency (s/sample), and to use "Base Model", "Compressed Model", and "Relative Difference" in the row/column names to be complete. We could also add CO2 emissions, which is available in the evaluation agent. Could we also use tokens/second and seconds/token instead, and ideally time to first token? :)

Hi @sharpenb, I updated the metrics in a way that doesn't require me to boot up a VM and rerun the notebook another time. Hope that is enough for now; for the ASR or video tutorial we can add some additional metrics.

Member

@sharpenb sharpenb left a comment

Thanks for the update! The tutorial is very nice. It would be super nice if we could add time-to-first-token and per-token time instead of total time, but this should be good for now.

@davidberenstein1957 davidberenstein1957 requested review from nifleisch and sdiazlor and removed request for SaboniAmine and gtregoat June 23, 2025 11:06
Collaborator

@nifleisch nifleisch left a comment

Cool tutorial with impressive results! I found a few minor errors that you might want to fix before merging:

  • "Additionally, we lower the n_iterations and n_warmup_iterations to ensure that we monitor the performance of the model whenever it is running smoothly." I think while this explains well why we would like to use warmup steps in general, it does not justify why we reduce n_warmup_iterations here.
  • In the text you mention that we limit the dataset like this: datamodule.limit_datasets(10), but in the code block below we do datamodule.limit_datasets(100)
  • Similarly, in the text it is mentioned that we use hqq_weight_bits=4 while we are using hqq_weight_bits=8 in the code

@sharpenb sharpenb self-requested a review June 24, 2025 12:04
Member

@sharpenb sharpenb left a comment

We need to do a last fix ;)

Comment thread docs/tutorials/llms.ipynb Outdated
@davidberenstein1957 davidberenstein1957 dismissed sharpenb’s stale review June 24, 2025 12:58

asked Sara to do the final approval as you are not logged in

- Renamed the tutorial from "Making your LLMs 4x smaller" to "Optimize and Evaluate Large Language Models" for clarity.
- Enhanced the tutorial with a comprehensive workflow for optimizing and evaluating large language models using the `hqq` quantization and `torch_compile` compilation.
- Updated the tutorial index to include the new section on optimizing large language models.
- Improved code snippets and markdown explanations throughout the tutorial for better user guidance and understanding.
…luation metrics

* Changed cell types and execution counts for clarity and consistency.
* Added new code cells for calculating and displaying percentage differences between original and optimized models.
* Updated markdown sections to reflect the results of the evaluation and the performance comparison of models.
…tments

* Added a Colab badge for easy access to the tutorial.
* Updated execution count to null for clarity in code cells.
* Removed unnecessary output warnings from the code cell for a cleaner presentation.
… cells

* Eliminated unnecessary markdown and code cells to enhance clarity and focus in the tutorial.
* Consolidated library imports for improved organization and readability.
* Updated the evaluation section to focus on essential metrics, specifically `elapsed_time` and `perplexity`, for a more streamlined explanation.
* Removed references to additional metrics to enhance clarity and focus on key performance indicators.
* Added a new section emphasizing that smaller models are suitable for demonstrating the optimization process.
* Included a line break for improved readability in the introductory explanation about loading the model and tokenizer.
* Reformatted the tutorial's component details into a markdown table for better readability.
* Adjusted execution counts in code cells for consistency.
* Updated output messages to provide clearer feedback during execution.
* Revised the evaluation section to streamline the explanation of metrics used.
…ed evaluation metrics

* Adjusted execution counts in code cells for consistency.
* Removed unnecessary output warnings to streamline the tutorial.
* Enhanced the evaluation section to clarify the metrics used, focusing on `elapsed_time` and `perplexity`.
* Updated the tutorial to reflect changes in the `SmashConfig` object and model evaluation process.
…djustments

* Revised evaluation metrics to include latency and energy consumption alongside perplexity for a comprehensive performance assessment.
* Updated code cells to reflect changes in execution counts and outputs, ensuring clarity and consistency.
* Improved markdown explanations to better guide users through the optimization and evaluation processes.
* Adjusted model loading and inference steps to utilize the `pipeline` function for streamlined usage.
…mprovements

* Revised execution counts in code cells for consistency and clarity.
* Updated model descriptions and markdown explanations to improve user understanding.
* Enhanced the evaluation section with a markdown table summarizing performance metrics, including throughput and energy consumption.
* Adjusted inference arguments and model handling to ensure correct evaluation processes.
* Improved overall structure and readability of the tutorial.
…ing execution count

* Set execution count to null for clarity in code cells.
* Removed extensive output warnings to streamline the tutorial presentation.
* Enhanced overall readability by focusing on essential content.
* Removed the mention of the `transformers` and `datasets` libraries from the tutorial to streamline content.
* Maintained focus on essential details regarding model optimization and evaluation processes.
…ements

* Updated section headings to be numbered for better organization and flow.
* Removed redundant markdown and code snippets to streamline content.
* Enhanced clarity in explanations regarding model loading, configuration, and evaluation processes.
* Adjusted markdown formatting for improved readability and user guidance.
* Revised markdown table headers to reflect "Base Model" and "Compressed Model" for improved clarity.
* Enhanced metric descriptions by including units for energy consumption, throughput, and total time.
* Adjusted code to ensure consistent formatting and readability in the evaluation section.
* Adjusted code snippets for consistent use of double quotes in metric retrieval.
* Enhanced readability by adding spacing in import statements and removing unnecessary break statements.
* Maintained focus on clarity and structure within the evaluation section.
* Changed `hqq_weight_bits` in `SmashConfig` from 8 to 4 for improved model optimization.
* Increased dataset limit in `EvaluationAgent` from 10 to 100 to enhance evaluation comprehensiveness.
* Ensured clarity in the evaluation process description while maintaining focus on performance monitoring.
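
Pieced together from these commit messages, the final setup would look roughly like the sketch below. The `hqq_weight_bits` value, `torch_compile`, and the `limit_datasets(100)` call are taken from this thread; the other names (`smash`, the `quantizer`/`compiler` keys, the model/datamodule variables) are assumptions about the pruna API and may differ from the current release.

```python
# Sketch only — assumes the pruna package and the identifiers named in this PR.
from pruna import SmashConfig, smash

smash_config = SmashConfig()
smash_config["quantizer"] = "hqq"
smash_config["hqq_weight_bits"] = 4   # changed from 8 to 4 in the final commit
smash_config["compiler"] = "torch_compile"

# `model` is a previously loaded Hugging Face model (hypothetical variable).
smashed_model = smash(model=model, smash_config=smash_config)

# Evaluation: limit the dataset to 100 samples (raised from 10 per review).
datamodule.limit_datasets(100)
```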
@davidberenstein1957 davidberenstein1957 force-pushed the docs/121-doc-update-end2end-llm-tutorial branch from f681344 to f8e0d35 Compare June 24, 2025 13:06
@davidberenstein1957 davidberenstein1957 merged commit 458e009 into main Jun 24, 2025
7 checks passed


Development

Successfully merging this pull request may close these issues.

[DOC] update end2end llm tutorial

4 participants