
docs: update LLM tutorial to optimize and evaluate large language models #126

Merged
davidberenstein1957 merged 16 commits into main from docs/121-doc-update-end2end-llm-tutorial
Jun 24, 2025

Conversation

@davidberenstein1957
Member

  • Added a new section for optimizing and evaluating large language models using the SmashConfig object.
  • Updated the tutorial to include installation instructions, model loading, and inference steps.
  • Introduced evaluation metrics such as elapsed time and perplexity for assessing model performance.
  • Removed outdated references and improved clarity throughout the tutorial.
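
For context, perplexity is the exponential of the average negative log-likelihood per token. A minimal illustration of the metric itself (not the tutorial's code, which relies on Pruna's evaluation tooling):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model assigning every token probability 0.25 has NLL ln(4) per token,
# so its perplexity is exactly 4.
print(perplexity([math.log(4)] * 10))  # → 4.0 (up to float rounding)
```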

Description

Related Issue

Fixes #121

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

@davidberenstein1957 davidberenstein1957 linked an issue May 16, 2025 that may be closed by this pull request
Comment thread docs/tutorials/index.rst Outdated
Member

@sharpenb sharpenb left a comment

Thanks for the tutorial! I have mostly the same comments as on the image generation PR #127 (e.g. tutorial card, section organization, ...). Beyond those shared comments, here are some others:

  • We could add some other efficiency metrics, including memory and latency (like in the AI courses).
  • We should explain why we use a small LLM, and note that the workflow also works with other models.
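
As a rough illustration of what measuring latency and memory involves (a generic Python sketch, not Pruna's EvaluationAgent; on GPU one would typically use `torch.cuda.max_memory_allocated` instead of `tracemalloc`):

```python
import time
import tracemalloc

def measure(fn, n_warmup=2, n_iters=5):
    """Mean latency and peak Python heap usage of a callable."""
    for _ in range(n_warmup):              # warmup so caches/JIT settle
        fn()
    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed / n_iters, peak         # (seconds per call, peak bytes)

latency, peak_bytes = measure(lambda: sum(range(100_000)))
print(f"latency={latency:.6f}s peak={peak_bytes}B")
```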

@davidberenstein1957 davidberenstein1957 force-pushed the docs/121-doc-update-end2end-llm-tutorial branch 2 times, most recently from bb70acf to d5a1d9f Compare June 5, 2025 11:51
Member

@sharpenb sharpenb left a comment

Thanks for the update :) I have mostly the same comments as on the image generation PR. Some further comments:

  • Could we add the minimal device config to the image generation tutorial as well?
  • Could we add memory and throughput metrics? These are important when quantizing LLMs.
  • It is not clear that inference_args needs to be updated.

@davidberenstein1957 davidberenstein1957 requested review from SaboniAmine, gtregoat and sharpenb and removed request for begumcig and nifleisch June 6, 2025 15:56
@davidberenstein1957
Member Author

@gtregoat @sharpenb @SaboniAmine could you review this PR? I think it would be nice to merge it before the release.

Member

@sharpenb sharpenb left a comment

Thanks for the update! A last change would be to indicate the units for throughput (samples/s) and latency (s/sample), and to use "Base Model", "Compressed Model", and "Relative Difference" in the row/column names to be complete. We could also add CO2 emissions, which is available in the evaluation agent. Could we also use tokens/second and seconds/token instead, and ideally time to first token? :)
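
Converting raw timing numbers into these units, plus a relative-difference column, is straightforward; a minimal sketch (the function names are hypothetical, not the tutorial's code):

```python
def llm_speed_metrics(n_tokens, elapsed_s, first_token_s):
    """Express one generation run in the units requested above."""
    return {
        "throughput (tokens/s)": n_tokens / elapsed_s,
        "latency (s/token)": elapsed_s / n_tokens,
        "time to first token (s)": first_token_s,
    }

def relative_difference(base, compressed):
    """Percentage change of the compressed model vs. the base model."""
    return {k: 100 * (compressed[k] - base[k]) / base[k] for k in base}

base = llm_speed_metrics(n_tokens=256, elapsed_s=8.0, first_token_s=0.50)
smashed = llm_speed_metrics(n_tokens=256, elapsed_s=2.0, first_token_s=0.20)
for name, pct in relative_difference(base, smashed).items():
    print(f"{name}: {pct:+.1f}%")
```

With these example numbers, throughput rises by 300% while per-token latency and time to first token drop by 75% and 60%.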

@davidberenstein1957
Member Author

davidberenstein1957 commented Jun 17, 2025

Thanks for the update! A last change would be to indicate the units for throughput (samples/s) and latency (s/sample), and to use "Base Model", "Compressed Model", and "Relative Difference" in the row/column names to be complete. We could also add CO2 emissions, which is available in the evaluation agent. Could we also use tokens/second and seconds/token instead, and ideally time to first token? :)

Hi @sharpenb, I updated the metrics in a way that doesn't require me to boot up a VM and rerun the notebook another time. Hope that is enough for now; for the ASR or video tutorial we can add some additional metrics.

Member

@sharpenb sharpenb left a comment

Thanks for the update! The tutorial is very nice. It would be super nice if we could add time-to-first-token and per-token time instead of total time, but this should be good for now.

@davidberenstein1957 davidberenstein1957 requested review from nifleisch and sdiazlor and removed request for SaboniAmine and gtregoat June 23, 2025 11:06
Collaborator

@nifleisch nifleisch left a comment

Cool tutorial with impressive results! I found a few minor errors that you might want to fix before merging:

  • "Additionally, we lower the n_iterations and n_warmup_iterations to ensure that we monitor the performance of the model whenever it is running smoothly." I think while this explains well why we would like to use warmup steps in general, it does not justify why we reduce n_warmup_iterations here.
  • In the text you mention that we limit the dataset like this: datamodule.limit_datasets(10), but in the code block below we do datamodule.limit_datasets(100)
  • Similarly, in the text it is mentioned that we use hqq_weight_bits=4 while we are using hqq_weight_bits=8 in the code

@sharpenb sharpenb self-requested a review June 24, 2025 12:04
Member

@sharpenb sharpenb left a comment

We need to do a last fix ;)

Comment thread docs/tutorials/llms.ipynb Outdated
@davidberenstein1957 davidberenstein1957 dismissed sharpenb’s stale review June 24, 2025 12:58

asked Sara to do the final approval as you are not logged in

- Renamed the tutorial from "Making your LLMs 4x smaller" to "Optimize and Evaluate Large Language Models" for clarity.
- Enhanced the tutorial with a comprehensive workflow for optimizing and evaluating large language models using the `hqq` quantization and `torch_compile` compilation.
- Updated the tutorial index to include the new section on optimizing large language models.
- Improved code snippets and markdown explanations throughout the tutorial for better user guidance and understanding.
…luation metrics

* Changed cell types and execution counts for clarity and consistency.
* Added new code cells for calculating and displaying percentage differences between original and optimized models.
* Updated markdown sections to reflect the results of the evaluation and the performance comparison of models.
…tments

* Added a Colab badge for easy access to the tutorial.
* Updated execution count to null for clarity in code cells.
* Removed unnecessary output warnings from the code cell for a cleaner presentation.
… cells

* Eliminated unnecessary markdown and code cells to enhance clarity and focus in the tutorial.
* Consolidated library imports for improved organization and readability.
* Updated the evaluation section to focus on essential metrics, specifically `elapsed_time` and `perplexity`, for a more streamlined explanation.
* Removed references to additional metrics to enhance clarity and focus on key performance indicators.
* Added a new section emphasizing that smaller models are suitable for demonstrating the optimization process.
* Included a line break for improved readability in the introductory explanation about loading the model and tokenizer.
* Reformatted the tutorial's component details into a markdown table for better readability.
* Adjusted execution counts in code cells for consistency.
* Updated output messages to provide clearer feedback during execution.
* Revised the evaluation section to streamline the explanation of metrics used.
…ed evaluation metrics

* Adjusted execution counts in code cells for consistency.
* Removed unnecessary output warnings to streamline the tutorial.
* Enhanced the evaluation section to clarify the metrics used, focusing on `elapsed_time` and `perplexity`.
* Updated the tutorial to reflect changes in the `SmashConfig` object and model evaluation process.
…djustments

* Revised evaluation metrics to include latency and energy consumption alongside perplexity for a comprehensive performance assessment.
* Updated code cells to reflect changes in execution counts and outputs, ensuring clarity and consistency.
* Improved markdown explanations to better guide users through the optimization and evaluation processes.
* Adjusted model loading and inference steps to utilize the `pipeline` function for streamlined usage.
…mprovements

* Revised execution counts in code cells for consistency and clarity.
* Updated model descriptions and markdown explanations to improve user understanding.
* Enhanced the evaluation section with a markdown table summarizing performance metrics, including throughput and energy consumption.
* Adjusted inference arguments and model handling to ensure correct evaluation processes.
* Improved overall structure and readability of the tutorial.
…ing execution count

* Set execution count to null for clarity in code cells.
* Removed extensive output warnings to streamline the tutorial presentation.
* Enhanced overall readability by focusing on essential content.
* Removed the mention of the `transformers` and `datasets` libraries from the tutorial to streamline content.
* Maintained focus on essential details regarding model optimization and evaluation processes.
…ements

* Updated section headings to be numbered for better organization and flow.
* Removed redundant markdown and code snippets to streamline content.
* Enhanced clarity in explanations regarding model loading, configuration, and evaluation processes.
* Adjusted markdown formatting for improved readability and user guidance.
* Revised markdown table headers to reflect "Base Model" and "Compressed Model" for improved clarity.
* Enhanced metric descriptions by including units for energy consumption, throughput, and total time.
* Adjusted code to ensure consistent formatting and readability in the evaluation section.
* Adjusted code snippets for consistent use of double quotes in metric retrieval.
* Enhanced readability by adding spacing in import statements and removing unnecessary break statements.
* Maintained focus on clarity and structure within the evaluation section.
* Changed `hqq_weight_bits` in `SmashConfig` from 8 to 4 for improved model optimization.
* Increased dataset limit in `EvaluationAgent` from 10 to 100 to enhance evaluation comprehensiveness.
* Ensured clarity in the evaluation process description while maintaining focus on performance monitoring.
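
Pieced together from these commit messages, the final setup would look roughly like the sketch below. The `hqq_weight_bits` value, `torch_compile`, and the `limit_datasets(100)` call are taken from this thread; the other names (`smash`, the `quantizer`/`compiler` keys, the model/datamodule variables) are assumptions about the pruna API and may differ from the current release.

```python
# Sketch only — assumes the pruna package and the identifiers named in this PR.
from pruna import SmashConfig, smash

smash_config = SmashConfig()
smash_config["quantizer"] = "hqq"
smash_config["hqq_weight_bits"] = 4   # changed from 8 to 4 in the final commit
smash_config["compiler"] = "torch_compile"

# `model` is a previously loaded Hugging Face model (hypothetical variable).
smashed_model = smash(model=model, smash_config=smash_config)

# Evaluation: limit the dataset to 100 samples (raised from 10 per review).
datamodule.limit_datasets(100)
```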
@davidberenstein1957 davidberenstein1957 force-pushed the docs/121-doc-update-end2end-llm-tutorial branch from f681344 to f8e0d35 Compare June 24, 2025 13:06
@davidberenstein1957 davidberenstein1957 merged commit 458e009 into main Jun 24, 2025
7 checks passed


Development

Successfully merging this pull request may close these issues.

[DOC] update end2end llm tutorial

4 participants