Merged
12 changes: 4 additions & 8 deletions README.md
@@ -46,12 +46,12 @@ Pruna is a model optimization framework built for developers, enabling you to de

The toolkit is designed with simplicity in mind - requiring just a few lines of code to optimize your models. It supports various model types including LLMs, Diffusion and Flow Matching Models, Vision Transformers, Speech Recognition Models and more.


<!--
<img align="left" width="40" src="docs/assets/images/highlight.png" alt="Pruna Pro"/>

**To move at top speed**, we offer [Pruna Pro](https://docs.pruna.ai/en/stable/docs_pruna_pro/user_manual/pruna_pro.html), our enterprise solution that unlocks advanced optimization features, our `OptimizationAgent`, priority support, and much more.
<br clear="left"/>

-->

## <img src="./docs/assets/images/pruna_cool.png" alt="Pruna Cool" width=20></img> Installation

@@ -126,7 +126,7 @@ eval_agent.evaluate(smashed_model)
```

This was the minimal example, but you are looking for the maximal example? You can check out our [documentation][documentation] for an overview of all supported [algorithms][docs-algorithms] as well as our tutorials for more use-cases and examples.

<!--
## <img src="./docs/assets/images/pruna_heart.png" alt="Pruna Heart" width=20></img> Pruna Pro

Pruna has everything you need to get started on optimizing your own models. To push the efficiency of your models even further, we offer Pruna Pro. To give you a glimpse of what is possible with Pruna Pro, let us consider three of the most widely used diffusers pipelines and see how much smaller and faster we can make them. In addition to popular open-source algorithms, we use our proprietary Auto Caching algorithm. We compare the fidelity of the compressed models. Fidelity measures the similarity between the images of the compressed models and the images of the original model.
@@ -147,7 +147,7 @@ For [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo), we compare Auto

<img src="./docs/assets/plots/benchmark_hunyuan.svg" alt="HunyuanVideo Benchmark"/>


-->

## <img src="./docs/assets/images/pruna_cool.png" alt="Pruna Cool" width=20></img> Algorithm Overview

@@ -158,13 +158,9 @@ Since Pruna offers a broad range of optimization algorithms, the following table
| `batcher` | Groups multiple inputs together to be processed simultaneously, improving computational efficiency and reducing processing time. | ✅ | ❌ | ➖ |
| `cacher` | Stores intermediate results of computations to speed up subsequent operations. | ✅ | ➖ | ➖ |
| `compiler` | Optimises the model with instructions for specific hardware. | ✅ | ➖ | ➖ |
| `distiller` | Trains a smaller, simpler model to mimic a larger, more complex model. | ✅ | ✅ | ❌ |
| `quantizer` | Reduces the precision of weights and activations, lowering memory requirements. | ✅ | ✅ | ❌ |
| `pruner` | Removes less important or redundant connections and neurons, resulting in a sparser, more efficient network. | ✅ | ✅ | ❌ |
| `recoverer` | Restores the performance of a model after compression. | ➖ | ➖ | ✅ |
| `factorizer` | Factorization batches several small matrix multiplications into one large fused operation. | ✅ | ➖ | ➖ |
| `enhancer` | Enhances the model output by applying post-processing algorithms such as denoising or upscaling. | ❌ | ➖ | ✅ |
| `distributer` | Distributes the inference, the model or certain calculations across multiple devices. | ✅ | ❌ | ➖ |
| `kernel` | Kernels are specialized GPU routines that speed up parts of the computation. | ✅ | ➖ | ➖ |

✅ (improves), ➖ (approx. the same), ❌ (worsens)