This repo is a guide on how to structure genAI experiments, with a particular focus on the thought process and decisions involved in selecting a chunking strategy.
This repo is for software engineers who are starting out building generative AI applications, in particular Retrieval Augmented Generation (RAG) systems. If you've already built the "hello, world!" RAG apps and are wondering how to improve system performance to win over your users, this guide is for you.
As with most data problems, there's no single optimal configuration that works for all datasets. This guide outlines how to experiment with your chunking strategies, which levers you can pull, and how to measure performance.
We restrict the scope to chunking methodologies, but may expand to other aspects of RAG if there is enough demand.
- Azure Open AI resource
- Deployments of:
  - an embedding model
  - an LLM (we suggest gpt-4 for Q&A generation, and gpt-35-turbo-16k elsewhere)
- Python 3.10 onwards (tested on 3.11)
- Port 8080 available - MLFlow runs against port 8080 by default. If this is an issue you can follow these steps:
  - Change the forwarded port in the devcontainer.json and (re)build
  - Update the `!mlflow server --host 127.0.0.1 --port 8080` command in each notebook to reflect your port of choice
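For reference, the forwarded port is controlled by the `forwardPorts` field of devcontainer.json; a fragment switching to port 5001 might look like the following (other fields in your file are unaffected):

```json
{
  "forwardPorts": [5001]
}
```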
- You can either use the requirements.txt with your environment management tool of choice (conda, mamba, venv, etc.)
- Or use the devcontainer :)
- Create a .env file using the sample file
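The .env file typically holds your Azure OpenAI connection details. The variable names below are illustrative only - mirror whatever names the provided sample file defines:

```shell
# Illustrative only - use the variable names from the sample file
AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
AZURE_OPENAI_API_KEY="<your-key>"
```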
- Run and follow along with `00-Chunking Strategies.ipynb`
- Run and follow along with `01-Baseline Strategy.ipynb`
- Run and follow along with `02-Recursive Chunking.ipynb`
- Run and follow along with `03-Semantic Chunking.ipynb`
- View the results in MLFlow (launched from within the notebooks)
NOTE: Given the nature of RAG, running these notebooks from scratch can take some time and can be resource intensive. Some example outputs have been provided throughout in the chunking_Strategies/data folder to allow for quick exploration. Feel free to update the parameters at the top of the experiment notebooks and/or use different models in your .env file to run your own experiment and compare the results.
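For a flavour of what the notebooks compare, fixed-size and recursive chunking can be sketched as below. This is a simplified stand-in, not the notebooks' implementation; function names and defaults are ours:

```python
def chunk_fixed(text, chunk_size=200, overlap=50):
    """Naive fixed-size chunking with character overlap between chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks


def chunk_recursive(text, chunk_size=200, separators=("\n\n", "\n", " ")):
    """Split on the coarsest separator first, recursing into finer
    separators only for pieces that still exceed chunk_size."""
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate          # piece still fits; keep accumulating
        elif len(piece) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(chunk_recursive(piece, chunk_size, rest))
        else:
            if current:
                chunks.append(current)   # flush the full chunk
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Fixed-size chunking ignores document structure entirely, while the recursive variant tries to keep paragraphs and sentences intact - which is exactly the trade-off the experiments measure.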
- An approach to experimentation that can be used for any data / ML problem (see experiments)
- An overview of 2 popular chunking strategies and a comparison (which is only valid for the data used in this experiment!)
- A comprehensive overview of the decisions and influencing factors involved in choosing a chunking strategy
- Pre-generated evaluation data and experiment results
- Python code that is not suitable for production!