Changes from all commits (22 commits)
- `525a505` 🚰 feat: Stream Document Embeddings to Database in Batches (#214) (MarcAmick, Dec 10, 2025)
- `95db2e1` 📦 chore: Resolve Package Advisories (#220) (danny-avila, Dec 10, 2025)
- `1d3a505` 🔧 refactor: Document Processing and Health Check Functions (#221) (danny-avila, Dec 10, 2025)
- `1dc2d9a` ✨ chore: Enhance GitHub Actions Workflow with Disk Space Management (danny-avila, Dec 11, 2025)
- `fa0abea` 📋 docs: MongoDB file_id index recommendation (#219) (kazuya-awano, Dec 16, 2025)
- `1d6ef08` 📦 chore: Bump `langchain-core` to v0.3.81 (#225) (loganaden, Dec 29, 2025)
- `d89da49` 🪤 fix: Patch Path Traversal Vulnerabilities in File Embedding Endpoin… (Marshall-Hallenbeck, Feb 27, 2026)
- `d776d22` 📦 chore: update dependabot packages (#255) (danny-avila, Feb 27, 2026)
- `d8f640d` 📜 feat: Add setup script for venv creation (danny-avila, Feb 27, 2026)
- `2fa9600` 🩺 fix: Concurrent Upload Isolation, MongoDB Connection Leaks, and Val… (danny-avila, Mar 1, 2026)
- `719ac3d` 🦥 feat: Add lazy_load() support to document loaders (#257) (danny-avila, Mar 1, 2026)
- `57dbd2c` 🚰 fix: Plug MongoDB client, SQLAlchemy engine, and CSV temp file leak… (danny-avila, Mar 1, 2026)
- `8c1ed83` 🐘 fix: Migrate PGVector cmetadata Column to JSONB with GIN Index (#259) (danny-avila, Mar 1, 2026)
- `11c35e4` ⛓️‍💥 chore: Upgrade LangChain to 1.x and Consolidate Google AI Provid… (danny-avila, Mar 1, 2026)
- `b3a785f` 📦 chore: Bump `pypdf` to v6.7.4 (#261) (danny-avila, Mar 1, 2026)
- `3daa23d` fix: add missing averaged_perceptron_tagger_eng NLTK package (#265) (MieszkoMakuch, Mar 16, 2026)
- `2f6e6e9` 📦 chore: Update `pypdf` to v6.9.1 and `PyJWT` to v2.12.1 (#267) (danny-avila, Mar 20, 2026)
- `e30e4e3` 📜 fix: add missing `msoffcrypto-tool` package for xlsx file support (… (ABHIJITH-EA, Mar 20, 2026)
- `9938ee6` 🍃 fix: Resolve MongoDB Atlas Document Upload Failures and Cross-Batch… (mfish911, Mar 20, 2026)
- `85ab155` Merge main into upstream/rag_api_merge_v0.7.3 (paychex-joser, Apr 15, 2026)
- `ba60339` Update PaychexDockerfile with upstream changes (paychex-joser, Apr 15, 2026)
- `9da7074` Adding docker-compose.override to .gitignore (paychex-joser, Apr 16, 2026)
14 changes: 14 additions & 0 deletions .github/workflows/images.yaml
@@ -25,6 +25,18 @@ jobs:
          image_name: librechat-rag-api-dev-lite

    steps:
+      # Free up disk space
+      - name: Free Disk Space
+        uses: jlumbroso/free-disk-space@main
+        with:
+          tool-cache: true
+          android: true
+          dotnet: true
+          haskell: true
+          large-packages: true
+          docker-images: true
+          swap-storage: true
+
      # Check out the repository
      - name: Checkout
        uses: actions/checkout@v4
@@ -57,3 +69,5 @@ jobs:
            ghcr.io/${{ github.repository_owner }}/${{ matrix.image_name }}:latest
          platforms: linux/amd64,linux/arm64
          target: ${{ matrix.target }}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
4 changes: 4 additions & 0 deletions .gitignore
@@ -9,3 +9,7 @@ venv/
*.pyc
dev.yml
SHOPIFY.md
+
+# docker override file
+docker-compose.override.yaml
+docker-compose.override.yml
2 changes: 1 addition & 1 deletion Dockerfile
@@ -15,7 +15,7 @@ COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Download standard NLTK data, to prevent unstructured from downloading packages at runtime
-RUN python -m nltk.downloader -d /app/nltk_data punkt_tab averaged_perceptron_tagger
+RUN python -m nltk.downloader -d /app/nltk_data punkt_tab averaged_perceptron_tagger averaged_perceptron_tagger_eng
ENV NLTK_DATA=/app/nltk_data

# Disable Unstructured analytics
2 changes: 1 addition & 1 deletion Dockerfile.lite
@@ -15,7 +15,7 @@ COPY requirements.lite.txt .
RUN pip install --no-cache-dir -r requirements.lite.txt

# Download standard NLTK data, to prevent unstructured from downloading packages at runtime
-RUN python -m nltk.downloader -d /app/nltk_data punkt_tab averaged_perceptron_tagger
+RUN python -m nltk.downloader -d /app/nltk_data punkt_tab averaged_perceptron_tagger averaged_perceptron_tagger_eng
ENV NLTK_DATA=/app/nltk_data

# Disable Unstructured analytics
4 changes: 1 addition & 3 deletions PaychexDockerfile
@@ -2,8 +2,6 @@ FROM python:3.12-slim AS main

WORKDIR /app

-WORKDIR /app
-
# Install pandoc and netcat
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
@@ -17,7 +15,7 @@ COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Download standard NLTK data, to prevent unstructured from downloading packages at runtime
-RUN python -m nltk.downloader -d /app/nltk_data punkt_tab averaged_perceptron_tagger
+RUN python -m nltk.downloader -d /app/nltk_data punkt_tab averaged_perceptron_tagger averaged_perceptron_tagger_eng
ENV NLTK_DATA=/app/nltk_data

# Disable Unstructured analytics
155 changes: 153 additions & 2 deletions README.md
@@ -34,6 +34,33 @@ pip install -r requirements.txt
uvicorn main:app
```

### Clean Install (Local Development)

To do a clean reinstall of all dependencies (e.g., after updating `requirements.txt`):

```bash
# Remove existing virtual environment and recreate it
rm -rf venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

For the lite version (without sentence_transformers/huggingface):

```bash
rm -rf venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.lite.txt
```

For Docker, rebuild without cache:

```bash
docker compose build --no-cache
```

### Environment Variables

The following environment variables are required to run the application:
@@ -59,6 +86,8 @@ The following environment variables are required to run the application:
- `COLLECTION_NAME`: (Optional) The name of the collection in the vector store. Default value is "testcollection".
- `CHUNK_SIZE`: (Optional) The size of the chunks for text processing. Default value is "1500".
- `CHUNK_OVERLAP`: (Optional) The overlap between chunks during text processing. Default value is "100".
- `EMBEDDING_BATCH_SIZE`: (Optional) Number of document chunks to process per batch. Set to `0` (default) to disable batching. Recommended value is `750` for `text-embedding-3-small`.
- `EMBEDDING_MAX_QUEUE_SIZE`: (Optional) Maximum number of batches to buffer in memory during async processing. Default value is "3".
- `RAG_UPLOAD_DIR`: (Optional) The directory where uploaded files are stored. Default value is "./uploads/".
- `PDF_EXTRACT_IMAGES`: (Optional) A boolean value indicating whether to extract images from PDF files. Default value is "False".
- `DEBUG_RAG_API`: (Optional) Set to "True" to show more verbose logging output in the server console and to enable PostgreSQL database routes.
@@ -71,7 +100,7 @@ The following environment variables are required to run the application:
- azure: "text-embedding-3-small" (will be used as your Azure Deployment)
- huggingface: "sentence-transformers/all-MiniLM-L6-v2"
- huggingfacetei: "http://huggingfacetei:3000". Hugging Face TEI uses model defined on TEI service launch.
-   - vertexai: "text-embedding-004"
+   - vertexai: "gemini-embedding-001"
- ollama: "nomic-embed-text"
- bedrock: "amazon.titan-embed-text-v1"
- google_genai: "gemini-embedding-001"
@@ -90,11 +119,48 @@ The following environment variables are required to run the application:
- `AWS_SECRET_ACCESS_KEY`: (Optional) needed for bedrock embeddings
- `GOOGLE_API_KEY`, `GOOGLE_KEY`, `RAG_GOOGLE_API_KEY`: (Optional) Google API key for Google GenAI embeddings. Priority order: RAG_GOOGLE_API_KEY > GOOGLE_KEY > GOOGLE_API_KEY
- `AWS_SESSION_TOKEN`: (Optional) may be needed for bedrock embeddings
- - `GOOGLE_APPLICATION_CREDENTIALS`: (Optional) needed for Google VertexAI embeddings. This should be a path to a service account credential file in JSON format, as accepted by [langchain](https://python.langchain.com/api_reference/google_vertexai/index.html)
+ - `GOOGLE_APPLICATION_CREDENTIALS`: (Optional) needed for Google VertexAI embeddings. This should be a path to a service account credential file in JSON format.
- `GOOGLE_CLOUD_PROJECT`: (Optional) Google Cloud project ID, needed for VertexAI embeddings.
- `GOOGLE_CLOUD_LOCATION`: (Optional) Google Cloud region for VertexAI embeddings. Defaults to `us-central1`.
- `RAG_CHECK_EMBEDDING_CTX_LENGTH`: (Optional) Defaults to "true". Disabling this sends raw input to the embedder; use this for custom embedding models.

Make sure to set these environment variables before running the application. You can set them in a `.env` file or as system environment variables.
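
As an illustration, a minimal `.env` fragment covering the chunking, batching, and storage variables above might look like this (the values shown are the documented defaults and the recommendation for `text-embedding-3-small`, not requirements):

```shell
# Chunking (documented defaults)
CHUNK_SIZE=1500
CHUNK_OVERLAP=100

# Batched embedding, recommended for text-embedding-3-small
EMBEDDING_BATCH_SIZE=750
EMBEDDING_MAX_QUEUE_SIZE=3

# Storage (documented defaults)
COLLECTION_NAME=testcollection
RAG_UPLOAD_DIR=./uploads/
```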

### Embedding Batch Processing

For large files, you can enable batched embedding processing to reduce memory consumption. This is particularly useful in memory-constrained environments like Kubernetes pods with memory limits.

#### Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `EMBEDDING_BATCH_SIZE` | `0` | Number of document chunks to process per batch. `0` disables batching (original behavior). |
| `EMBEDDING_MAX_QUEUE_SIZE` | `3` | Maximum number of batches to buffer in memory during async processing. |

#### Recommended Settings

For `text-embedding-3-small` model:
- `EMBEDDING_BATCH_SIZE=750` - Good balance of throughput and memory

For memory-constrained environments (< 2GB RAM):
- `EMBEDDING_BATCH_SIZE=100-250`

For high-throughput environments:
- `EMBEDDING_BATCH_SIZE=1000-2000`
- `EMBEDDING_MAX_QUEUE_SIZE=5`

#### Behavior

When `EMBEDDING_BATCH_SIZE > 0`:
- Documents are processed in batches of the specified size
- Each batch is embedded and inserted before the next batch starts
- On failure, successfully inserted documents are rolled back
- Memory usage is bounded by `EMBEDDING_BATCH_SIZE * EMBEDDING_MAX_QUEUE_SIZE`

When `EMBEDDING_BATCH_SIZE = 0` (default):
- All documents are processed at once (original behavior)
- Better for small files or memory-rich environments
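
The bounded-memory behavior described above can be sketched with a producer/consumer queue. This is an illustrative sketch only; `embed_in_batches`, `batch_iter`, and `fake_embed` are hypothetical names, not this project's API:

```python
import asyncio
from typing import Iterable, Iterator, List, Tuple

def fake_embed(doc: str) -> Tuple[str, List[float]]:
    # Stand-in for a real embedding call; returns (doc, vector).
    return (doc, [0.0])

def batch_iter(docs: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most batch_size documents."""
    batch: List[str] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

async def embed_in_batches(docs, batch_size=500, max_queue_size=3):
    """The queue holds at most max_queue_size batches, so peak memory is
    roughly bounded by batch_size * max_queue_size chunks."""
    if batch_size <= 0:
        # Batching disabled: embed everything at once (original behavior).
        return [fake_embed(d) for d in docs]

    queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue_size)
    results = []

    async def producer():
        for batch in batch_iter(docs, batch_size):
            await queue.put(batch)  # blocks while the queue is full
        await queue.put(None)       # sentinel: no more batches

    async def consumer():
        while (batch := await queue.get()) is not None:
            # Embed and insert this batch before pulling the next one.
            results.extend(fake_embed(d) for d in batch)

    await asyncio.gather(producer(), consumer())
    return results
```

For example, `asyncio.run(embed_in_batches([f"chunk{i}" for i in range(10)], batch_size=4))` processes the ten chunks in batches of 4, 4, and 2, with the producer blocking whenever the queue is full.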

### Use Atlas MongoDB as Vector Database

Instead of using the default pgvector, we could use [Atlas MongoDB](https://www.mongodb.com/products/platform/atlas-vector-search) as the vector database. To do so, set the following environment variables
@@ -127,6 +193,16 @@ The `ATLAS_MONGO_DB_URI` could be the same or different from what is used by Lib

Follow one of the [four documented methods](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/#procedure) to create the vector index.

#### Create a `file_id` Index (recommended)

We recommend creating a standard MongoDB index on `file_id` to keep lookups fast. After creating the collection, run the following once (via Atlas UI, Compass, or `mongosh`):

```javascript
db.getCollection("<COLLECTION_NAME>").createIndex({ file_id: 1 })
```

Replace `<COLLECTION_NAME>` with the same collection used by the RAG API. This ensures lookups remain fast even as the number of embedded documents grows.


### Proxy Configuration

@@ -169,6 +245,81 @@ Notes:

### Dev notes:

#### Running Tests

##### Prerequisites

Install test dependencies:

```bash
pip install -r test_requirements.txt
```

##### Running All Tests

```bash
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run with coverage (if pytest-cov is installed)
pytest --cov=app
```

##### Running Specific Test Files

```bash
# Run batch processing unit tests
pytest tests/test_batch_processing.py -v

# Run batch processing integration tests (memory optimization tests)
pytest tests/test_batch_processing_integration.py -v

# Run main API tests
pytest tests/test_main.py -v
```

##### Running Tests by Category

```bash
# Run only integration tests (marked with @pytest.mark.integration)
pytest -m integration -v

# Skip integration tests
pytest -m "not integration" -v

# Run only async tests
pytest -k "async" -v
```

##### Test Categories

| Test File | Description |
|-----------|-------------|
| `test_batch_processing.py` | Unit tests for batch processing functions |
| `test_batch_processing_integration.py` | Memory optimization and integration tests |
| `test_main.py` | API endpoint tests |
| `test_config.py` | Configuration tests |
| `test_middleware.py` | Middleware tests |
| `test_models.py` | Model tests |

##### Memory Optimization Tests

The `test_batch_processing_integration.py` file includes tests that verify the memory optimization behavior:

- **`test_memory_bounded_by_batch_size`**: Verifies that the number of documents in memory at any time is bounded by `EMBEDDING_BATCH_SIZE`
- **`test_memory_tracking_with_tracemalloc`**: Uses Python's `tracemalloc` to monitor memory usage during batch processing
- **`test_sync_memory_bounded_by_batch_size`**: Same verification for the synchronous code path

Run memory tests specifically:

```bash
pytest tests/test_batch_processing_integration.py::TestMemoryOptimization -v
pytest tests/test_batch_processing_integration.py::TestSyncBatchedMemory -v
```

#### Installing pre-commit formatter

Run the following commands to install the pre-commit formatter, which uses the [black](https://github.com/psf/black) code formatter:
31 changes: 27 additions & 4 deletions app/config.py
@@ -71,6 +71,23 @@ def get_env_variable(
CHUNK_SIZE = int(get_env_variable("CHUNK_SIZE", "1500"))
CHUNK_OVERLAP = int(get_env_variable("CHUNK_OVERLAP", "100"))

# Batch processing configuration for memory-constrained environments.
# When EMBEDDING_BATCH_SIZE > 0, documents are processed in batches to reduce
# peak memory usage. This is useful for Kubernetes pods with memory limits.
#
# Trade-offs:
# - Smaller batch size = lower memory, more DB round trips
# - Larger batch size = higher memory, fewer DB round trips
# - 0 = disable batching, process all at once
#
# Default of 500 is conservative and works well for most embedding providers.
# Increase to 750 for higher throughput at the cost of higher peak memory.
EMBEDDING_BATCH_SIZE = int(get_env_variable("EMBEDDING_BATCH_SIZE", "500"))

# Maximum number of batches to buffer in memory during async processing.
# Higher values allow more parallelism but use more memory.
EMBEDDING_MAX_QUEUE_SIZE = int(get_env_variable("EMBEDDING_MAX_QUEUE_SIZE", "3"))

env_value = get_env_variable("PDF_EXTRACT_IMAGES", "False").lower()
PDF_EXTRACT_IMAGES = True if env_value == "true" else False

@@ -241,12 +258,18 @@ def init_embeddings(provider, model):

        return GoogleGenerativeAIEmbeddings(
            model=model,
-           google_api_key=RAG_GOOGLE_API_KEY,
+           google_api_key=RAG_GOOGLE_API_KEY or None,
        )
    elif provider == EmbeddingsProvider.GOOGLE_VERTEXAI:
-       from langchain_google_vertexai import VertexAIEmbeddings
+       from langchain_google_genai import GoogleGenerativeAIEmbeddings

-       return VertexAIEmbeddings(model=model)
+       return GoogleGenerativeAIEmbeddings(
+           model=model,
+           google_api_key=RAG_GOOGLE_API_KEY or None,
+           vertexai=True,
+           project=get_env_variable("GOOGLE_CLOUD_PROJECT", None),
+           location=get_env_variable("GOOGLE_CLOUD_LOCATION", "us-central1"),
+       )
    elif provider == EmbeddingsProvider.BEDROCK:
        from langchain_aws import BedrockEmbeddings
elif provider == EmbeddingsProvider.BEDROCK:
from langchain_aws import BedrockEmbeddings

@@ -290,7 +313,7 @@ def init_embeddings(provider, model):
        "EMBEDDINGS_MODEL", "http://huggingfacetei:3000"
    )
elif EMBEDDINGS_PROVIDER == EmbeddingsProvider.GOOGLE_VERTEXAI:
-   EMBEDDINGS_MODEL = get_env_variable("EMBEDDINGS_MODEL", "text-embedding-004")
+   EMBEDDINGS_MODEL = get_env_variable("EMBEDDINGS_MODEL", "gemini-embedding-001")
elif EMBEDDINGS_PROVIDER == EmbeddingsProvider.OLLAMA:
    EMBEDDINGS_MODEL = get_env_variable("EMBEDDINGS_MODEL", "nomic-embed-text")
elif EMBEDDINGS_PROVIDER == EmbeddingsProvider.GOOGLE_GENAI: