Skip to content

mkarots/raglet

Repository files navigation

raglet logo

raglet

CI Smoke Tests Benchmarks PyPI version PyPI downloads Python 3.9+ License: MIT Code style: black Linting: Ruff uv Make

Portable memory for small text corpora. No servers, no API keys, no infrastructure.

There's a class of knowledge that's too big for a prompt but too small to justify a vector database: a codebase, a Slack export, a folder of meeting notes. raglet turns that text into a searchable directory you can save, git commit, or carry to another machine.

pip install raglet

How it works

from raglet import RAGlet

# Build a searchable index from your files
rag = RAGlet.from_files(["docs/", "notes.md"])

# Search semantically
results = rag.search("what did we decide about the API design?", top_k=5)

for chunk in results:
    print(f"[{chunk.score:.2f}] {chunk.source}")
    print(chunk.text)
    print()

# Save to a portable directory
rag.save(".raglet/")

Example output:

[0.87] docs/decisions/api-design.md
We decided to keep the API surface minimal — just search(), add_text(), and save().
The goal is that a new user can be productive in under 5 minutes.

[0.81] notes/2024-03-meeting.md
API design discussion: favour explicit save() calls over auto-persistence.
Incremental updates should be opt-in, not default behaviour.

[0.74] docs/decisions/api-design.md
The search() method returns ranked chunks with scores. The caller decides
what to do with them — raglet does not call any LLM.

Load it back anywhere:

rag = RAGlet.load(".raglet/")
results = rag.search("your query")

When to use raglet

raglet is designed for workspace-scale corpora. The embedding pipeline processes ~95K LLM tokens/sec on Apple Silicon (MPS). Build is a one-time cost — after that, search stays under 11 ms regardless of dataset size.

Corpus size Chunks ~Tokens Build time (MPS) Search p50 raglet?
< 8 KB < 20 < 5K Use a prompt directly
8 KB – 2 MB 20 – 2,800 5K – 700K < 7s 3–6 ms ✅ Sweet spot — builds in seconds
2 – 20 MB 2,800 – 28,000 700K – 7M 7s – 70s 6–7 ms ✅ Works great
20 – 100 MB 28,000 – 139,000 7M – 36M 70s – 6 min 7–11 ms ⚠️ Works — build is a one-time cost
> 100 MB > 139,000 > 36M > 6 min ❌ Use a vector database instead

If your corpus is larger than ~100 MB, raglet is the wrong tool. Use a persistent vector database (Chroma, Weaviate, Pinecone) instead.


The .raglet/ directory

When you save a raglet, you get a plain, inspectable directory:

.raglet/
├── config.json      # chunking, embedding model, search settings
├── chunks.json      # all text chunks with source and metadata
├── embeddings.npy   # NumPy float32 embeddings matrix
└── metadata.json    # version, timestamps, chunk count, dimensions

Everything is human-readable JSON (except the embeddings binary). That means you can:

# Inspect your chunks
cat .raglet/chunks.json

# Check what model and config was used
cat .raglet/config.json

# Git commit the whole thing
git add .raglet/ && git commit -m "update knowledge base"

# Export for sharing
raglet package --raglet .raglet/ --format zip --out knowledge.zip

No proprietary format. No lock-in. Your data is always accessible.


Installation

pip install raglet

Or with Docker — no install needed:

docker pull mkarots/raglet
docker run -v /path/to/project:/workspace mkarots/raglet build docs/ --out .raglet/

Note: Alpine Linux is not supported. Use python:3.11-slim or similar images.


CLI

# Build a knowledge base
raglet build docs/ --out .raglet/
raglet build docs/ src/ "*.md" --out .raglet/ --chunk-size 1024

# Search it
raglet query "how does authentication work?" --raglet .raglet/
raglet query "what is X?" --raglet memory.sqlite --top-k 10

# Add files, directories, or glob patterns incrementally
raglet add new_file.txt --raglet .raglet/
raglet add new-docs/ --raglet .raglet/
raglet add "*.md" --raglet .raglet/ --ignore __pycache__

# Convert between formats
raglet package --raglet .raglet/ --format zip --out export.zip
raglet package --raglet .raglet/ --format sqlite --out memory.sqlite

Storage formats

raglet supports three formats. All load with RAGlet.load() — format is auto-detected from the path.

Format Use when Incremental updates
.raglet/ directory Default — development, git-tracked knowledge bases
.sqlite Agent memory loops — frequent appends, single-file deployment ✅ True appends
.zip Export and sharing only ❌ Read-only
rag.save(".raglet/")          # directory (default)
rag.save("memory.sqlite")     # SQLite — true incremental appends
rag.save("export.zip")        # zip archive

rag = RAGlet.load(".raglet/")
rag = RAGlet.load("memory.sqlite")
rag = RAGlet.load("export.zip")

When to use SQLite: if you're running an agent loop that appends conversation turns or observations continuously, SQLite is the better choice — it does true SQL INSERT operations rather than rewriting files on each save.


Common patterns

Load or create

from pathlib import Path
from raglet import RAGlet

rag = RAGlet.load(".raglet/") if Path(".raglet/").exists() else RAGlet.from_files(["docs/"])

Use with any LLM

results = rag.search("user query", top_k=5)
context = "\n\n".join(chunk.text for chunk in results)

# Pass context to your LLM of choice
response = your_llm.generate(f"Context:\n{context}\n\nQuestion: {query}")

raglet handles retrieval. You handle generation.

Agent loop with persistent memory

from pathlib import Path
from raglet import RAGlet

# SQLite is the right format for agent memory — true incremental appends
path = "memory.sqlite"
rag = RAGlet.load(path) if Path(path).exists() else RAGlet.from_files([])

while True:
    query = input("You: ")
    if query == "exit":
        rag.save(path)
        break

    results = rag.search(query, top_k=5)
    response = your_llm(results, query)

    rag.add_text(query, source="user")
    rag.add_text(response, source="assistant")
    rag.save(path, incremental=True)

Incremental updates (cheap appends)

The initial from_files() is the expensive step — it embeds all the text. After that, appending new content only embeds the new chunks. A 100 KB file appends in ~0.3s regardless of how large the existing raglet is.

# Add files, directories, or glob patterns
rag.add_file("new_doc.txt")
rag.add_files(["file1.txt", "file2.md"])
rag.add_files(["new-docs/"])

# Add raw text
rag.add_text("Some text", source="manual")

# Save incrementally (only writes new data)
rag.save(".raglet/", incremental=True)

See Usage Patterns for the full build-once-append-search workflow.


Configuration

from raglet import RAGlet, RAGletConfig

config = RAGletConfig()
config.chunking.size = 1024
config.chunking.overlap = 100
config.embedding.model = "all-mpnet-base-v2"

rag = RAGlet.from_files(["docs/"], config=config)

Available embedding models: all-MiniLM-L6-v2 (default, fast), all-mpnet-base-v2 (higher quality), BAAI/bge-small-en-v1.5.

Search with a similarity threshold:

results = rag.search("query", top_k=10, similarity_threshold=0.7)

Known limitations

File formats: v0.1.0 supports .txt and .md files only. PDF, DOCX, and HTML are on the roadmap. For unsupported formats, extract text first and use add_text().

Corpus size: raglet is workspace-scale, not internet-scale. Search stays under 11 ms up to 100 MB (measured: 10.4 ms p50 at 139K chunks), but build time scales linearly (100 MB takes ~6 minutes on MPS). Above ~100 MB, use a proper vector database.

No file change detection: raglet does not watch for file changes. If a file is modified, rebuild from scratch with from_files(). Incremental updates (add_file, add_files) are for adding new files only.

CPU-only machines: embedding is ~10–20× slower without a GPU or MPS. Search latency (<10 ms) is hardware-independent and unaffected.


Features

  • ✅ Text extraction from .txt and .md files
  • ✅ Sentence-aware chunking
  • ✅ Local embeddings via sentence-transformers (no API keys)
  • ✅ Vector search via FAISS
  • ✅ Three portable formats: directory, SQLite, zip
  • ✅ Incremental updates
  • ✅ CLI — build, query, add (files, directories, globs), package
  • ✅ Docker image

Principles

Portable — One directory (or file). Git commit it, email it, load it on another machine.

Small by design — Workspace-scale: codebases, conversations, notes. Not the internet.

Retrieval only — raglet finds chunks. You decide what to do with them. Bring your own LLM.

Open format — JSON files you can read, edit, and extract. No proprietary format, no lock-in.

Zero infrastructurepip install raglet or docker run. That's it.


Roadmap

v0.1.0 (current) — Semantic search, save/load, incremental updates, CLI

v0.2.0 — PDF, DOCX, HTML extraction

v0.3.0 — File change detection (rebuild only modified files)

Planned (unscheduled)

  • Semantic chunking — split on topic boundaries using embeddings, not just sentence boundaries
  • Metadata filtering — rag.search("query", source="docs/") to narrow results by directory or file
  • .ragletignore — project-level ignore file alongside the --ignore CLI flag
  • JSON output for raglet query — pipe results to other tools
  • ONNX runtime — lightweight inference without PyTorch for smaller installs and faster cold starts
  • Workspace limits enforcement — soft/hard chunk count limits with actionable error messages (ADR 010)

Not planned (out of scope by design)

  • LLM integration — raglet is retrieval only; bring your own LLM
  • Cloud/API backends — everything runs locally
  • Real-time file watching — use add_file() or rebuild explicitly
  • Datasets larger than ~100 MB — use a vector database instead

Development

# Install with uv
curl -LsSf https://astral.sh/uv/install.sh | sh
make install-dev

# Run tests
make test           # all tests
make test-unit      # unit only
make test-e2e       # end-to-end only

# Code quality
make lint
make format
make type-check
make ci             # full pipeline

Architecture

raglet/
├── core/           # domain models and orchestrator
├── processing/     # document extraction and chunking
├── embeddings/     # embedding generation
├── vector_store/   # vector storage and search
├── storage/        # file serialization (dir / sqlite / zip)
└── config/         # configuration system

See docs/proposals/ARCHITECTURE.md for design decisions.


Documentation


License

MIT

About

tiny memories

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors