Portable memory for small text corpora. No servers, no API keys, no infrastructure.
There's a class of knowledge that's too big for a prompt but too small to justify a vector database: a codebase, a Slack export, a folder of meeting notes. raglet turns that text into a searchable directory you can save, git commit, or carry to another machine.
```bash
pip install raglet
```

```python
from raglet import RAGlet

# Build a searchable index from your files
rag = RAGlet.from_files(["docs/", "notes.md"])

# Search semantically
results = rag.search("what did we decide about the API design?", top_k=5)
for chunk in results:
    print(f"[{chunk.score:.2f}] {chunk.source}")
    print(chunk.text)
    print()

# Save to a portable directory
rag.save(".raglet/")
```

Example output:
```text
[0.87] docs/decisions/api-design.md
We decided to keep the API surface minimal — just search(), add_text(), and save().
The goal is that a new user can be productive in under 5 minutes.

[0.81] notes/2024-03-meeting.md
API design discussion: favour explicit save() calls over auto-persistence.
Incremental updates should be opt-in, not default behaviour.

[0.74] docs/decisions/api-design.md
The search() method returns ranked chunks with scores. The caller decides
what to do with them — raglet does not call any LLM.
```
Load it back anywhere:

```python
rag = RAGlet.load(".raglet/")
results = rag.search("your query")
```

raglet is designed for workspace-scale corpora. The embedding pipeline processes ~95K LLM tokens/sec on Apple Silicon (MPS). Build is a one-time cost — after that, search stays under 11 ms regardless of dataset size.
| Corpus size | Chunks | ~Tokens | Build time (MPS) | Search p50 | raglet? |
|---|---|---|---|---|---|
| < 8 KB | < 20 | < 5K | — | — | Use a prompt directly |
| 8 KB – 2 MB | 20 – 2,800 | 5K – 700K | < 7s | 3–6 ms | ✅ Sweet spot — builds in seconds |
| 2 – 20 MB | 2,800 – 28,000 | 700K – 7M | 7s – 70s | 6–7 ms | ✅ Works great |
| 20 – 100 MB | 28,000 – 139,000 | 7M – 36M | 70s – 6 min | 7–11 ms | |
| > 100 MB | > 139,000 | > 36M | > 6 min | — | ❌ Use a vector database instead |
If your corpus is larger than ~100 MB, raglet is the wrong tool. Use a persistent vector database (Chroma, Weaviate, Pinecone) instead.
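A quick way to decide is to sum the size of your text files before building. The sketch below is stdlib-only and not part of raglet — `corpus_bytes` and `estimate_tier` are hypothetical helpers that map a byte count onto the tiers in the table above:

```python
import os

def corpus_bytes(paths):
    """Sum the size of .txt and .md files under the given paths."""
    total = 0
    for p in paths:
        if os.path.isfile(p):
            total += os.path.getsize(p)
            continue
        for root, _dirs, files in os.walk(p):
            for name in files:
                if name.endswith((".txt", ".md")):
                    total += os.path.getsize(os.path.join(root, name))
    return total

def estimate_tier(n_bytes):
    """Map a corpus size onto the sizing tiers from the table above."""
    if n_bytes < 8 * 1024:                  # under 8 KB
        return "use a prompt directly"
    if n_bytes < 100 * 1024 * 1024:         # up to ~100 MB
        return "raglet"
    return "use a vector database"

print(estimate_tier(2 * 1024 * 1024))  # a 2 MB corpus → raglet
```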
When you save a raglet, you get a plain, inspectable directory:

```text
.raglet/
├── config.json      # chunking, embedding model, search settings
├── chunks.json      # all text chunks with source and metadata
├── embeddings.npy   # NumPy float32 embeddings matrix
└── metadata.json    # version, timestamps, chunk count, dimensions
```
Everything is human-readable JSON (except the embeddings binary). That means you can:
```bash
# Inspect your chunks
cat .raglet/chunks.json

# Check what model and config were used
cat .raglet/config.json

# Git commit the whole thing
git add .raglet/ && git commit -m "update knowledge base"

# Export for sharing
raglet package --raglet .raglet/ --format zip --out knowledge.zip
```

No proprietary format. No lock-in. Your data is always accessible.
```bash
pip install raglet
```

Or with Docker — no install needed:

```bash
docker pull mkarots/raglet
docker run -v /path/to/project:/workspace mkarots/raglet build docs/ --out .raglet/
```

Note: Alpine Linux is not supported. Use `python:3.11-slim` or similar images.
```bash
# Build a knowledge base
raglet build docs/ --out .raglet/
raglet build docs/ src/ "*.md" --out .raglet/ --chunk-size 1024

# Search it
raglet query "how does authentication work?" --raglet .raglet/
raglet query "what is X?" --raglet memory.sqlite --top-k 10

# Add files, directories, or glob patterns incrementally
raglet add new_file.txt --raglet .raglet/
raglet add new-docs/ --raglet .raglet/
raglet add "*.md" --raglet .raglet/ --ignore __pycache__

# Convert between formats
raglet package --raglet .raglet/ --format zip --out export.zip
raglet package --raglet .raglet/ --format sqlite --out memory.sqlite
```

raglet supports three formats. All load with `RAGlet.load()` — the format is auto-detected from the path.
| Format | Use when | Incremental updates |
|---|---|---|
| `.raglet/` directory | Default — development, git-tracked knowledge bases | ✅ |
| `.sqlite` | Agent memory loops — frequent appends, single-file deployment | ✅ True appends |
| `.zip` | Export and sharing only | ❌ Read-only |
```python
rag.save(".raglet/")        # directory (default)
rag.save("memory.sqlite")   # SQLite — true incremental appends
rag.save("export.zip")      # zip archive

rag = RAGlet.load(".raglet/")
rag = RAGlet.load("memory.sqlite")
rag = RAGlet.load("export.zip")
```

When to use SQLite: if you're running an agent loop that appends conversation turns or observations continuously, SQLite is the better choice — it performs true SQL INSERT operations rather than rewriting files on each save.
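Detecting the format from the path is simple to reason about. The sketch below mimics what such a loader could do — it is not raglet's actual implementation, and `detect_format` is a hypothetical name:

```python
from pathlib import Path

def detect_format(path: str) -> str:
    """Guess the storage format from the path, mirroring the three formats above."""
    suffix = Path(path).suffix
    if suffix == ".sqlite":
        return "sqlite"
    if suffix == ".zip":
        return "zip"
    # ".raglet" is treated as a dotfile name by pathlib, so its suffix is ""
    return "directory"

print(detect_format("memory.sqlite"))  # sqlite
print(detect_format("export.zip"))     # zip
print(detect_format(".raglet/"))       # directory
```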
```python
from pathlib import Path
from raglet import RAGlet

rag = RAGlet.load(".raglet/") if Path(".raglet/").exists() else RAGlet.from_files(["docs/"])

query = "user query"
results = rag.search(query, top_k=5)
context = "\n\n".join(chunk.text for chunk in results)

# Pass context to your LLM of choice
response = your_llm.generate(f"Context:\n{context}\n\nQuestion: {query}")
```

raglet handles retrieval. You handle generation.
```python
from pathlib import Path
from raglet import RAGlet

# SQLite is the right format for agent memory — true incremental appends
path = "memory.sqlite"
rag = RAGlet.load(path) if Path(path).exists() else RAGlet.from_files([])

while True:
    query = input("You: ")
    if query == "exit":
        rag.save(path)
        break
    results = rag.search(query, top_k=5)
    response = your_llm(results, query)
    rag.add_text(query, source="user")
    rag.add_text(response, source="assistant")
    rag.save(path, incremental=True)
```

The initial `from_files()` call is the expensive step — it embeds all the text. After that, appending new content only embeds the new chunks. A 100 KB file appends in ~0.3 s regardless of how large the existing raglet is.
```python
# Add files, directories, or glob patterns
rag.add_file("new_doc.txt")
rag.add_files(["file1.txt", "file2.md"])
rag.add_files(["new-docs/"])

# Add raw text
rag.add_text("Some text", source="manual")

# Save incrementally (only writes new data)
rag.save(".raglet/", incremental=True)
```

See Usage Patterns for the full build-once-append-search workflow.
```python
from raglet import RAGlet, RAGletConfig

config = RAGletConfig()
config.chunking.size = 1024
config.chunking.overlap = 100
config.embedding.model = "all-mpnet-base-v2"

rag = RAGlet.from_files(["docs/"], config=config)
```

Available embedding models: `all-MiniLM-L6-v2` (default, fast), `all-mpnet-base-v2` (higher quality), `BAAI/bge-small-en-v1.5`.
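To see what `chunking.size` and `chunking.overlap` control, here is a deliberately simplified fixed-window chunker. raglet's real chunker is sentence-aware, so actual boundaries will differ — this only illustrates the size/overlap relationship:

```python
def window_chunks(text: str, size: int, overlap: int):
    """Split text into windows of `size` characters, each starting
    `size - overlap` characters after the previous one."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = window_chunks("abcdefghij", size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Larger overlap means more redundancy between adjacent chunks (better recall at chunk boundaries) at the cost of more chunks to embed and store.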
Search with a similarity threshold:

```python
results = rag.search("query", top_k=10, similarity_threshold=0.7)
```

File formats: v0.1.0 supports `.txt` and `.md` files only. PDF, DOCX, and HTML are on the roadmap. For unsupported formats, extract text first and use `add_text()`.
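The threshold is applied to the similarity score attached to each returned chunk. As a rough mental model (the exact scoring depends on how the vector index is configured — this is not raglet's internal code), cosine similarity between embedding vectors looks like this:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical query/chunk vectors: keep only scores above the threshold.
scores = [cosine([1.0, 0.0], v) for v in ([1.0, 0.1], [0.5, 0.5], [0.0, 1.0])]
kept = [s for s in scores if s >= 0.7]
print(kept)  # two of the three candidates survive a 0.7 threshold
```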
Corpus size: raglet is workspace-scale, not internet-scale. Search stays under 11 ms up to 100 MB (measured: 10.4 ms p50 at 139K chunks), but build time scales linearly (100 MB takes ~6 minutes on MPS). Above ~100 MB, use a proper vector database.
No file change detection: raglet does not watch for file changes. If a file is modified, rebuild from scratch with from_files(). Incremental updates (add_file, add_files) are for adding new files only.
CPU-only machines: embedding is ~10–20× slower without a GPU or MPS. Search latency (<10 ms) is largely hardware-independent and unaffected.
- ✅ Text extraction from `.txt` and `.md` files
- ✅ Sentence-aware chunking
- ✅ Local embeddings via sentence-transformers (no API keys)
- ✅ Vector search via FAISS
- ✅ Three portable formats: directory, SQLite, zip
- ✅ Incremental updates
- ✅ CLI — `build`, `query`, `add` (files, directories, globs), `package`
- ✅ Docker image
- **Portable** — One directory (or file). Git commit it, email it, load it on another machine.
- **Small by design** — Workspace-scale: codebases, conversations, notes. Not the internet.
- **Retrieval only** — raglet finds chunks. You decide what to do with them. Bring your own LLM.
- **Open format** — JSON files you can read, edit, and extract. No proprietary format, no lock-in.
- **Zero infrastructure** — `pip install raglet` or `docker run`. That's it.
- **v0.1.0 (current)** — Semantic search, save/load, incremental updates, CLI
- **v0.2.0** — PDF, DOCX, HTML extraction
- **v0.3.0** — File change detection (rebuild only modified files)
Planned (unscheduled)

- Semantic chunking — split on topic boundaries using embeddings, not just sentence boundaries
- Metadata filtering — `rag.search("query", source="docs/")` to narrow results by directory or file
- `.ragletignore` — project-level ignore file alongside the `--ignore` CLI flag
- JSON output for `raglet query` — pipe results to other tools
- ONNX runtime — lightweight inference without PyTorch for smaller installs and faster cold starts
- Workspace limits enforcement — soft/hard chunk count limits with actionable error messages (ADR 010)
Not planned (out of scope by design)

- LLM integration — raglet is retrieval only; bring your own LLM
- Cloud/API backends — everything runs locally
- Real-time file watching — use `add_file()` or rebuild explicitly
- Datasets larger than ~100 MB — use a vector database instead
```bash
# Install with uv
curl -LsSf https://astral.sh/uv/install.sh | sh
make install-dev

# Run tests
make test        # all tests
make test-unit   # unit only
make test-e2e    # end-to-end only

# Code quality
make lint
make format
make type-check
make ci          # full pipeline
```

```text
raglet/
├── core/          # domain models and orchestrator
├── processing/    # document extraction and chunking
├── embeddings/    # embedding generation
├── vector_store/  # vector storage and search
├── storage/       # file serialization (dir / sqlite / zip)
└── config/        # configuration system
```
See docs/proposals/ARCHITECTURE.md for design decisions.
MIT
