Find the right npm package — describe what you need, get AI-powered recommendations grounded in real registry data.
Searching npm is painful. npmatch lets you describe what you're trying to build in plain English, and returns ranked package recommendations with tradeoff explanations — powered by semantic search over real npm data and streamed LLM synthesis.
- 🔍 Semantic search — finds packages by meaning, not just keywords
- ⚡ Streaming UX — recommendations stream in token by token as they're generated
- 📊 Grounded results — LLM only recommends from retrieved real packages, no hallucination
- 🎯 Filter by framework and priorities — React vs Node, bundle size vs popularity vs TypeScript support
```
Browser
   ↓
Next.js API Route (/api/search)   ← proxy layer, hides backend URL + secrets
   ↓
FastAPI backend
   ↓          ↓           ↓
Qdrant     Postgres    OpenAI
(vec)      (metadata)  (gpt-4o)
```
RAG pipeline:
- User query is embedded via `text-embedding-3-small`
- Qdrant returns the top 6 semantically similar package names
- Postgres is joined for full metadata (description, keywords, version)
- Retrieved packages + query are passed to GPT-4o as context
- The LLM synthesizes a recommendation — streamed back to the browser via SSE
The LLM never guesses from training memory. It only reasons over the retrieved packages, keeping recommendations verifiable and current.
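The grounding step can be sketched like this — a minimal illustration, not the actual `llm.py` (the function name and prompt wording here are hypothetical). The retrieved packages are serialized into the prompt, and the model is explicitly restricted to that list:

```python
def build_prompt(query: str, packages: list[dict]) -> str:
    """Build a grounded prompt: the model may recommend only
    from the retrieved packages, never from training memory."""
    context = "\n".join(
        f"- {p['name']} (v{p['version']}): {p['description']}"
        for p in packages
    )
    return (
        "You are an npm package advisor. Recommend ONLY from the "
        "packages listed below; if none fit, say so.\n\n"
        f"Packages:\n{context}\n\n"
        f"User query: {query}\n"
    )

prompt = build_prompt(
    "parse markdown in React",
    [{"name": "react-markdown", "version": "9.0.1",
      "description": "Render Markdown as React components"}],
)
```

Because the context is assembled only from Qdrant/Postgres results, any package name in the answer can be traced back to a real registry entry.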
| Layer | Tech |
|---|---|
| Frontend | Next.js 15, TypeScript, HeroUI v3, Tailwind CSS v4 |
| Backend | FastAPI, Python 3.13, uv |
| LLM | OpenAI GPT-4o (streaming) |
| Embeddings | OpenAI text-embedding-3-small |
| Vector DB | Qdrant |
| Metadata DB | Postgres (asyncpg) |
| Ingestion | Node.js, TypeScript |
| Infra | AWS ECS Fargate, ECR, ALB, Terraform, Vercel, Supabase, Qdrant Cloud, Docker |
| CI/CD | GitHub Actions |
```
npmatch/
├── README.md
├── docker-compose.yml        # local full-stack dev
│
├── ingestion/                # Node.js — data pipeline
│   ├── src/
│   │   ├── fetch.ts          # pulls top packages
│   │   ├── embed.ts          # OpenAI embeddings
│   │   └── upsert.ts         # pushes vectors to DB
│   └── Dockerfile
│
├── backend/                  # FastAPI
│   ├── app/
│   │   ├── main.py           # routes, middleware, CORS, rate limiting
│   │   ├── search.py         # embed query + vector search
│   │   ├── llm.py            # GPT-4o streaming + prompt construction
│   │   └── models.py         # Pydantic request/response models
│   └── Dockerfile
│
├── frontend/                 # Next.js
│   ├── app/
│   │   ├── page.tsx
│   │   └── api/
│   │       └── search/       # SSE proxy to backend
│   │           ├── route.ts
│   │           └── health/   # health check proxy
│   │               └── route.ts
│   ├── components/
│   │   ├── SearchForm.tsx
│   │   ├── PackageCard.tsx
│   │   ├── StatusStates.tsx
│   │   └── LlmPanel.tsx
│   └── hooks/
│       ├── useSearch.ts      # SSE streaming logic
│       └── useHealthCheck.ts # backend health polling
│
└── infra/                    # Terraform
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── modules/
        ├── ecs/
        └── networking/
```
`/api/search` — streams package recommendations as SSE.
Request:

```json
{
  "query": "parse markdown with syntax highlighting in React",
  "framework": "react",
  "priorities": ["bundle size", "TypeScript support"]
}
```

SSE stream format:

```
event: packages
data: [{"name": "...", "version": "...", "description": "...", "npm_url": "..."}]

data: chunk chunk chunk...   ← LLM markdown, \n escaped as \\n

event: done
data: [DONE]
```
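A client consumes this stream by splitting on blank lines and unescaping the LLM chunks. The real parsing lives in `useSearch.ts`; this Python version is only an illustrative sketch of the frame format documented above:

```python
def parse_sse(raw: str) -> list[tuple[str, str]]:
    """Split an SSE payload into (event, data) pairs.
    Frames are separated by blank lines; each line is 'event: x'
    or 'data: y'. LLM chunks arrive with newlines escaped as \\n,
    so they are unescaped here."""
    frames = []
    for block in raw.strip().split("\n\n"):
        event, data = "message", []          # SSE default event type
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        frames.append((event, "\n".join(data).replace("\\n", "\n")))
    return frames

stream = (
    'event: packages\ndata: [{"name": "react-markdown"}]\n\n'
    "data: Try **react-markdown**\\nIt renders...\n\n"
    "event: done\ndata: [DONE]\n"
)
frames = parse_sse(stream)
# frames[0] == ("packages", '[{"name": "react-markdown"}]')
# frames[-1] == ("done", "[DONE]")
```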
Returns backend status. Polled by frontend every 60s with animated signal indicator.
npm's search API is capped at 250 results — not enough for meaningful semantic search. Instead:
- Fetch — downloads the top 10,000 most popular npm packages from npm-rank as a JSON file
- Clean — filters out packages missing a name or description, deduplicates by package name, and strips irrelevant fields (author, sponsors, maintainers)
- Embed — formats each package as `"{name}: {description}. keywords: {keywords}"` and batch-embeds via OpenAI `text-embedding-3-small` (batches of 100)
- Upsert — pushes vectors into Qdrant (payload: `name` only) and metadata (name, description, keywords, version) into Postgres. Idempotent — safe to re-run: Qdrant upserts overwrite by deterministic UUID, Postgres upserts use `ON CONFLICT (name) DO UPDATE`
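The idempotency idea can be sketched as follows. The actual `upsert.ts` is TypeScript; these Python helpers (including the namespace constant) are hypothetical, but show why a stable UUID derived from the package name makes re-runs overwrite rather than duplicate:

```python
import uuid

# Any fixed namespace works; this specific one is hypothetical.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "npmatch")

def point_id(package_name: str) -> str:
    """Deterministic UUID per package: same name -> same ID,
    so a Qdrant upsert overwrites the old vector."""
    return str(uuid.uuid5(NAMESPACE, package_name))

def batches(items: list, size: int = 100):
    """Chunk packages into batches of 100 for the embeddings API."""
    for i in range(0, len(items), size):
        yield items[i : i + size]

assert point_id("react") == point_id("react")   # stable across runs
assert point_id("react") != point_id("vue")     # distinct per package
assert [len(b) for b in batches(list(range(250)))] == [100, 100, 50]
```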
- Vercel — Next.js frontend + FastAPI backend (serverless)
- Qdrant Cloud — vector search (free tier)
- Supabase — Postgres + pgvector (free tier)
The live demo runs entirely on free tiers — no ongoing infrastructure cost.
🔧 A self-hosted VPS backend (Oracle Cloud Always Free) is planned as an alternative to Vercel's serverless backend.
Terraform configuration in /infra provisions a production-grade AWS deployment:
- ECR — repositories for `npmatch-frontend`, `npmatch-backend`, `npmatch-ingestion`
- ECS Fargate
  - frontend service — behind ALB
  - backend service — behind ALB with HTTPS termination
  - qdrant service — internal, EFS for persistent storage
  - ingestion scheduled task — weekly via EventBridge
- ALB — HTTPS termination
- Networking — public + private subnets, security groups
💡 To spin up the full AWS deployment, run `terraform apply` in `/infra`. To tear it down: `terraform destroy`.
Prerequisites: Docker, Node.js 20+, Python 3.13+, OpenAI API key
```bash
# clone the repo
git clone https://github.com/kodingkin/npmatch
cd npmatch

# add environment variables
cp backend/.env.example backend/.env   # fill in OPENAI_API_KEY
cp frontend/.env.example frontend/.env

# start all services (frontend, backend, Qdrant, Postgres)
docker compose -f docker-compose.yml up -d --build
```

Frontend: http://localhost:3000
Backend: http://localhost:8000
```bash
cd ingestion
npm install
cp .env.example .env   # fill in OPENAI_API_KEY
tsx src/index.ts
```

Why RAG instead of asking GPT-4o directly? LLMs hallucinate package names and versions. By retrieving real packages from the vector database first and passing them as context, the LLM only reasons over verified data — recommendations are grounded and verifiable.
Why SSE over WebSockets? Streaming is one-directional (server → client). SSE is simpler, stateless, and works over standard HTTP — no connection management overhead.
Why Next.js API route as proxy? Keeps the backend URL off the client entirely. The browser never talks to FastAPI directly.
Why Qdrant + Postgres over a single vector store? Pinecone bundles vectors and metadata together — simple, but not how production systems are typically designed. Splitting vector search (Qdrant) from structured metadata (Postgres) reflects real-world architecture patterns and keeps each store doing what it does best. Postgres also enables hybrid search combining vector similarity with full-text search for improved retrieval quality.
Why `text-embedding-3-small`? Good balance of semantic quality and cost at this scale. The upgrade path to `text-embedding-3-large` is a one-line change.
- Ingestion is a point-in-time snapshot — very new packages may not appear until the next weekly refresh
- No re-ranking step — production would add a cross-encoder re-ranker to improve retrieval precision
- No evaluation pipeline — answer faithfulness and retrieval quality are not measured automatically
- Rate limited to 2 requests/minute per IP
Built as a portfolio project demonstrating full-stack AI integration — RAG pipeline, streaming UX, and AWS infrastructure with Terraform.