Building the infrastructure that makes AI run fast, cheap, and reliably at scale.
I'm a Senior MLOps / AI Platform Engineer with 5+ years shipping production systems across two deep specializations:
- LLM Inference Infrastructure: vLLM, SGLang, MCP-based agents, RAG architectures.
- Computer Vision Pipelines: real-time object detection, multi-object tracking, and segmentation at millions of frames per week.
I've led teams of 4+ engineers, contributed to open-source SDKs that grew downloads 10x, and optimized model serving for high throughput.
Currently: MCP-based agent orchestration and pushing LLM inference latency boundaries with vLLM + SGLang.
I've taken systems from prototype to production across LLM infra and computer vision. A few highlights:
- 3x latency reduction on a multi-modal RAG platform over a 1M+ document knowledge base
- 80% GPU memory savings with LoRA/PEFT adapters for cost-efficient production fine-tuning
- 0.97 mAP on car dent detection & segmentation for an insurance client
- 90% MOTA on sports analytics pipelines processing millions of frames/week
- 10x SDK download growth via Clarifai Python SDK & CLI contributions
- NVIDIA Smart City Hackathon finalist (Asia-Pacific): pothole detection with RT-DETR
Docwhisper
A document Q&A system built on a full RAG pipeline: ingest PDFs, chunk and embed them into a vector store, then retrieve and answer with a locally running LLM via Ollama. Fast, private, and runs entirely on your machine. Includes MLflow-based observability: query traces, retrieval quality metrics, and latency are tracked per run for easy debugging and iteration.
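The ingest → retrieve → answer loop described above can be sketched in a few lines. This toy version uses a bag-of-words embedding, an in-memory "vector store", and cosine similarity; all helper names are illustrative, not Docwhisper's actual code, and a real pipeline would swap in a sentence-transformer and an Ollama call for generation.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; a real pipeline would use a sentence-transformer
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(doc, size=8):
    # Split a document into fixed-size word windows
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Ingest: chunk the document and embed each chunk into an in-memory store
doc = ("Invoices are processed nightly. Refunds are issued within "
       "five business days after approval.")
store = [(c, embed(c)) for c in chunk(doc)]

# Retrieve: rank chunks against the query, keep the best match as context
query = "How long do refunds take?"
best = max(store, key=lambda item: cosine(embed(query), item[1]))[0]

# Answer: the retrieved context is packed into a prompt for a local LLM
prompt = f"Context: {best}\n\nQuestion: {query}"
```

The same shape scales up: replace `embed` with a real embedding model, the list with a vector database, and the final prompt with a streamed Ollama chat call.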
local-slm-bench
Run up to 3 models simultaneously with real-time token streaming and per-model metrics. A built-in benchmark suite covers 18 prompts across 10 categories (reasoning, code, math, safety). Interactive Plotly charts visualize throughput, TTFT, RAM usage, and quality-vs-speed trade-offs, all on local hardware.
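TTFT and throughput both fall out of token timestamps on the streaming loop. A minimal sketch with a simulated token stream (the generator is a stand-in, not the tool's real model interface):

```python
import time

def stream_tokens():
    # Stand-in for a model's streaming output; yields tokens with a small delay
    for tok in ["The", " answer", " is", " 42", "."]:
        time.sleep(0.01)
        yield tok

start = time.perf_counter()
ttft = None
n_tokens = 0
for tok in stream_tokens():
    if ttft is None:
        ttft = time.perf_counter() - start  # time to first token
    n_tokens += 1
elapsed = time.perf_counter() - start
throughput = n_tokens / elapsed  # tokens per second over the whole response
```

Running the same loop per model, per prompt, gives the raw numbers behind the throughput and TTFT charts.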
Clarifai Python SDK (Contributor)
Drove significant growth through CLI improvements, new API surface coverage, and DX enhancements, contributing to a 10x increase in monthly downloads.
multi-agent-research
Production-grade multi-agent pipeline: Orchestrator → Search Agent → Summarizer → Critic, connected via A2A typed messaging and MCP servers for tools, memory, and web search. Tracks token usage, latency, cost, and confidence scores per agent. Streamlit dashboard with Plotly charts for visual analytics.
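The typed-message handoff between agents can be sketched with a dataclass envelope routed through the pipeline; field and agent names here are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass

@dataclass
class AgentMessage:
    # Typed envelope passed between agents; fields are illustrative
    sender: str
    recipient: str
    content: str
    tokens_used: int = 0

def search_agent(msg: AgentMessage) -> AgentMessage:
    return AgentMessage("search", "summarizer",
                        f"results for '{msg.content}'", tokens_used=120)

def summarizer_agent(msg: AgentMessage) -> AgentMessage:
    return AgentMessage("summarizer", "critic",
                        f"summary of {msg.content}", tokens_used=80)

def critic_agent(msg: AgentMessage) -> AgentMessage:
    return AgentMessage("critic", "orchestrator",
                        f"approved: {msg.content}", tokens_used=40)

# Orchestrator: route the message through each agent, accumulating token cost
pipeline = [search_agent, summarizer_agent, critic_agent]
msg = AgentMessage("orchestrator", "search", "LLM inference trends")
total_tokens = 0
for agent in pipeline:
    msg = agent(msg)
    total_tokens += msg.tokens_used
```

Typed envelopes make per-agent accounting (tokens, latency, cost) trivial: every hop returns a message whose metadata the orchestrator can aggregate before rendering it on a dashboard.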
Other Projects
| Project | Description | Stack |
|---|---|---|
| PyTorch Object Detect & Track | Real-time multi-object detection and tracking in video | Python, PyTorch, YOLO |


