
fix(embedding,api): unblock web UI by fixing pipeline() hang and read-path warmup #100

Open
StumHuang wants to merge 3 commits into tickernelz:main from StumHuang:feat/github-copilot-sdk

Conversation

@StumHuang

Summary

Two bugs together caused the web UI at http://127.0.0.1:4747/ to render blank
and /api/search to time out:

  1. pipeline("feature-extraction", ...) hung indefinitely on the first call,
    poisoning initPromise so every subsequent embed() blocked.
  2. Read-only handlers (stats, tags, list) awaited embedding warmup
    even though they only read SQLite rows — so the hang above propagated to
    every read endpoint.

Root causes

  • ONNX WASM threading: @huggingface/transformers v4 defaults
    wasm.numThreads > 1, but Node/Bun lack SharedArrayBuffer, deadlocking
    onnxruntime-web.
    Fix: env.backends.onnx.wasm.numThreads = 1.
    Ref: huggingface/transformers.js#488 ("wasm does not work on node right now with multiple threads").
  • dtype default: v4 tries to load model.onnx (fp32) when dtype is
    not specified. The shipped cache only has model_quantized.onnx, so init
    falls back to a network fetch from huggingface.co that fails in restricted
    networks.
    Fix: pass dtype: "q8" to pipeline().
  • Read API coupling: handleStats / handleListTags /
    handleListMemories called embeddingService.warmup() despite never
    using the embedding model.
    Fix: drop warmup() from those three handlers; handleSearch keeps it.
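
The two embedding-side fixes above can be sketched as follows. This is a self-contained illustration only: `OnnxEnv` and `forceSingleThreadedWasm` are stand-ins for mutating the real `env` export of `@huggingface/transformers`, not code lifted from the PR.

```typescript
// Stand-in for the shape of env.backends.onnx.wasm in @huggingface/transformers.
type OnnxEnv = { backends?: { onnx?: { wasm?: { numThreads?: number } } } };

// Force single-threaded WASM so onnxruntime-web cannot deadlock in runtimes
// where multi-threaded WASM is unavailable (the Node/Bun hang described above).
function forceSingleThreadedWasm(env: OnnxEnv): void {
  env.backends ??= {};
  env.backends.onnx ??= {};
  env.backends.onnx.wasm ??= {};
  env.backends.onnx.wasm.numThreads = 1;
}

// Options passed to pipeline(): dtype "q8" pins the local quantized model
// (model_quantized.onnx) instead of triggering a remote fp32 model.onnx fetch.
const pipelineOptions = { dtype: "q8" } as const;

const env: OnnxEnv = {};
forceSingleThreadedWasm(env);
console.log(env.backends?.onnx?.wasm?.numThreads, pipelineOptions.dtype);
```

In the real service these two knobs would be applied inside `ensureTransformersLoaded()` and the `pipeline()` call respectively, as the commits below describe.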

Verification

  • Pipeline ready in ~2.3s after fixes (was infinite hang).
  • /api/search?q=hello returns real results (similarity 0.457, vecLen=768).
  • /api/stats, /api/tags, /api/memories respond immediately on cold start.
  • bun run typecheck and Prettier (via lint-staged) pass on both commits.

Commits

  • fix(embedding): prevent pipeline() hang in Node/Bun runtime
  • fix(api): remove embedding warmup from read-only handlers

Two independent issues caused pipeline("feature-extraction", ...) to hang
indefinitely (35s+) on first call, poisoning initPromise so every subsequent
embed() blocked forever. Symptom: web UI blank, /api/search returned
"Empty reply from server".
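
The "poisoning" comes from the common memoized-init pattern: the first caller's promise is cached, so a hang or rejection on first init is replayed to every later caller. A minimal self-contained reproduction (names mirror the description above, not the repo's actual file):

```typescript
// Cached init promise, shared by all callers of embed().
let initPromise: Promise<void> | null = null;

// Stand-in for pipeline() init; `shouldFail` simulates the first-call failure.
async function init(shouldFail: boolean): Promise<void> {
  if (shouldFail) throw new Error("pipeline() hung or failed");
}

async function embed(text: string, shouldFail = false): Promise<string> {
  // The first caller's outcome is cached forever: a rejected (or
  // never-settling) promise blocks every subsequent embed().
  initPromise ??= init(shouldFail);
  await initPromise;
  return `vector(${text})`;
}

async function main() {
  await embed("a", true).catch(() => {}); // first call poisons the cache
  // Second call fails too, even though init would now succeed:
  const second = await embed("b", false).catch((e) => String(e));
  console.log(second); // the cached rejection propagates
}
main();
```

A common mitigation is to clear `initPromise` in a `.catch` so a later call can retry, though that does not help when the promise simply never settles, which is why the threading fix itself was required.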

1. ONNX WASM threading deadlock
   @huggingface/transformers v4 defaults wasm.numThreads > 1, but Node.js
   and Bun lack SharedArrayBuffer support, so onnxruntime-web deadlocks
   during pipeline init. Fixed by forcing numThreads=1 in
   ensureTransformersLoaded(). Ref huggingface/transformers.js#488.

2. dtype default mismatch
   transformers v4 default dtype tries to load model.onnx (fp32, ~500MB).
   The cached model directory only ships model_quantized.onnx, so pipeline
   falls back to a network fetch from huggingface.co. In restricted
   networks this fails with "Unable to connect". Fixed by passing
   dtype: "q8" to the pipeline() options so the local quantized model is
   used unconditionally.

After both fixes, pipeline ready in ~2.3s and /api/search returns real
results (similarity 0.457, vecLen=768).

handleListTags, handleListMemories, and handleStats each awaited
embeddingService.warmup() before serving. These handlers only read
SQLite/sqlite-vec rows and never compute query embeddings, so the
coupling was unnecessary. When warmup() stalled (or simply took a few
seconds on cold start), the entire web UI went blank because every
read endpoint blocked behind the embedding model load.

Removed the warmup() calls from the three read paths. handleSearch
still warms up because it needs the query vector. Net effect:
/api/stats, /api/tags, and /api/memories now respond immediately even
when the embedding model has not been loaded yet, so the dashboard
can render before the first search query.
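
A minimal sketch of the decoupling, with a stubbed `embeddingService` standing in for the real one (the actual handlers read SQLite rows rather than constants):

```typescript
// Stub embedding service: warmup() simulates a slow model load.
const embeddingService = {
  warmed: false,
  async warmup() {
    await Promise.resolve(); // pretend the model finishes loading later
    this.warmed = true;
  },
};

// Read-only handler after the fix: no warmup(), answers straight from storage.
async function handleStats(): Promise<object> {
  return { memories: 42 }; // stubbed row count
}

// Search still warms up, because it must embed the query text.
async function handleSearch(q: string): Promise<object> {
  await embeddingService.warmup();
  return { q, warmed: embeddingService.warmed };
}

async function main() {
  console.log(await handleStats());  // responds regardless of model state
  console.log(await handleSearch("hello"));
}
main();
```

The design point is simply that a handler should only await dependencies it actually uses; coupling all routes to the slowest shared resource turns one init hang into a site-wide outage.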
Copilot AI review requested due to automatic review settings April 23, 2026 01:22

Copilot AI left a comment


Pull request overview

This PR fixes a startup deadlock in the embedding initialization path that caused the web UI to render blank and /api/search to time out, and it decouples read-only API endpoints from embedding warmup so they remain responsive on cold start.

Changes:

  • Force onnxruntime-web WASM to single-thread mode to prevent pipeline() init hangs in Node/Bun environments lacking SharedArrayBuffer.
  • Force quantized ONNX model loading via dtype: "q8" to avoid unwanted remote fp32 model downloads.
  • Remove embedding warmup calls from read-only handlers (stats, tags, list), keeping warmup only where vector similarity is actually needed.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • src/services/embedding.ts: Applies runtime safeguards to prevent pipeline() initialization hangs and ensures a quantized model is used.
  • src/services/api-handlers.ts: Removes unnecessary embedding warmup from read-only handlers to prevent request blocking on embedding init.


Comment thread src/services/embedding.ts
Comment on lines +24 to +26
  (mod.env as any).backends.onnx.wasm.numThreads = 1;
} catch (e) {
  log("Failed to set wasm.numThreads", { error: String(e) });

Copilot AI Apr 23, 2026


The numThreads override is described as critical to avoid an indefinite hang, but failures are currently swallowed (only logged) and init continues. If this assignment fails at runtime, pipeline() can still hang and the root issue returns; consider failing fast (throw) or explicitly validating/creating the nested env.backends.onnx.wasm object and asserting the final value before proceeding.

Suggested change
- (mod.env as any).backends.onnx.wasm.numThreads = 1;
- } catch (e) {
-   log("Failed to set wasm.numThreads", { error: String(e) });
+ const envWithBackends = mod.env as any;
+ envWithBackends.backends ??= {};
+ envWithBackends.backends.onnx ??= {};
+ envWithBackends.backends.onnx.wasm ??= {};
+ envWithBackends.backends.onnx.wasm.numThreads = 1;
+ if (envWithBackends.backends.onnx.wasm.numThreads !== 1) {
+   throw new Error("Failed to verify wasm.numThreads override");
+ }
+ } catch (e) {
+   log("Failed to set wasm.numThreads", { error: String(e) });
+   throw new Error(`Failed to initialize transformers WASM threading configuration: ${String(e)}`);

Comment thread src/services/embedding.ts Outdated
Comment on lines +66 to +73
const { pipeline } = await ensureTransformersLoaded();
this.pipe = await pipeline("feature-extraction", CONFIG.embeddingModel, {
  progress_callback: progressCallback,
  // Force quantized ONNX. Default is fp32 model.onnx which transformers v4
  // tries to download from huggingface.co; cache only ships model_quantized.onnx
  // and HF is unreachable behind GFW, causing init to fail.
  dtype: "q8",
} as any);

Copilot AI Apr 23, 2026


The pipeline options are cast to any, which removes compile-time verification for the newly added dtype knob. Since this option is required to prevent an unwanted remote download, it would be safer to use the proper pipeline options type (or satisfies a known options interface) so typos/unsupported keys are caught by typecheck.

Per Copilot review on PR tickernelz#100: the 'as any' cast on pipeline() options
silently dropped compile-time validation of the dtype key, which is the
exact protection that prevents an unwanted fp32 model.onnx download.

Use the official PretrainedModelOptions type so any future typo in dtype
or other option keys fails at tsc time instead of at runtime.
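
The typed-options fix can be illustrated with `satisfies`. The `PipelineOptions` interface below is a local stand-in for illustration; the commit above uses the library's own exported type instead:

```typescript
// Illustrative stand-in for the transformers.js options type the commit adopts.
interface PipelineOptions {
  dtype?: "fp32" | "fp16" | "q8" | "q4";
  progress_callback?: (progress: unknown) => void;
}

// `satisfies` keeps literal inference while catching typos at compile time:
// a misspelled key ("dtyp") or bad value ("q80") now fails tsc instead of
// silently falling through to the fp32 default at runtime.
const options = {
  dtype: "q8",
  progress_callback: () => {},
} satisfies PipelineOptions;

console.log(options.dtype);
```

Compared with `as any`, `satisfies` validates the object against the interface without widening it, so `options.dtype` stays the literal type `"q8"`.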
