fix(embedding,api): unblock web UI by fixing pipeline() hang and read-path warmup #100
StumHuang wants to merge 3 commits into tickernelz:main
Conversation
Two independent issues caused `pipeline("feature-extraction", ...)` to hang
indefinitely (35s+) on first call, poisoning `initPromise` so every subsequent
`embed()` blocked forever. Symptom: web UI blank, `/api/search` returned
"Empty reply from server".
1. ONNX WASM threading deadlock
`@huggingface/transformers` v4 defaults `wasm.numThreads > 1`, but Node.js
and Bun lack `SharedArrayBuffer` support, so `onnxruntime-web` deadlocks
during pipeline init. Fixed by forcing `numThreads = 1` in
`ensureTransformersLoaded()`. Ref huggingface/transformers.js#488.
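As a minimal sketch of the override, with the nested `env` object mocked as a plain typed object (the real `@huggingface/transformers` export has more fields; this only illustrates the single-thread pin):

```typescript
// Hypothetical mock of the nested env object exported by @huggingface/transformers.
type OnnxEnv = { backends: { onnx: { wasm: { numThreads: number } } } };

const env: OnnxEnv = { backends: { onnx: { wasm: { numThreads: 4 } } } };

// Node and Bun lack SharedArrayBuffer by default, so multi-threaded
// onnxruntime-web WASM deadlocks during pipeline init; pin to one thread.
function forceSingleThread(e: OnnxEnv): void {
  e.backends.onnx.wasm.numThreads = 1;
}

forceSingleThread(env);
console.log(env.backends.onnx.wasm.numThreads); // 1
```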
2. dtype default mismatch
transformers v4's default `dtype` tries to load `model.onnx` (fp32, ~500MB).
The cached model directory only ships `model_quantized.onnx`, so `pipeline`
falls back to a network fetch from huggingface.co. In restricted
networks this fails with "Unable to connect". Fixed by passing
`dtype: "q8"` to the `pipeline()` options so the local quantized model is
used unconditionally.
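For illustration, a toy model of this dtype-to-artifact resolution. The file names come from this PR; `onnxFileFor` is a hypothetical stand-in, not the library's actual resolver:

```typescript
// Hypothetical mapping from dtype to the ONNX artifact the loader looks for.
function onnxFileFor(dtype?: string): string {
  return dtype === "q8" ? "model_quantized.onnx" : "model.onnx";
}

// The shipped cache only contains the quantized artifact.
const cachedFiles = new Set(["model_quantized.onnx"]);

function needsNetworkFetch(dtype?: string): boolean {
  return !cachedFiles.has(onnxFileFor(dtype));
}

console.log(needsNetworkFetch());     // true  -> fp32 fetch from huggingface.co
console.log(needsNetworkFetch("q8")); // false -> served from the local cache
```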
After both fixes, the pipeline is ready in ~2.3s and `/api/search` returns real
results (similarity 0.457, vecLen=768).
`handleListTags`, `handleListMemories`, and `handleStats` each awaited `embeddingService.warmup()` before serving. These handlers only read SQLite/sqlite-vec rows and never compute query embeddings, so the coupling was unnecessary. When `warmup()` stalled (or simply took a few seconds on cold start), the entire web UI went blank because every read endpoint blocked behind the embedding model load.

Removed the `warmup()` calls from the three read paths. `handleSearch` still warms up because it needs the query vector.

Net effect: `/api/stats`, `/api/tags`, and `/api/memories` now respond immediately even when the embedding model has not been loaded yet, so the dashboard can render before the first search query.
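A sketch of the decoupling, with hypothetical handler and service shapes standing in for the real ones in `src/services/api-handlers.ts`:

```typescript
type Row = { id: number; text: string };

// Stand-in for the SQLite/sqlite-vec read layer.
const db = { listMemories: (): Row[] => [{ id: 1, text: "hello world" }] };

let warmupCalls = 0;
const embeddingService = {
  // Stand-in for the (potentially slow) embedding model load.
  warmup: async (): Promise<void> => {
    warmupCalls++;
  },
};

// Read path: returns rows directly, no warmup() await.
async function handleListMemories(): Promise<Row[]> {
  return db.listMemories();
}

// Search path: still warms up, because it needs the query vector.
async function handleSearch(q: string): Promise<Row[]> {
  await embeddingService.warmup();
  return db.listMemories().filter((r) => r.text.includes(q));
}

async function demo(): Promise<void> {
  const rows = await handleListMemories();
  console.log(rows.length, warmupCalls); // 1 0  (no model load on the read path)
  await handleSearch("hello");
  console.log(warmupCalls); // 1
}
demo();
```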
Pull request overview
This PR fixes a startup deadlock in the embedding initialization path that caused the web UI to render blank and /api/search to time out, and it decouples read-only API endpoints from embedding warmup so they remain responsive on cold start.
Changes:
- Force `onnxruntime-web` WASM to single-thread mode to prevent `pipeline()` init hangs in Node/Bun environments lacking `SharedArrayBuffer`.
- Force quantized ONNX model loading via `dtype: "q8"` to avoid unwanted remote fp32 model downloads.
- Remove embedding warmup calls from read-only handlers (`stats`, `tags`, `list`), keeping warmup only where vector similarity is actually needed.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/services/embedding.ts` | Applies runtime safeguards to prevent `pipeline()` initialization hangs and ensures a quantized model is used. |
| `src/services/api-handlers.ts` | Removes unnecessary embedding warmup from read-only handlers to prevent request blocking on embedding init. |
```typescript
    (mod.env as any).backends.onnx.wasm.numThreads = 1;
  } catch (e) {
    log("Failed to set wasm.numThreads", { error: String(e) });
```
The numThreads override is described as critical to avoid an indefinite hang, but failures are currently swallowed (only logged) and init continues. If this assignment fails at runtime, pipeline() can still hang and the root issue returns; consider failing fast (throw) or explicitly validating/creating the nested env.backends.onnx.wasm object and asserting the final value before proceeding.
Suggested change:

```diff
-    (mod.env as any).backends.onnx.wasm.numThreads = 1;
-  } catch (e) {
-    log("Failed to set wasm.numThreads", { error: String(e) });
+    const envWithBackends = mod.env as any;
+    envWithBackends.backends ??= {};
+    envWithBackends.backends.onnx ??= {};
+    envWithBackends.backends.onnx.wasm ??= {};
+    envWithBackends.backends.onnx.wasm.numThreads = 1;
+    if (envWithBackends.backends.onnx.wasm.numThreads !== 1) {
+      throw new Error("Failed to verify wasm.numThreads override");
+    }
+  } catch (e) {
+    log("Failed to set wasm.numThreads", { error: String(e) });
+    throw new Error(`Failed to initialize transformers WASM threading configuration: ${String(e)}`);
```
```diff
   const { pipeline } = await ensureTransformersLoaded();
   this.pipe = await pipeline("feature-extraction", CONFIG.embeddingModel, {
     progress_callback: progressCallback,
-  });
+    // Force quantized ONNX. Default is fp32 model.onnx which transformers v4
+    // tries to download from huggingface.co; cache only ships model_quantized.onnx
+    // and HF is unreachable behind GFW, causing init to fail.
+    dtype: "q8",
+  } as any);
```
The pipeline options are cast to `any`, which removes compile-time verification for the newly added `dtype` knob. Since this option is required to prevent an unwanted remote download, it would be safer to use the proper pipeline options type (or `satisfies` a known options interface) so typos and unsupported keys are caught by typecheck.
Per Copilot review on PR tickernelz#100: the `as any` cast on `pipeline()` options silently dropped compile-time validation of the `dtype` key, which is the exact protection that prevents an unwanted fp32 `model.onnx` download. Use the official `PretrainedModelOptions` type so any future typo in `dtype` or other option keys fails at `tsc` time instead of at runtime.
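One way to get that compile-time check is `satisfies`, sketched here against a hypothetical `PipelineOptions` interface (the actual type exported by `@huggingface/transformers` may be named and shaped differently):

```typescript
// Hypothetical options interface standing in for the library's real type.
interface PipelineOptions {
  progress_callback?: (progress: unknown) => void;
  dtype?: "fp32" | "fp16" | "q8" | "q4";
}

const opts = {
  dtype: "q8",
} satisfies PipelineOptions;

// A misspelled key (e.g. `dtyep`) or an unsupported dtype value now fails
// at tsc time instead of being silently erased by `as any`.
console.log(opts.dtype); // q8
```

Unlike a plain annotation, `satisfies` preserves the narrow literal type of `opts` while still checking every key against the interface.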
Summary
Two bugs together caused the web UI at http://127.0.0.1:4747/ to render blank
and `/api/search` to time out:
- `pipeline("feature-extraction", ...)` hung indefinitely on the first call, poisoning `initPromise` so every subsequent `embed()` blocked.
- Read-only handlers (`stats`, `tags`, `list`) awaited embedding warmup even though they only read SQLite rows, so the hang above propagated to every read endpoint.

Root causes
1. `@huggingface/transformers` v4 defaults `wasm.numThreads > 1`, but Node/Bun lack `SharedArrayBuffer`, deadlocking `onnxruntime-web`.
Fix: `env.backends.onnx.wasm.numThreads = 1`. Ref: "wasm does not work on node right now with multiple threads" huggingface/transformers.js#488
2. transformers v4 defaults to loading `model.onnx` (fp32) when `dtype` is not specified. The shipped cache only has `model_quantized.onnx`, so init falls back to a network fetch from huggingface.co that fails in restricted networks.
Fix: pass `dtype: "q8"` to `pipeline()`.
3. `handleStats`/`handleListTags`/`handleListMemories` called `embeddingService.warmup()` despite never using the embedding model.
Fix: drop `warmup()` from those three handlers; `handleSearch` keeps it.

Verification
- `/api/search?q=hello` returns real results (similarity 0.457, vecLen=768).
- `/api/stats`, `/api/tags`, `/api/memories` respond immediately on cold start.
- `bun run typecheck` and Prettier (via lint-staged) pass on both commits.

Commits
- fix(embedding): prevent pipeline() hang in Node/Bun runtime
- fix(api): remove embedding warmup from read-only handlers