Skip to content

feat: move KB embedding to Web Worker with persistent progress#187

Open
ivdimova wants to merge 1 commit intodevfrom
feature/background-indexing-worker
Open

feat: move KB embedding to Web Worker with persistent progress#187
ivdimova wants to merge 1 commit intodevfrom
feature/background-indexing-worker

Conversation

@ivdimova
Copy link
Copy Markdown
Collaborator

Summary

  • Web Worker for embedding: Moves Transformers.js WASM inference to a dedicated indexing-worker.js, keeping the UI responsive during knowledge base builds
  • Singleton state with pub/sub: Indexing progress persists across Settings tab navigations — leave the tab and come back to see live progress
  • Build deduplication: Calling buildIndex() while already running returns the existing promise instead of starting a duplicate

Architecture

Main thread                          Web Worker (indexing-worker.js)
                                     
REST API extraction (phases 1-4)
         |
         |-- postMessage(chunks) --> Load Transformers.js from CDN
         |                           Embed all chunks (WASM, batched)
         |  <-- progress updates  --|
         |  <-- embeddings + meta --+
         |
Build Voy index (fast, <1s)
Persist to IndexedDB

Voy WASM glue requires document (unavailable in workers), so only the embedding inference runs in the worker. The main thread receives pre-computed embeddings and builds the Voy search index.

Files changed

File Change
src/extensions/services/indexing-worker.js NEW — Web Worker for Transformers.js embedding
src/extensions/services/knowledge-base.js Singleton state, pub/sub, worker orchestration
src/extensions/services/vector-store.js buildFromEmbeddings() + reload() methods
src/extensions/components/SettingsTab.jsx Subscribe to singleton instead of local state
webpack.config.js indexing-worker entry point

Test plan

  • Build Index from Settings — verify UI stays responsive, progress bar updates
  • Switch to Chat tab during indexing, switch back — progress bar resumes
  • After indexing completes, verify search works (ask a question with doc search enabled)
  • Click Build Index twice quickly — should not start duplicate builds
  • npm test passes (74/74)
  • npm run build succeeds

Generated with Claude Code

The knowledge base indexing was freezing the browser because
Transformers.js WASM inference ran on the main thread. This moves
the heavy embedding work to a dedicated Web Worker while keeping
the Voy index build on the main thread (its WASM glue requires
`document`).

- Add indexing-worker.js: loads Transformers.js from CDN, embeds
  chunks in batches, posts progress + results back via postMessage
- Add vectorStore.buildFromEmbeddings() to accept pre-computed
  embeddings from the worker
- Lift indexing state into knowledge-base.js singleton with pub/sub
  so progress survives Settings tab unmount/remount
- Deduplicate builds: calling buildIndex() while running returns
  the existing promise

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Owner

@pluginslab pluginslab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this PR! The architecture is well thought out, offloading WASM inference to a Web Worker while keeping Voy on the main thread is the right approach, and the singleton pub/sub pattern for surviving tab switches is clean.

A few things to address below (inline comments).

} else if ( msg.type === 'error' ) {
log.error( 'Worker error:', msg.message );
worker.terminate();
reject( new Error( msg.message ) );
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worker cleanup on cancellation

If the user clears the index mid-build or navigates away, the worker keeps running. Consider storing a reference to the active worker so clearIndex() (and potentially a page unload handler) can call worker.terminate().


worker.addEventListener( 'error', ( e ) => {
log.error( 'Worker crashed:', e.message );
worker.terminate();
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing messageerror handler

The worker listens for error but not messageerror (fired when a message can't be deserialized). Worth adding alongside the existing error listener for completeness.


/**
* Build the Voy index from pre-computed embeddings and persist.
* Used after the indexing worker returns embeddings from a background thread.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused reload() export

reload() is exported but never called, buildFromEmbeddings() is used instead. Is this intended for future use, or should it be removed?

*/

/* eslint-disable no-console */

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CDN dependency

Transformers.js is loaded from cdn.jsdelivr.net at runtime. If the CDN is unreachable the user gets a generic error. Would be good to document this dependency and catch the dynamic import with a more descriptive message (e.g., "Could not load embedding model, check your internet connection").


/* ── Worker helper ───────────────────────────────────────────────────── */

/**
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subtle but correct error flow

In buildIndex(), _buildPromise is nulled in finally while _building is reset inside _runBuild()'s catch/success paths. This works correctly because _runBuild always resets _building before the promise settles, just noting it's a subtle contract worth a brief inline comment so future readers don't have to trace through both functions.

const [ kbProgress, setKbProgress ] = useState( kbGetProgress );
const [ kbError, setKbError ] = useState( kbGetError );

useEffect( () => {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double call to kbIsBuilding()

kbIsBuilding() is called twice per subscribe tick, once for setKbBuilding and once for the conditional. Small suggestion:

const building = kbIsBuilding();
setKbBuilding( building );
setKbProgress( kbGetProgress() );
setKbError( kbGetError() );
if ( ! building ) {
    setKbStatus( getKBStatus() );
}

* Subscribe to state changes. Returns an unsubscribe function.
*
* @param {Function} listener Called on every state change.
* @return {Function} Unsubscribe.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing @since tags

The new subscribe, isBuilding, getProgress, getError exports should have @since tags to match the worker's @since 0.14.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants