feat: move KB embedding to Web Worker with persistent progress#187
feat: move KB embedding to Web Worker with persistent progress#187
Conversation
The knowledge base indexing was freezing the browser because Transformers.js WASM inference ran on the main thread. This moves the heavy embedding work to a dedicated Web Worker while keeping the Voy index build on the main thread (its WASM glue requires `document`). - Add indexing-worker.js: loads Transformers.js from CDN, embeds chunks in batches, posts progress + results back via postMessage - Add vectorStore.buildFromEmbeddings() to accept pre-computed embeddings from the worker - Lift indexing state into knowledge-base.js singleton with pub/sub so progress survives Settings tab unmount/remount - Deduplicate builds: calling buildIndex() while running returns the existing promise Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pluginslab
left a comment
There was a problem hiding this comment.
Great work on this PR! The architecture is well thought out, offloading WASM inference to a Web Worker while keeping Voy on the main thread is the right approach, and the singleton pub/sub pattern for surviving tab switches is clean.
A few things to address below (inline comments).
| } else if ( msg.type === 'error' ) { | ||
| log.error( 'Worker error:', msg.message ); | ||
| worker.terminate(); | ||
| reject( new Error( msg.message ) ); |
There was a problem hiding this comment.
Worker cleanup on cancellation
If the user clears the index mid-build or navigates away, the worker keeps running. Consider storing a reference to the active worker so clearIndex() (and potentially a page unload handler) can call worker.terminate().
|
|
||
| worker.addEventListener( 'error', ( e ) => { | ||
| log.error( 'Worker crashed:', e.message ); | ||
| worker.terminate(); |
There was a problem hiding this comment.
Missing messageerror handler
The worker listens for error but not messageerror (fired when a message can't be deserialized). Worth adding alongside the existing error listener for completeness.
|
|
||
| /** | ||
| * Build the Voy index from pre-computed embeddings and persist. | ||
| * Used after the indexing worker returns embeddings from a background thread. |
There was a problem hiding this comment.
Unused reload() export
reload() is exported but never called, buildFromEmbeddings() is used instead. Is this intended for future use, or should it be removed?
| */ | ||
|
|
||
| /* eslint-disable no-console */ | ||
|
|
There was a problem hiding this comment.
CDN dependency
Transformers.js is loaded from cdn.jsdelivr.net at runtime. If the CDN is unreachable the user gets a generic error. Would be good to document this dependency and catch the dynamic import with a more descriptive message (e.g., "Could not load embedding model, check your internet connection").
|
|
||
| /* ── Worker helper ───────────────────────────────────────────────────── */ | ||
|
|
||
| /** |
There was a problem hiding this comment.
Subtle but correct error flow
In buildIndex(), _buildPromise is nulled in finally while _building is reset inside _runBuild()'s catch/success paths. This works correctly because _runBuild always resets _building before the promise settles, just noting it's a subtle contract worth a brief inline comment so future readers don't have to trace through both functions.
| const [ kbProgress, setKbProgress ] = useState( kbGetProgress ); | ||
| const [ kbError, setKbError ] = useState( kbGetError ); | ||
|
|
||
| useEffect( () => { |
There was a problem hiding this comment.
Double call to kbIsBuilding()
kbIsBuilding() is called twice per subscribe tick, once for setKbBuilding and once for the conditional. Small suggestion:
const building = kbIsBuilding();
setKbBuilding( building );
setKbProgress( kbGetProgress() );
setKbError( kbGetError() );
if ( ! building ) {
setKbStatus( getKBStatus() );
}| * Subscribe to state changes. Returns an unsubscribe function. | ||
| * | ||
| * @param {Function} listener Called on every state change. | ||
| * @return {Function} Unsubscribe. |
There was a problem hiding this comment.
Missing @since tags
The new subscribe, isBuilding, getProgress, getError exports should have @since tags to match the worker's @since 0.14.0.
Summary
indexing-worker.js, keeping the UI responsive during knowledge base buildsbuildIndex()while already running returns the existing promise instead of starting a duplicateArchitecture
Voy WASM glue requires
document(unavailable in workers), so only the embedding inference runs in the worker. The main thread receives pre-computed embeddings and builds the Voy search index.Files changed
src/extensions/services/indexing-worker.jssrc/extensions/services/knowledge-base.jssrc/extensions/services/vector-store.jsbuildFromEmbeddings()+reload()methodssrc/extensions/components/SettingsTab.jsxwebpack.config.jsindexing-workerentry pointTest plan
npm testpasses (74/74)npm run buildsucceedsGenerated with Claude Code