feat: Implement vector search and vectorization features for course t… #48
webdevcody merged 5 commits into main
…ranscripts
- Updated Docker configuration to use pgvector for PostgreSQL.
- Added new endpoints and UI components for vector search and vectorization in the admin panel.
- Implemented chunking and embedding generation for video transcripts.
- Created database schema for storing transcript chunks and embeddings.
- Developed use cases for vectorizing segments and searching transcripts using embeddings.
- Introduced admin-only access for vector search functionality.
- Enhanced pricing configuration and updated related components for consistency.
Note: Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
📝 Walkthrough
Adds pgvector-backed transcript chunk storage and migrations, embedding generation and chunking utilities, data-access and use-case orchestration for vectorization/search, admin routes and UIs with server functions, env and dependency additions, and minor pricing config updates.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Admin as Admin UI
participant SF as Server Fn
participant UC as Use Case
participant DA as Data Access
participant OpenAI as OpenAI API
participant DB as PostgreSQL (pgvector)
Admin->>SF: POST /vectorize { segmentId }
SF->>UC: vectorizeSegmentUseCase(segmentId)
UC->>DA: fetch segment + transcript
DA->>DB: SELECT segment + transcript
DB-->>DA: segment record
DA-->>UC: segment + transcript
alt transcript exists
UC->>UC: chunkTranscript()
UC->>OpenAI: generateEmbeddings(batch of chunk texts)
OpenAI-->>UC: embeddings[]
UC->>DA: deleteChunksBySegmentId(segmentId)
DA->>DB: DELETE FROM transcript_chunk WHERE segmentId=...
UC->>DA: createTranscriptChunks(chunks with embeddings)
DA->>DB: INSERT ... embedding vector
DB-->>DA: insert result
DA-->>UC: created count
UC-->>SF: success { chunkCount }
else no transcript
UC-->>SF: error
end
SF-->>Admin: response
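The vectorization flow in the diagram above can be sketched as one function with its collaborators injected. This is a minimal sketch, not the PR's code: the `VectorizeDeps` interface, the `StoredChunk` shape, and the batch size are assumptions for illustration; only the names `chunkTranscript`, `generateEmbeddings`, `deleteChunksBySegmentId`, and `createTranscriptChunks` come from the diagram.

```typescript
// Hypothetical shapes, for illustration only.
interface Chunk {
  text: string;
  tokenCount: number;
}

interface StoredChunk extends Chunk {
  segmentId: number;
  chunkIndex: number;
  embedding: number[];
}

interface VectorizeDeps {
  chunkTranscript(transcript: string): Chunk[];
  generateEmbeddings(texts: string[]): Promise<number[][]>;
  deleteChunksBySegmentId(segmentId: number): Promise<void>;
  createTranscriptChunks(rows: StoredChunk[]): Promise<void>;
}

// Assumed value, kept small so the batching is visible in a demo.
const EMBEDDING_BATCH_SIZE = 2;

async function vectorizeSegment(
  segmentId: number,
  transcript: string | null,
  deps: VectorizeDeps,
): Promise<number> {
  if (!transcript) {
    throw new Error(`Segment ${segmentId} has no transcript`);
  }

  const chunks = deps.chunkTranscript(transcript);
  const rows: StoredChunk[] = [];

  // Embed in batches and pair each chunk with its embedding by position.
  for (let i = 0; i < chunks.length; i += EMBEDDING_BATCH_SIZE) {
    const batch = chunks.slice(i, i + EMBEDDING_BATCH_SIZE);
    const embeddings = await deps.generateEmbeddings(batch.map((c) => c.text));
    if (embeddings.length !== batch.length) {
      throw new Error(
        `Embedding count mismatch: expected ${batch.length}, got ${embeddings.length}`,
      );
    }
    batch.forEach((chunk, j) => {
      rows.push({ ...chunk, segmentId, chunkIndex: i + j, embedding: embeddings[j] });
    });
  }

  // Old rows are replaced only after every embedding call has succeeded.
  await deps.deleteChunksBySegmentId(segmentId);
  await deps.createTranscriptChunks(rows);
  return rows.length;
}
```

Deleting after embedding, as the diagram orders the calls, means a failed embedding request leaves the previously stored chunks untouched.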
sequenceDiagram
autonumber
participant Admin as Admin UI
participant SF as Server Fn
participant UC as Use Case
participant OpenAI as OpenAI API
participant DA as Data Access
participant DB as PostgreSQL (pgvector)
Admin->>SF: GET /search?q=...
SF->>UC: searchTranscriptsUseCase(query, limit)
UC->>OpenAI: generateEmbedding(query)
OpenAI-->>UC: embedding
UC->>DA: searchByEmbedding(embedding, limit)
DA->>DB: SELECT ... ORDER BY embedding <-> query_embedding LIMIT N
rect rgb(220,240,220)
note over DB: pgvector cosine distance search
end
DB-->>DA: ranked rows with segment/module
DA-->>UC: SearchResult[]
UC-->>SF: results
SF-->>Admin: results (snippets + scores)
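A note on the distance operator: in pgvector, `<->` is Euclidean (L2) distance and `<=>` is cosine distance, so the query in the diagram and the "cosine distance" note name two different metrics. For unit-length vectors the two produce the same ordering, and OpenAI documents its embeddings as normalized to length 1, so the ranking would match either way. The sketch below illustrates this in plain TypeScript; the `Row` type and `rank` helper are hypothetical and stand in for the database query.

```typescript
// Hypothetical row shape standing in for a transcript chunk.
type Row = { id: string; embedding: number[] };
type Distance = (a: number[], b: number[]) => number;

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// What pgvector's `<=>` operator computes: 1 - cosine similarity.
const cosineDistance: Distance = (a, b) =>
  1 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));

// What pgvector's `<->` operator computes: Euclidean distance.
const l2Distance: Distance = (a, b) =>
  Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));

// In-memory equivalent of ORDER BY distance LIMIT n.
function rank(query: number[], rows: Row[], distance: Distance, limit: number): string[] {
  return rows
    .map((row) => ({ id: row.id, d: distance(row.embedding, query) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, limit)
    .map((r) => r.id);
}
```

If the stored vectors are not guaranteed to be normalized, the operator should match the metric the UI reports as a score.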
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 7
🧹 Nitpick comments (6)
src/routes/learn/-components/add-segment/server-functions.ts (1)
93-103: Consider documenting the intentional validation duplication.
The slug validation logic is duplicated between `validateSlugFn` (lines 97-100) and `createSegmentFn` (lines 30-34). While this duplication serves a valid purpose (`validateSlugFn` provides early feedback before expensive video uploads, and `createSegmentFn` guards against race conditions), it would be helpful to add comments explaining why both checks are necessary.
🔎 Suggested documentation
  export const validateSlugFn = createServerFn()
    .middleware([adminMiddleware])
    .inputValidator(z.object({ slug: z.string() }))
    .handler(async ({ data }) => {
+     // Early validation to provide fast feedback before expensive video upload.
+     // Note: createSegmentFn re-validates to prevent race conditions.
      if (await isSlugInUse(data.slug)) {
        throw new Error(
          `The slug "${data.slug}" is already in use. Please choose a different slug.`
        );
      }
      return { valid: true };
    });
src/lib/openai.ts (1)
10-16: Consider adding input validation.
Adding validation for empty strings and excessively long text could prevent unnecessary API calls and provide clearer error messages.
🔎 Suggested input validation
  export async function generateEmbedding(text: string): Promise<number[]> {
+   if (!text || text.trim().length === 0) {
+     throw new Error('Text input cannot be empty');
+   }
+
    try {
      const response = await openai.embeddings.create({
        model: EMBEDDING_MODEL,
        input: text,
      });
      return response.data[0].embedding;
    } catch (error) {
      console.error('Failed to generate embedding:', error);
      throw new Error(
        `Embedding generation failed: ${error instanceof Error ? error.message : 'Unknown error'}`
      );
    }
  }
src/lib/chunking.ts (1)
12-57: Consider adding error handling for tokenization.
The `encode` and `decode` functions from `gpt-tokenizer` could potentially fail on malformed input. Adding try-catch would make the function more robust.
🔎 Suggested error handling
  export function chunkTranscript(transcript: string): Chunk[] {
    if (!transcript || transcript.trim().length === 0) {
      return [];
    }

+   try {
      const tokens = encode(transcript);
      const chunks: Chunk[] = [];
      // ... rest of the logic ...
      return chunks;
+   } catch (error) {
+     console.error('Failed to chunk transcript:', error);
+     throw new Error(
+       `Transcript chunking failed: ${error instanceof Error ? error.message : 'Unknown error'}`
+     );
+   }
  }
src/routes/admin/vectorization.tsx (1)
137-145: Consider optimizing module lookup with a Map.
The current implementation uses `.find()` for each segment to locate its module by title, resulting in O(n×m) complexity. For larger datasets, consider building a `Map<string, number>` for O(1) lookups.
🔎 Proposed optimization
+ const moduleTitleToId = new Map(modules.map((m) => [m.title, m.id]));
+
  segments.forEach((segment) => {
-   const moduleId = modules.find((m) => m.title === segment.moduleTitle)?.id;
+   const moduleId = moduleTitleToId.get(segment.moduleTitle);
    if (moduleId) {
      const moduleData = segmentsByModule.get(moduleId);
      if (moduleData) {
        moduleData.segments.push(segment);
      }
    }
  });
src/use-cases/vector-search.ts (1)
47-61: Consider defensive check for embeddings array length.
The code assumes `generateEmbeddings(texts)` returns an array with the same length as `texts`. If the OpenAI API returns fewer embeddings (e.g., due to filtering or errors), accessing `embeddings[j]` could yield `undefined`, resulting in chunks stored with invalid embeddings.
🔎 Proposed defensive check
  for (let i = 0; i < chunks.length; i += EMBEDDING_BATCH_SIZE) {
    const batch = chunks.slice(i, i + EMBEDDING_BATCH_SIZE);
    const texts = batch.map((c) => c.text);
    const embeddings = await generateEmbeddings(texts);
+   if (embeddings.length !== batch.length) {
+     throw new Error(
+       `Embedding count mismatch: expected ${batch.length}, got ${embeddings.length}`
+     );
+   }
+
    for (let j = 0; j < batch.length; j++) {
      allChunksWithEmbeddings.push({
src/data-access/transcript-chunks.ts (1)
70-86: Hardcoded table names may cause maintenance issues.
The raw SQL uses hardcoded table names (`app_transcript_chunk`, `app_segment`, `app_module`) which are derived from the `PREFIX` constant in schema.ts. If the prefix changes, this query will break silently.
🔎 Consider using template literals with table references
You could extract table name references or use Drizzle's `getTableName()` utility to keep names in sync:
import { getTableName } from "drizzle-orm";

// Then in the query:
const tcTable = getTableName(transcriptChunks);
const sTable = getTableName(segments);
const mTable = getTableName(modules);
Alternatively, document the dependency on the prefix or add a test that verifies the table names match.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (21)
docker-compose.yml
drizzle/0050_glossy_infant_terrible.sql
drizzle/meta/0050_snapshot.json
drizzle/meta/_journal.json
package.json
src/config.ts
src/data-access/transcript-chunks.ts
src/db/schema.ts
src/fn/vector-search.ts
src/lib/chunking.ts
src/lib/openai.ts
src/routeTree.gen.ts
src/routes/-components/pricing.tsx
src/routes/admin/-components/admin-nav.tsx
src/routes/admin/vector-search.tsx
src/routes/admin/vectorization.tsx
src/routes/learn/-components/add-segment/server-functions.ts
src/routes/learn/-components/add-segment/use-add-segment.ts
src/use-cases/vector-search.ts
src/utils/env.ts
vector-search-prompt.md
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: When invoking a TanStack server function, pass an object with a `data` property containing parameters
All card components should use the shadcn `Card` component with `CardTitle`, `CardDescription`, and other shadcn Card subcomponents
All server-side operations should use TanStack Start server functions created with `createServerFn` instead of direct backend logic
Form handling should use React Hook Form with Zod validation
**/*.{ts,tsx}: Never hard code magic numbers into code; consolidate them to the top of the file or in `/src/config/index.ts` to keep the code clean
Never allow a file to exceed 1,000 lines of code; split overly large files into smaller modular components to maintain code maintainability
Files:
src/utils/env.ts
src/routes/learn/-components/add-segment/server-functions.ts
src/routes/admin/-components/admin-nav.tsx
src/fn/vector-search.ts
src/lib/openai.ts
src/routes/admin/vectorization.tsx
src/use-cases/vector-search.ts
src/data-access/transcript-chunks.ts
src/routes/admin/vector-search.tsx
src/config.ts
src/routes/-components/pricing.tsx
src/routeTree.gen.ts
src/routes/learn/-components/add-segment/use-add-segment.ts
src/lib/chunking.ts
src/db/schema.ts
**/{components,routes}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Component styling should follow shadcn/ui patterns with Tailwind CSS v4
Files:
src/routes/learn/-components/add-segment/server-functions.ts
src/routes/admin/-components/admin-nav.tsx
src/routes/admin/vectorization.tsx
src/routes/admin/vector-search.tsx
src/routes/-components/pricing.tsx
src/routes/learn/-components/add-segment/use-add-segment.ts
**/routes/admin/**/*.tsx
📄 CodeRabbit inference engine (CLAUDE.md)
When making an admin page, protect it using `assertIsAdminFn()` in the `beforeLoad` hook with `createFileRoute`
Files:
src/routes/admin/-components/admin-nav.tsx
src/routes/admin/vectorization.tsx
src/routes/admin/vector-search.tsx
**/routes/**/*.tsx
📄 CodeRabbit inference engine (CLAUDE.md)
**/routes/**/*.tsx: Pages should use the `Page` component and `PageHeader` component when possible
Implement error boundaries at the route level using `DefaultCatchBoundary`
Files:
src/routes/admin/-components/admin-nav.tsx
src/routes/admin/vectorization.tsx
src/routes/admin/vector-search.tsx
src/routes/-components/pricing.tsx
**/{fn,functions}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/{fn,functions}/**/*.{ts,tsx}: When making a TanStack server function that requires authentication, use `.middleware([authenticatedMiddleware])`
When making a TanStack server function that requires admin-only permission, use `.middleware([adminMiddleware])`
When making a TanStack server function with optional authentication, use `.middleware([unauthenticatedMiddleware])`
Files:
src/fn/vector-search.ts
src/fn/**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/tanstack-server-functions.mdc)
src/fn/**/*.ts: When creating a tanstack start server function, always include a middleware function call. Existing middleware functions are available in `src/lib/auth.ts`
When using a server function, always attach a validator to it
When using server functions, never import drizzle related objects inside the function handler. Server functions should always invoke use cases based on the layered architecture pattern. Reference `docs/technical/layered-architecture.md` for layered architecture guidelines
Files:
src/fn/vector-search.ts
**/data-access/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use the Repository pattern (defined in `/src/data-access/`) for database query functions with Drizzle ORM
Files:
src/data-access/transcript-chunks.ts
🧠 Learnings (11)
📚 Learning: 2025-12-22T03:59:58.018Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: .cursor/rules/tanstack-server-functions.mdc:0-0
Timestamp: 2025-12-22T03:59:58.018Z
Learning: Applies to src/fn/**/*.ts : When using a server function, always attach a validator to it
Applied to files:
src/routes/learn/-components/add-segment/server-functions.ts
src/fn/vector-search.ts
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/{fn,functions}/**/*.{ts,tsx} : When making a TanStack server function that requires admin-only permission, use `.middleware([adminMiddleware])`
Applied to files:
src/routes/learn/-components/add-segment/server-functions.ts
src/fn/vector-search.ts
📚 Learning: 2025-12-22T03:59:58.018Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: .cursor/rules/tanstack-server-functions.mdc:0-0
Timestamp: 2025-12-22T03:59:58.018Z
Learning: Applies to src/fn/**/*.ts : When creating a tanstack start server function, always include a middleware function call. Existing middleware functions are available in `src/lib/auth.ts`
Applied to files:
src/routes/learn/-components/add-segment/server-functions.ts
src/fn/vector-search.ts
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/*.{ts,tsx} : All server-side operations should use TanStack Start server functions created with `createServerFn` instead of direct backend logic
Applied to files:
src/fn/vector-search.ts
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/{fn,functions}/**/*.{ts,tsx} : When making a TanStack server function with optional authentication, use `.middleware([unauthenticatedMiddleware])`
Applied to files:
src/fn/vector-search.ts
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/{fn,functions}/**/*.{ts,tsx} : When making a TanStack server function that requires authentication, use `.middleware([authenticatedMiddleware])`
Applied to files:
src/fn/vector-search.ts
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/routes/admin/**/*.tsx : When making an admin page, protect it using `assertIsAdminFn()` in the `beforeLoad` hook with `createFileRoute`
Applied to files:
src/routes/admin/vectorization.tsx
src/routes/admin/vector-search.tsx
src/routeTree.gen.ts
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/routes/**/*.tsx : Pages should use the `Page` component and `PageHeader` component when possible
Applied to files:
src/routes/admin/vectorization.tsx
src/routes/admin/vector-search.tsx
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Routes are auto-generated to `/src/routeTree.gen.ts` and should never be edited manually
Applied to files:
src/routes/admin/vector-search.tsx
src/routeTree.gen.ts
📚 Learning: 2025-12-19T06:04:47.031Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: .cursor/rules/clean-code.mdc:0-0
Timestamp: 2025-12-19T06:04:47.031Z
Learning: Applies to **/*.{ts,tsx} : Never hard code magic numbers into code; consolidate them to the top of the file or in `/src/config/index.ts` to keep the code clean
Applied to files:
src/config.ts
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/*.{ts,tsx} : Form handling should use React Hook Form with Zod validation
Applied to files:
src/routes/learn/-components/add-segment/use-add-segment.ts
🧬 Code graph analysis (10)
src/routes/learn/-components/add-segment/server-functions.ts (1)
  src/data-access/segments.ts (1)
    isSlugInUse (43-51)
src/routes/admin/-components/admin-nav.tsx (1)
  src/db/index.ts (1)
    Database (31-31)
src/fn/vector-search.ts (1)
  src/use-cases/vector-search.ts (4)
    vectorizeSegmentUseCase (16-70)
    vectorizeAllSegmentsUseCase (72-98)
    searchTranscriptsUseCase (100-110)
    getVectorizationStatusUseCase (136-171)
src/lib/openai.ts (1)
  src/utils/env.ts (1)
    env (27-47)
src/routes/admin/vectorization.tsx (2)
  src/fn/vector-search.ts (3)
    getVectorizationStatusFn (31-35)
    vectorizeAllSegmentsFn (18-22)
    vectorizeSegmentFn (11-16)
  src/db/schema.ts (2)
    modules (115-126)
    segments (128-153)
src/use-cases/vector-search.ts (5)
  src/data-access/segments.ts (2)
    getSegmentById (52-59)
    getSegments (6-8)
  src/data-access/transcript-chunks.ts (6)
    deleteChunksBySegmentId (11-16)
    createTranscriptChunks (6-9)
    SearchResult (51-60)
    searchByEmbedding (62-89)
    getChunkCountBySegmentIds (26-39)
    getTotalChunkCount (41-49)
  src/lib/chunking.ts (1)
    chunkTranscript (12-57)
  src/lib/openai.ts (2)
    generateEmbeddings (18-27)
    generateEmbedding (10-16)
  src/use-cases/modules.ts (1)
    getModulesWithSegmentsUseCase (37-45)
src/data-access/transcript-chunks.ts (2)
  src/db/schema.ts (2)
    TranscriptChunkCreate (1202-1202)
    transcriptChunks (176-197)
  src/db/index.ts (1)
    database (41-41)
src/routes/admin/vector-search.tsx (5)
  src/fn/vector-search.ts (1)
    searchTranscriptsFn (24-29)
  src/routes/admin/-components/page.tsx (1)
    Page (7-21)
  src/components/ui/button.tsx (1)
    Button (59-59)
  src/components/ui/card.tsx (5)
    Card (85-85)
    CardHeader (86-86)
    CardTitle (88-88)
    CardDescription (90-90)
    CardContent (91-91)
  src/components/ui/badge.tsx (1)
    Badge (38-38)
src/routes/-components/pricing.tsx (1)
  src/config.ts (1)
    PRICING_CONFIG (4-18)
src/routes/learn/-components/add-segment/use-add-segment.ts (1)
  src/routes/learn/-components/add-segment/server-functions.ts (1)
    validateSlugFn (93-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Playwright Tests
🔇 Additional comments (26)
src/routes/learn/-components/add-segment/use-add-segment.ts (1)
25-27: Good UX improvement with early slug validation.
Validating the slug before initiating the video upload is a smart optimization that prevents users from wasting time on large uploads only to discover a slug conflict. The server-side validation in `createSegmentFn` still guards against race conditions, making this a safe change.
Optional: Enhance the comment to mention race condition handling
- // Validate slug before uploading video to avoid wasting time on large uploads
+ // Validate slug before uploading video to avoid wasting time on large uploads.
+ // createSegmentFn re-validates to handle race conditions.
  await validateSlugFn({ data: { slug: values.slug } });
package.json (1)
78-78: The packages specified are already at their latest versions: `gpt-tokenizer@3.4.0` and `openai@6.15.0` are the current releases with no known vulnerabilities. No action needed.
Likely an incorrect or invalid review comment.
docker-compose.yml (1)
3-3: pgvector:pg17 is a stable, officially supported choice compatible with the current stack.
The `pgvector/pgvector:pg17` image (v0.8.1) is an official Docker tag. Drizzle-orm 0.44.3 explicitly supports the pgvector extension with PostgreSQL 17, and pg 8.16.3 is compatible. No changes needed. Note: pgvector does not publish alpine variants, so `pg17-alpine` is not available.
Also applies to: 15-15
Likely an incorrect or invalid review comment.
src/routes/admin/-components/admin-nav.tsx (2)
19-20: LGTM! Icons imported correctly.
The Database and Search icons are appropriate for the vectorization and vector search features.
60-71: Add `assertIsAdminFn()` protection to admin routes.
The routes `/admin/vectorization` and `/admin/vector-search` are missing the required `beforeLoad` hook with `assertIsAdminFn()`. Update both route files to include:
export const Route = createFileRoute("/admin/vectorization")({
  beforeLoad: assertIsAdminFn(),
  component: AdminVectorization,
});
⛔ Skipped due to learnings
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/routes/admin/**/*.tsx : When making an admin page, protect it using `assertIsAdminFn()` in the `beforeLoad` hook with `createFileRoute`
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/{fn,functions}/**/*.{ts,tsx} : When making a TanStack server function that requires admin-only permission, use `.middleware([adminMiddleware])`
Learnt from: amoscicki
Repo: webdevcody/agentic-jumpstart PR: 46
File: src/fn/dev/middleware.ts:3-8
Timestamp: 2025-12-26T04:51:53.750Z
Learning: In this codebase, dev-mode protection uses different strategies by context: server functions (using DevGuardMiddleware) throw errors that are caught by error boundaries, while route-level protection (like assertDevModeFn in beforeLoad) redirects to homepage for better UX. This prevents users from seeing error pages when accidentally accessing dev routes in production.
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/routes/**/*.tsx : Pages should use the `Page` component and `PageHeader` component when possible
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/routes/**/*.tsx : Implement error boundaries at the route level using `DefaultCatchBoundary`
src/routes/-components/pricing.tsx (2)
9-9: LGTM! Centralized pricing configuration imported.
This follows the coding guideline to consolidate magic numbers into configuration files.
71-86: Excellent refactoring to eliminate hardcoded pricing values.
The implementation correctly uses:
- `FORMATTED_ORIGINAL_PRICE` for the strike-through display
- `DISCOUNT_PERCENTAGE` for the savings label
- `FORMATTED_CURRENT_PRICE` for the main price
This improves maintainability by centralizing all pricing logic in `~/config.ts`, and the discount percentage will automatically update when prices change.
src/config.ts (1)
5-6: Note the significant pricing and discount changes.
The price update reflects:
- Current price increased from $199 to $249 (+25%)
- Original price increased from $297 to $299
- Discount reduced from 33% to 17%
This is a major pricing change that will be reflected across all pricing displays. Ensure this aligns with business objectives and that stakeholders are aware of the reduced discount percentage.
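The percentages quoted above follow from the listed prices. A minimal sketch of the arithmetic, using hypothetical helper names (the actual fields and helpers in `PRICING_CONFIG` are not shown in this review):

```typescript
// Hypothetical helpers; the real config may compute these differently.

// Discount shown to the buyer: how far the current price sits below the original.
function discountPercentage(originalPrice: number, currentPrice: number): number {
  return Math.round(((originalPrice - currentPrice) / originalPrice) * 100);
}

// Relative change between two prices, as a whole percentage.
function percentChange(from: number, to: number): number {
  return Math.round(((to - from) / from) * 100);
}

const newDiscount = discountPercentage(299, 249); // (299 - 249) / 299 ≈ 16.7%
const oldDiscount = discountPercentage(297, 199); // (297 - 199) / 297 ≈ 33.0%
const priceIncrease = percentChange(199, 249); // 50 / 199 ≈ 25.1%
```

Deriving the discount from the two prices, rather than storing it as a third constant, is what lets the label stay correct when either price changes.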
drizzle/0050_glossy_infant_terrible.sql (3)
1-2: LGTM! pgvector extension enabled correctly.
The pgvector extension is properly enabled with `IF NOT EXISTS` to support vector similarity search.
9-9: LGTM! Correct embedding dimension for OpenAI.
The `vector(1536)` dimension matches OpenAI's text-embedding-ada-002 and text-embedding-3-small models. The nullable constraint allows for asynchronous embedding generation.
15-17: LGTM! Appropriate constraints and indexes.
The foreign key with `ON DELETE cascade` ensures data integrity, and the indexes on `(segmentId)` and `(segmentId, chunkIndex)` will optimize common query patterns for segment-based lookups and ordered chunk retrieval.
vector-search-prompt.md (1)
1-63: LGTM! Well-structured implementation prompt.
This documentation clearly outlines the vector search implementation requirements and aligns well with the actual implementation in this PR. The prompt covers all key aspects: architecture, chunking strategy, admin workflows, and search functionality.
src/lib/chunking.ts (1)
3-4: LGTM! Constants properly defined.
The chunk size and overlap constants are appropriately defined at the top of the file, following the coding guidelines to avoid magic numbers.
Based on coding guidelines: "Never hard code magic numbers into code; consolidate them to the top of the file."
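To illustrate how a chunk size and an overlap constant interact, here is a minimal sliding-window sketch. It is an assumption-laden stand-in, not the code in `src/lib/chunking.ts`: the constant values are invented for the demo, and it operates on an already-split token array instead of calling `gpt-tokenizer`.

```typescript
// Assumed demo values; the PR's actual constants are not shown in this review.
const CHUNK_SIZE = 5;
const CHUNK_OVERLAP = 2;

interface TokenChunk {
  index: number;
  text: string;
  tokenCount: number;
}

function chunkTokens(
  tokens: string[],
  size: number = CHUNK_SIZE,
  overlap: number = CHUNK_OVERLAP,
): TokenChunk[] {
  if (overlap >= size) {
    throw new Error("overlap must be smaller than chunk size");
  }
  const chunks: TokenChunk[] = [];
  const step = size - overlap; // each window starts this many tokens after the last

  for (let start = 0; start < tokens.length; start += step) {
    const windowTokens = tokens.slice(start, start + size);
    chunks.push({
      index: chunks.length,
      text: windowTokens.join(" "),
      tokenCount: windowTokens.length,
    });
    // Stop once a window has reached the end, to avoid a trailing chunk
    // that only repeats the overlap.
    if (start + size >= tokens.length) break;
  }
  return chunks;
}
```

With eight tokens, a size of 5, and an overlap of 2, this yields two chunks whose last and first two tokens coincide, which is what keeps a sentence that straddles a boundary searchable from either side.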
src/fn/vector-search.ts (1)
11-35: LGTM! Server functions properly implemented.
All four server functions correctly follow the coding guidelines:
- ✓ Use `createServerFn` for server-side operations
- ✓ Apply `adminMiddleware` for admin-only access
- ✓ Include input validators where needed
- ✓ Delegate to use-cases following layered architecture
- ✓ Access parameters through `data` property
Based on coding guidelines and learnings: Server functions properly use middleware, validators, and delegate to use-cases.
src/routeTree.gen.ts (1)
1-1677: Auto-generated file, no review required.
This file is automatically generated by TanStack Router and should not be manually edited or reviewed for code quality issues.
Based on learnings: "Routes are auto-generated to `/src/routeTree.gen.ts` and should never be edited manually."
src/lib/openai.ts (1)
8-8: The choice of `text-embedding-3-small` is appropriate and well-documented.
OpenAI's `text-embedding-3-small` (1536 dimensions, 8,192 token context) is actively maintained and recommended as the optimal cost-to-quality choice for production embedding workloads. Dimensions are standard, rate limits are tiered and documented, and there are no deprecation concerns. Consider `text-embedding-3-large` only if your use case requires maximum accuracy and your budget allows for the higher cost ($0.13 vs $0.02 per 1M tokens).
src/routes/admin/vector-search.tsx (1)
29-172: Well-structured component implementation.
The component correctly uses `Page` and `PageHeader` components, follows shadcn/ui patterns with proper Card composition, handles loading/empty states gracefully, and correctly invokes the server function with the `{ data: {...} }` pattern.
src/routes/admin/vectorization.tsx (1)
44-267: Solid implementation with proper patterns.
The component correctly uses shadcn Card components, handles mutations with toast feedback, invalidates queries appropriately, and manages processing state for individual segments. Good use of `Page` and `PageHeader` components.
src/use-cases/vector-search.ts (3)
16-45: Clean segment vectorization implementation.
The use-case properly validates the segment, deletes existing chunks for idempotency, chunks the transcript, and handles the empty chunks case. Good separation of concerns.
72-98: Good error isolation in bulk vectorization.
The bulk operation correctly isolates errors per segment, allowing successful segments to complete while collecting errors for reporting. This is a resilient pattern for batch operations.
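The per-segment error isolation described here can be sketched as follows. This is an illustrative outline only: `BulkResult`, `vectorizeAll`, and the injected `vectorizeOne` callback are hypothetical names, and the actual `vectorizeAllSegmentsUseCase` may report results in a different shape.

```typescript
// Hypothetical result shape for a bulk run.
interface BulkResult {
  succeeded: number[];
  errors: { segmentId: number; message: string }[];
}

async function vectorizeAll(
  segmentIds: number[],
  vectorizeOne: (segmentId: number) => Promise<unknown>,
): Promise<BulkResult> {
  const result: BulkResult = { succeeded: [], errors: [] };

  for (const segmentId of segmentIds) {
    try {
      await vectorizeOne(segmentId);
      result.succeeded.push(segmentId);
    } catch (error) {
      // A failure is recorded and the loop moves on to the next segment.
      result.errors.push({
        segmentId,
        message: error instanceof Error ? error.message : "Unknown error",
      });
    }
  }
  return result;
}
```

Running segments sequentially keeps the sketch simple; whether the real use case processes them one at a time or concurrently is not stated in the review.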
100-110: Simple and effective search implementation.
The early return for empty queries and delegation to data-access layer follows good patterns.
src/db/schema.ts (3)
src/db/schema.ts (3)
17-29: Well-implemented custom vector type for pgvector.
The custom type correctly handles serialization to/from the pgvector string format. Using `JSON.parse` for the `fromDriver` conversion is appropriate since pgvector's output format `[1,2,3,...]` is valid JSON.
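The round trip described above can be shown with two standalone functions. This is a sketch of the conversion only, assuming the text format `[x,y,z]`; the function names mirror Drizzle's `toDriver`/`fromDriver` hooks, but the bodies here are not copied from the PR's schema file.

```typescript
// number[] -> pgvector text literal, e.g. [1, 2.5, -3] -> "[1,2.5,-3]"
function toDriver(value: number[]): string {
  return `[${value.join(",")}]`;
}

// pgvector text output -> number[]; the bracketed, comma-separated
// format is also a valid JSON array, so JSON.parse is sufficient.
function fromDriver(value: string): number[] {
  return JSON.parse(value) as number[];
}

const literal = toDriver([1, 2.5, -3]);
const roundTripped = fromDriver(literal);
```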
176-197: Appropriate schema design for transcript chunks.
The table structure includes proper foreign key with cascade delete, useful indexes for segment-based queries and ordering by chunk index. The nullable `embedding` column allows storing chunks before embeddings are generated.
792-803: Relations correctly defined.
The bidirectional relationship between segments and transcript chunks is properly established.
src/data-access/transcript-chunks.ts (2)
6-49: Clean repository pattern implementation.
The CRUD functions handle edge cases (empty inputs), use appropriate Drizzle ORM methods, and return useful data types (Map for count lookups). Good adherence to the data-access layer patterns.
51-60: Well-defined SearchResult interface.
The interface provides a clear contract for search results with all necessary fields for the UI layer.
CREATE TABLE "app_transcript_chunk" (
  "id" serial PRIMARY KEY NOT NULL,
  "segmentId" serial NOT NULL,
  "chunkIndex" integer NOT NULL,
  "chunkText" text NOT NULL,
  "embedding" vector(1536),
  "tokenCount" integer NOT NULL,
  "created_at" timestamp DEFAULT now() NOT NULL,
  "updated_at" timestamp DEFAULT now() NOT NULL
);
Critical: Fix segmentId type from serial to integer.
Line 6 defines segmentId as serial, but this column is a foreign key (Line 15) referencing app_segment(id). The serial type creates an auto-incrementing sequence, which is incorrect for a foreign key column.
🔎 Proposed fix
CREATE TABLE "app_transcript_chunk" (
"id" serial PRIMARY KEY NOT NULL,
- "segmentId" serial NOT NULL,
+ "segmentId" integer NOT NULL,
"chunkIndex" integer NOT NULL,
"chunkText" text NOT NULL,
"embedding" vector(1536),
"tokenCount" integer NOT NULL,
"created_at" timestamp DEFAULT now() NOT NULL,
"updated_at" timestamp DEFAULT now() NOT NULL
);
📝 Committable suggestion
CREATE TABLE "app_transcript_chunk" (
  "id" serial PRIMARY KEY NOT NULL,
  "segmentId" integer NOT NULL,
  "chunkIndex" integer NOT NULL,
  "chunkText" text NOT NULL,
  "embedding" vector(1536),
  "tokenCount" integer NOT NULL,
  "created_at" timestamp DEFAULT now() NOT NULL,
  "updated_at" timestamp DEFAULT now() NOT NULL
);
🤖 Prompt for AI Agents
In drizzle/0050_glossy_infant_terrible.sql around lines 4 to 13, change the
"segmentId" column from type serial to integer NOT NULL so it does not create an
auto-increment sequence and correctly matches the referenced app_segment(id)
foreign key; update the migration SQL to use integer for segmentId, ensure any
sequence creation for that column is removed or not executed, and keep the NOT
NULL and foreign key constraint intact.
…t IDs
- Added new SQL migration file to alter multiple tables, changing user and segment ID columns from serial to integer types.
- Updated schema definitions in TypeScript to reflect these changes, ensuring consistency across the application.
- Introduced a new snapshot for the database schema to capture the latest structure.
--> statement-breakpoint
CREATE TABLE "app_transcript_chunk" (
  "id" serial PRIMARY KEY NOT NULL,
  "segmentId" serial NOT NULL,
Foreign key column incorrectly defined as serial type
The segmentId column in app_transcript_chunk is defined as serial but it's a foreign key referencing app_segment.id. Using serial creates an auto-increment sequence, which is incorrect for a foreign key column. While migration 0051 changes the type to integer, PostgreSQL's SET DATA TYPE doesn't remove the DEFAULT value created by serial, potentially leaving auto-increment behavior on a FK column.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/db/schema.ts (1)
17-29: Extract the embedding dimension to a config constant.
The dimension 1536 is a magic number tied to OpenAI's embedding model. Per coding guidelines, consolidate this to /src/config/index.ts for maintainability and easier updates if you switch embedding models.
🔎 Suggested refactor
In src/config/index.ts:
export const EMBEDDING_DIMENSION = 1536;
Then in schema:
+import { EMBEDDING_DIMENSION } from "@/config";
+
 // Custom type for pgvector
 const vector = customType<{ data: number[]; driverData: string }>({
   dataType() {
-    return "vector(1536)";
+    return `vector(${EMBEDDING_DIMENSION})`;
   },
   // ...
 });
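The elided part of the custom type (`// ...`) presumably converts between `number[]` and pgvector's text form. A rough sketch of such conversions follows, assuming the bracketed, comma-separated text representation that pgvector accepts; the function names are hypothetical and the PR's actual `toDriver`/`fromDriver` code is not shown in this thread.

```typescript
// Assumed constant, mirroring the reviewer's suggestion above.
const EMBEDDING_DIMENSION = 1536;

// Build the column type string from the shared constant.
function vectorColumnType(dimension: number = EMBEDDING_DIMENSION): string {
  return `vector(${dimension})`;
}

// Serialize a number[] into pgvector's text form, e.g. "[0.1,0.2]".
function toVectorLiteral(
  values: number[],
  dimension: number = EMBEDDING_DIMENSION
): string {
  if (values.length !== dimension) {
    throw new Error(`expected ${dimension} dimensions, got ${values.length}`);
  }
  if (!values.every((v) => Number.isFinite(v))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${values.join(",")}]`;
}

// Parse the text form back into a number[].
function fromVectorLiteral(text: string): number[] {
  const inner = text.trim().replace(/^\[/, "").replace(/\]$/, "");
  return inner.length === 0 ? [] : inner.split(",").map(Number);
}
```

Checking the length against the constant at serialization time means a change of embedding model fails early in application code instead of at insert time.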
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
drizzle/0051_lowly_songbird.sql
drizzle/meta/0051_snapshot.json
drizzle/meta/_journal.json
src/db/schema.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- drizzle/meta/_journal.json
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: When invoking a TanStack server function, pass an object with a data property containing parameters
All card components should use the shadcn Card component with CardTitle, CardDescription, and other shadcn Card subcomponents
All server-side operations should use TanStack Start server functions created with createServerFn instead of direct backend logic
Form handling should use React Hook Form with Zod validation
**/*.{ts,tsx}: Never hard code magic numbers into code; consolidate them to the top of the file or in /src/config/index.ts to keep the code clean
Never allow a file to exceed 1,000 lines of code; split overly large files into smaller modular components to maintain code maintainability
Files:
src/db/schema.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Playwright Tests
🔇 Additional comments (6)
drizzle/0051_lowly_songbird.sql (1)
1-34: Large schema migration with column type changes.
This migration modifies many foreign key columns from serial to integer types across the database. A few points to verify:
- Data compatibility: Ensure existing data in these columns is compatible with the integer type (should be fine since serial produces integers).
- Lines 20 and 27: These drop NOT NULL constraints on category_id and segmentId respectively; confirm this is intentional for your data model.
- Missing newline at EOF: Add a trailing newline for POSIX compliance.
src/db/schema.ts (5)
176-197: LGTM!
The transcriptChunks table is well-designed:
- Nullable embedding allows a two-phase insert (chunk first, embed later)
- Proper cascade delete on the segment FK
- Good index coverage for segment lookups and ordered retrieval
239-241: Verify nullable segmentId is intentional.
The segmentId foreign key is now nullable, which allows progress records without a specific segment. Ensure this aligns with your business logic for tracking progress.
797-808: LGTM!
The relations are properly defined, enabling bi-directional navigation between segments and their transcript chunks.
1206-1207: LGTM!
Type exports follow the established pattern and provide proper type inference for the new entity.
468-481: LGTM!
The enum and table definitions are correctly structured. The multiline formatting improves readability.
- Changed PostgreSQL service image in GitHub Actions to use pgvector for improved vector operations.
- Refactored chunking logic to ensure proper handling of token processing.
- Introduced retry logic for embedding generation with error handling for API responses.
- Added validation for input text in embedding functions to ensure non-empty strings.
- Implemented admin-only access for vector search and vectorization routes, enhancing security.
- Updated OpenAI API key retrieval to use environment variables for better configuration management.
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/lib/openai.ts (2)
8-11: Consider centralizing configuration constants.
While these constants are properly placed at the top of the file, consider moving them to /src/config/index.ts for consistency with other application configuration values (as suggested in the coding guidelines). This would make it easier to adjust embedding parameters across the application.
Example centralized configuration
In /src/config/index.ts:
export const OPENAI_CONFIG = {
  EMBEDDING_MODEL: "text-embedding-3-small",
  MAX_RETRIES: 3,
  INITIAL_RETRY_DELAY_MS: 1000,
  MAX_BATCH_SIZE: 100,
} as const;
Then import in this file:
import { OPENAI_CONFIG } from "~/config";
25-27: Extract duplicated sleep utility to a shared module.
This sleep function is duplicated in src/lib/video-processing-worker.ts (lines 353-355). Extract it to a shared utility module (e.g., src/utils/async.ts or src/utils/helpers.ts) to avoid code duplication and improve maintainability.
🔎 Proposed refactor
Create a new file src/utils/async.ts:
export async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
Then import in both files:
import { sleep } from "~/utils/async";
And update src/lib/video-processing-worker.ts to use the shared utility instead of its private method.
src/routes/admin/vectorization.tsx (1)
139-147: Consider handling unmatched segments.
Segments without a matching module title are silently skipped. While this appears intentional, you could improve observability by logging or displaying a warning when segments can't be matched to modules.
💡 Optional enhancement
 segments.forEach((segment) => {
   const moduleId = modules.find((m) => m.title === segment.moduleTitle)?.id;
   if (moduleId) {
     const moduleData = segmentsByModule.get(moduleId);
     if (moduleData) {
       moduleData.segments.push(segment);
     }
+  } else {
+    console.warn(`Segment "${segment.title}" has no matching module: "${segment.moduleTitle}"`);
   }
 });
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.github/workflows/test.yml
src/lib/chunking.ts
src/lib/openai.ts
src/routes/admin/vector-search.tsx
src/routes/admin/vectorization.tsx
src/utils/openai.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/lib/chunking.ts
- src/routes/admin/vector-search.tsx
🧰 Additional context used
📓 Path-based instructions (4)
**/routes/admin/**/*.tsx
📄 CodeRabbit inference engine (CLAUDE.md)
When making an admin page, protect it using assertIsAdminFn() in the beforeLoad hook with createFileRoute
Files:
src/routes/admin/vectorization.tsx
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: When invoking a TanStack server function, pass an object with a data property containing parameters
All card components should use the shadcn Card component with CardTitle, CardDescription, and other shadcn Card subcomponents
All server-side operations should use TanStack Start server functions created with createServerFn instead of direct backend logic
Form handling should use React Hook Form with Zod validation
**/*.{ts,tsx}: Never hard code magic numbers into code; consolidate them to the top of the file or in /src/config/index.ts to keep the code clean
Never allow a file to exceed 1,000 lines of code; split overly large files into smaller modular components to maintain code maintainability
Files:
src/routes/admin/vectorization.tsx
src/utils/openai.ts
src/lib/openai.ts
**/routes/**/*.tsx
📄 CodeRabbit inference engine (CLAUDE.md)
**/routes/**/*.tsx: Pages should use the Page component and PageHeader component when possible
Implement error boundaries at the route level using DefaultCatchBoundary
Files:
src/routes/admin/vectorization.tsx
**/{components,routes}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Component styling should follow shadcn/ui patterns with Tailwind CSS v4
Files:
src/routes/admin/vectorization.tsx
🧠 Learnings (5)
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/routes/admin/**/*.tsx : When making an admin page, protect it using `assertIsAdminFn()` in the `beforeLoad` hook with `createFileRoute`
Applied to files:
src/routes/admin/vectorization.tsx
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Routes are auto-generated to `/src/routeTree.gen.ts` and should never be edited manually
Applied to files:
src/routes/admin/vectorization.tsx
📚 Learning: 2025-12-26T04:51:53.750Z
Learnt from: amoscicki
Repo: webdevcody/agentic-jumpstart PR: 46
File: src/fn/dev/middleware.ts:3-8
Timestamp: 2025-12-26T04:51:53.750Z
Learning: In this codebase, dev-mode protection uses different strategies by context: server functions (using DevGuardMiddleware) throw errors that are caught by error boundaries, while route-level protection (like assertDevModeFn in beforeLoad) redirects to homepage for better UX. This prevents users from seeing error pages when accidentally accessing dev routes in production.
Applied to files:
src/routes/admin/vectorization.tsx
📚 Learning: 2025-12-19T06:04:43.442Z
Learnt from: CR
Repo: webdevcody/agentic-jumpstart PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-19T06:04:43.442Z
Learning: Applies to **/routes/**/*.tsx : Pages should use the `Page` component and `PageHeader` component when possible
Applied to files:
src/routes/admin/vectorization.tsx
📚 Learning: 2025-12-26T04:53:33.715Z
Learnt from: amoscicki
Repo: webdevcody/agentic-jumpstart PR: 46
File: .github/workflows/test.yml:40-40
Timestamp: 2025-12-26T04:53:33.715Z
Learning: In CI workflow files under .github/workflows, if there is a documented platform-specific npm bug (e.g., npm bug #4828 with optional dependencies), use npm install with rm -f package-lock.json to resolve platform binaries as described in repository comments. This is a deliberate trade-off against npm ci. Ensure the rationale and any caveats (potential longer install times and possible lockfile drift) are documented in the workflow comments, and apply this pattern only where the workaround is validated by maintainers.
Applied to files:
.github/workflows/test.yml
🧬 Code graph analysis (2)
src/utils/openai.ts (1)
src/utils/env.ts (1)
env(27-47)
src/lib/openai.ts (2)
src/utils/env.ts (1)
env(27-47)
src/lib/video-processing-worker.ts (1)
sleep(354-356)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Playwright Tests
🔇 Additional comments (7)
src/utils/openai.ts (1)
6-8: LGTM! Centralizing environment variable access.
The change to use the env utility instead of direct process.env access is a good practice that centralizes environment variable management and improves testability with fallback values.
src/lib/openai.ts (5)
1-6: LGTM: Clean integration with centralized environment configuration.
The OpenAI client initialization correctly uses the centralized env module, which provides proper environment variable handling with test fallbacks.
13-23: LGTM: Well-designed custom error class.
The EmbeddingError class provides rich contextual information (code, status, context) that will be valuable for debugging and error handling in upstream code.
29-76: LGTM: Robust retry mechanism with exponential backoff.
The withRetry function implements a well-designed retry strategy:
- Handles transient OpenAI API errors (rate limits, server errors)
- Exponential backoff prevents overwhelming the API
- Rich error context for debugging
- Type-safe generic implementation
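For readers without the file open, the strategy described above can be sketched roughly as follows. This is not the PR's `withRetry`; the option names, the injected `sleep`, and the rule that 429 and 5xx statuses count as transient are all assumptions made for illustration.

```typescript
// Hypothetical options shape; the PR's actual signature is not shown here.
interface RetryOptions {
  maxRetries: number;
  initialDelayMs: number;
  isRetryable: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: rate limits (429) and server errors (5xx) are treated as transient.
function isTransient(error: unknown): boolean {
  const status = (error as { status?: number } | null)?.status;
  return status === 429 || (typeof status === "number" && status >= 500);
}

async function withRetrySketch<T>(
  operation: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const { maxRetries, initialDelayMs, isRetryable } = options;
  const sleep = options.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on non-transient errors or once the retry budget is spent.
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      // Exponential backoff: initial delay, then double it on each retry.
      await sleep(initialDelayMs * Math.pow(2, attempt));
    }
  }
}
```

With `maxRetries: 3` and `initialDelayMs: 1000` (the values quoted in the config suggestion earlier in this review), a call that keeps failing would wait 1000 ms, 2000 ms, and 4000 ms before the final error is rethrown.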
78-108: Excellent implementation with comprehensive error handling.
The function now includes all the error handling that was flagged in previous reviews:
- Thorough input validation (type, empty, whitespace-only)
- Retry logic via withRetry wrapper for transient API failures
- Response validation to ensure embedding data exists
- Rich contextual error information for debugging
The previous review concerns have been fully addressed.
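The validation steps listed above might look something like the sketch below. The class name `SketchEmbeddingError`, the error codes, and both helper functions are hypothetical; only the `data[].embedding` response shape follows the OpenAI embeddings response format, and the PR's own `EmbeddingError` may differ.

```typescript
// Hypothetical error shape carrying code, status, and context.
class SketchEmbeddingError extends Error {
  constructor(
    message: string,
    readonly code: string,
    readonly status?: number,
    readonly context?: Record<string, unknown>
  ) {
    super(message);
    this.name = "SketchEmbeddingError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, SketchEmbeddingError.prototype);
  }
}

// Input validation: type, empty, and whitespace-only checks.
function assertEmbeddableText(text: unknown, index?: number): string {
  const context = index === undefined ? {} : { index };
  if (typeof text !== "string") {
    throw new SketchEmbeddingError("Input must be a string", "INVALID_INPUT", undefined, context);
  }
  if (text.trim().length === 0) {
    throw new SketchEmbeddingError("Input must not be empty or whitespace-only", "EMPTY_INPUT", undefined, context);
  }
  return text;
}

// Response validation: make sure an embedding actually came back.
function firstEmbeddingOrThrow(response: {
  data?: Array<{ embedding?: number[] }>;
}): number[] {
  const embedding = response.data?.[0]?.embedding;
  if (!embedding || embedding.length === 0) {
    throw new SketchEmbeddingError("Response contained no embedding", "EMPTY_RESPONSE");
  }
  return embedding;
}
```

Attaching the item index as context is what lets a batch caller report exactly which input was rejected.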
110-170: Excellent batch processing implementation with thorough validation.
The function comprehensively addresses all concerns from previous reviews:
- Array and per-item input validation with precise error indices
- Proper batch processing respecting MAX_BATCH_SIZE (100) to avoid API limits
- Retry logic for each batch via withRetry wrapper
- Multi-level response validation (batch length and individual embeddings)
- Rich error context including batch indices for debugging
Sequential batch processing is appropriate here: parallel requests could trigger rate limits. The implementation is robust and production-ready.
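A small sketch of the sequential batching pattern the comment describes, assuming the caller supplies the function that embeds one batch. `toBatches`, `embedInBatches`, and the `embedBatch` parameter are hypothetical names; the batch size of 100 is the value quoted above.

```typescript
const MAX_BATCH_SIZE = 100; // value quoted in the review comment

// Split an array into consecutive slices of at most `size` items.
function toBatches<T>(items: T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed batches one at a time so requests never run in parallel.
async function embedInBatches(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  size: number = MAX_BATCH_SIZE
): Promise<number[][]> {
  const embeddings: number[][] = [];
  let offset = 0;
  for (const batch of toBatches(texts, size)) {
    const result = await embedBatch(batch);
    // Batch-level response validation: one embedding per input text.
    if (result.length !== batch.length) {
      throw new Error(
        `batch starting at index ${offset} returned ${result.length} embeddings for ${batch.length} inputs`
      );
    }
    embeddings.push(...result);
    offset += batch.length;
  }
  return embeddings;
}
```

Because each batch is awaited before the next starts, output order matches input order without extra bookkeeping.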
src/routes/admin/vectorization.tsx (1)
32-35: Admin protection correctly implemented.
The route now includes the required assertIsAdminFn() call in the beforeLoad hook, addressing the previous review concern.
Based on coding guidelines and past review comments.
| services: | ||
| postgres-test: | ||
| image: postgres:17 | ||
| image: pgvector/pgvector:pg17 |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify pgvector/pgvector:pg17 image availability and check for vulnerabilities
# Check if the Docker image exists on Docker Hub
echo "=== Checking Docker Hub for pgvector/pgvector image tags ==="
curl -s "https://hub.docker.com/v2/repositories/pgvector/pgvector/tags?page_size=100" | \
jq -r '.results[] | select(.name == "pg17" or .name | startswith("17")) | .name' | head -20
Repository: webdevcody/agentic-jumpstart
Length of output: 211
🌐 Web query:
pgvector/pgvector:pg17 Docker image security advisories vulnerabilities
💡 Result:
I found public security scan results showing the Docker tag pgvector/pgvector:pg17 contains multiple OS-package vulnerabilities (several High and a few Critical/Severe) and at least some identified CVEs. Sources:
- Official pgvector repo (shows pg17 tag exists and recommended tags). [1]
- Trivy scan results for docker.io/pgvector/pgvector:pg17 listing many findings (example CVE entries: CVE-2023-45853, CVE-2024-26462, CVE-2023-2953). [2]
- Mirror/image metadata page with the Trivy summary (counts: dozens of Low/Medium, multiple High, several Severe). [3]
- Snyk entry for a pgvector-related CVE (package-level advisory). [4]
Recommendation (concise): avoid running the pg17 image in production until you (a) pull a newer tag or rebuild from source with updated base packages, (b) scan the exact image you plan to use (Trivy/Clair/etc.), and (c) apply fixes/updates for the listed CVEs.
Sources:
- pgvector GitHub (image tags). [1]
- Trivy scan details for docker.io/pgvector/pgvector:pg17. [2]
- Docker image metadata/scan summary for pgvector:pg17. [3]
- Snyk advisory for a pgvector package CVE. [4]
🌐 Web query:
pgvector Docker Hub official image maintenance
💡 Result:
Short answer: The pgvector project publishes and maintains Docker images from its GitHub repository (pgvector/pgvector) — those are the project’s “official” images — and there are also multiple third‑party/community pgvector images on Docker Hub. See the project README (Docker tags and docker pull instructions) and examples of community images on Docker Hub. [1][2]
Sources:
- pgvector GitHub (Docker usage, supported tags). [1]
- Example community Docker Hub pgvector repositories. [2]
LGTM—but verify image security before merging.
The update from postgres:17 to pgvector/pgvector:pg17 is appropriate for testing vector search functionality. The image exists and is officially maintained by the pgvector GitHub repository.
Note: Security scans show the pg17 tag contains multiple OS-package vulnerabilities (High and Critical severity), including CVE-2023-45853, CVE-2024-26462, and CVE-2023-2953. While this is a test environment (lower risk), consider:
- Using a newer pgvector tag if available (e.g., latest or a more recent version)
- Documenting that this image is for testing only and updating when patches are released
The pgvector extension initialization is properly handled in migrations as mentioned in the PR summary.
🤖 Prompt for AI Agents
In .github/workflows/test.yml around line 17, the workflow switches the test DB
image to pgvector/pgvector:pg17 which is known to contain High/Critical
OS-package CVEs; update the image to a newer, patched pgvector tag (or explicit
digest) that fixes those CVEs, validate compatibility by running the test matrix
locally/CI, then update the workflow line to that tag/digest; additionally add a
short comment in the workflow noting this image is test-only and create or
enable periodic image security scans (or Dependabot) to track future security
fixes.
- Modified the `handleVectorizeAll` function to pass an empty data object to the mutation, ensuring proper data structure for the vectorization process.
- Enhanced the `isSegmentCompleted` function in `ModulePanel` to prioritize local uncompleted segment states, improving immediate UI feedback for segment completion status.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/routes/admin/vectorization.tsx (1)
139-147: Minor: Title-based module lookup.
The segment grouping uses module.title matching (line 140) which is O(n×m) and could be fragile if titles aren't unique. If the API can return moduleId directly on segments, that would be more robust. For an admin page with modest data volumes, this works fine.
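If the title-based matching stays, one way to avoid the repeated scan is to index modules by title once. The sketch below assumes simplified `ModuleInfo` and `SegmentInfo` shapes (the field names `id`, `title`, and `moduleTitle` come from the snippet quoted earlier in this review; everything else is hypothetical) and also collects unmatched segments instead of dropping them.

```typescript
// Simplified, assumed shapes for illustration only.
interface ModuleInfo {
  id: number;
  title: string;
}
interface SegmentInfo {
  id: number;
  title: string;
  moduleTitle: string;
}

function groupSegmentsByModule(
  modules: ModuleInfo[],
  segments: SegmentInfo[]
): { byModule: Map<number, SegmentInfo[]>; unmatched: SegmentInfo[] } {
  const idByTitle = new Map<string, number>();
  const byModule = new Map<number, SegmentInfo[]>();
  for (const m of modules) {
    // First module wins on duplicate titles, matching Array.prototype.find.
    if (!idByTitle.has(m.title)) idByTitle.set(m.title, m.id);
    byModule.set(m.id, []);
  }
  const unmatched: SegmentInfo[] = [];
  for (const s of segments) {
    const moduleId = idByTitle.get(s.moduleTitle);
    const bucket = moduleId === undefined ? undefined : byModule.get(moduleId);
    if (bucket) {
      bucket.push(s);
    } else {
      unmatched.push(s); // surfaced to the caller rather than silently skipped
    }
  }
  return { byModule, unmatched };
}
```

This brings the grouping to O(n + m) and makes duplicate-title behavior explicit, though returning `moduleId` from the API would remove the ambiguity altogether.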
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/routes/admin/vectorization.tsx
src/routes/learn/-components/module-panel.tsx
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: When invoking a TanStack server function, pass an object with a data property containing parameters
All card components should use the shadcn Card component with CardTitle, CardDescription, and other shadcn Card subcomponents
All server-side operations should use TanStack Start server functions created with createServerFn instead of direct backend logic
Form handling should use React Hook Form with Zod validation
**/*.{ts,tsx}: Never hard code magic numbers into code; consolidate them to the top of the file or in /src/config/index.ts to keep the code clean
Never allow a file to exceed 1,000 lines of code; split overly large files into smaller modular components to maintain code maintainability
Files:
src/routes/learn/-components/module-panel.tsx
src/routes/admin/vectorization.tsx
**/routes/**/*.tsx
📄 CodeRabbit inference engine (CLAUDE.md)
**/routes/**/*.tsx: Pages should use the Page component and PageHeader component when possible
Implement error boundaries at the route level using DefaultCatchBoundary
Files:
src/routes/learn/-components/module-panel.tsx
src/routes/admin/vectorization.tsx
**/{components,routes}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Component styling should follow shadcn/ui patterns with Tailwind CSS v4
Files:
src/routes/learn/-components/module-panel.tsx
src/routes/admin/vectorization.tsx
**/routes/admin/**/*.tsx
📄 CodeRabbit inference engine (CLAUDE.md)
When making an admin page, protect it using assertIsAdminFn() in the beforeLoad hook with createFileRoute
Files:
src/routes/admin/vectorization.tsx
🧬 Code graph analysis (2)
src/routes/learn/-components/module-panel.tsx (2)
src/routes/learn/-components/segment-context.tsx (1)
useSegment(58-64)
src/db/schema.ts (1)
progress(232-250)
src/routes/admin/vectorization.tsx (3)
src/fn/vector-search.ts (3)
getVectorizationStatusFn(31-35)
vectorizeAllSegmentsFn(18-22)
vectorizeSegmentFn(11-16)
src/routes/admin/-components/page.tsx (1)
Page(7-21)
src/db/schema.ts (2)
modules(115-126)
segments(128-153)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Playwright Tests
🔇 Additional comments (8)
src/routes/learn/-components/module-panel.tsx (1)
7-7: LGTM! Clean implementation of optimistic UI updates.
The integration of local segment state via the useSegment hook is well-implemented. The precedence logic in isSegmentCompleted correctly prioritizes local uncompleted state for immediate UI feedback, then falls back to combining server progress with local completed state. The comments clearly explain the behavior, and the implementation provides a smooth user experience by avoiding UI lag when users mark segments as complete or incomplete.
Also applies to: 38-51
src/routes/admin/vectorization.tsx (7)
32-35: Admin protection correctly implemented.
The route properly uses assertIsAdminFn() in the beforeLoad hook to protect this admin page, satisfying the coding guidelines. Based on learnings and coding guidelines.
37-44: LGTM!
Clean query options setup with proper type inference using Awaited<ReturnType<...>> pattern.
53-102: Mutations follow the correct invocation pattern.
Both mutation handlers properly use the { data: ... } pattern as per coding guidelines (lines 96, 101). Error handling correctly clears the processing state to prevent UI from getting stuck. Query invalidation properly refreshes the status data.
104-122: Loading and error states look good.
The loading and error UI properly wraps content in the Page component. Consider whether you need to add errorComponent: DefaultCatchBoundary to the route definition for consistent error handling across the admin routes.
149-176: Good use of PageHeader with conditional actions.
Proper use of Page and PageHeader components per coding guidelines. The button correctly disables based on mutation state and data conditions (lines 158-160).
178-234: Statistics cards follow shadcn patterns.
All five statistic cards properly use Card, CardHeader, CardTitle, and CardContent subcomponents as per coding guidelines. Color coding (green for vectorized, amber for needs work) provides clear visual feedback.
271-343: Well-structured SegmentRow component.
Clean separation with typed props interface. The conditional rendering covers all segment states clearly (needs vectorization, completed, no transcript) with appropriate visual indicators. The processing state properly disables the button and shows feedback.
- Introduced local state management for segment completion status by integrating the useSegment context.
- Updated isSegmentCompleted function to prioritize local uncompleted segments for immediate UI feedback.
- Adjusted moduleProgress calculation to include locally completed and uncompleted segment IDs, improving accuracy in progress representation.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/routes/learn/-components/module-accordion-header.tsx (1)
70-80: Consider memoizing or inlining the isSegmentCompleted helper.
The isSegmentCompleted function is defined in the component body but not wrapped in useCallback, which means it's recreated on every render. While the logic is correct and will work as intended (since all dependencies are properly tracked in the useMemo at line 103), ESLint's exhaustive-deps rule will likely warn that the function itself is missing from the dependency array.
🔎 Refactor options
Option 1: Define the helper inside the useMemo (recommended for this use case):
 const moduleProgress = useMemo(() => {
+  const isSegmentCompleted = (segmentId: number) => {
+    if (locallyUncompletedSegmentIds.has(segmentId)) {
+      return false;
+    }
+    return (
+      progress.some((p) => p.segmentId === segmentId) ||
+      locallyCompletedSegmentIds.has(segmentId)
+    );
+  };
+
   if (!module.segments || !Array.isArray(module.segments)) {
     return {
       completed: 0,
       total: 0,
       percentage: 0,
     };
   }
   const completedSegments = module.segments.filter((segment) =>
     isSegmentCompleted(segment.id)
   ).length;
   return {
     completed: completedSegments,
     total: module.segments.length,
     percentage:
       module.segments.length > 0
         ? (completedSegments / module.segments.length) * 100
         : 0,
   };
 }, [module.segments, progress, locallyCompletedSegmentIds, locallyUncompletedSegmentIds]);
-
-const isSegmentCompleted = (segmentId: number) => {
-  // Check local uncompleted state first (takes precedence)
-  if (locallyUncompletedSegmentIds.has(segmentId)) {
-    return false;
-  }
-  // Check both server progress and locally completed segments (for immediate UI feedback)
-  return (
-    progress.some((p) => p.segmentId === segmentId) ||
-    locallyCompletedSegmentIds.has(segmentId)
-  );
-};
Option 2: Wrap in useCallback if needed elsewhere:
+const isSegmentCompleted = useCallback(
+  (segmentId: number) => {
+    if (locallyUncompletedSegmentIds.has(segmentId)) {
+      return false;
+    }
+    return (
+      progress.some((p) => p.segmentId === segmentId) ||
+      locallyCompletedSegmentIds.has(segmentId)
+    );
+  },
+  [progress, locallyCompletedSegmentIds, locallyUncompletedSegmentIds]
+);
 const moduleProgress = useMemo(() => {
   // ... rest of the code
-}, [module.segments, progress, locallyCompletedSegmentIds, locallyUncompletedSegmentIds]);
+}, [module.segments, progress, locallyCompletedSegmentIds, locallyUncompletedSegmentIds, isSegmentCompleted]);
Also applies to: 93-93, 103-103
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/routes/learn/-components/module-accordion-header.tsx
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: When invoking a TanStack server function, pass an object with a data property containing parameters
All card components should use the shadcn Card component with CardTitle, CardDescription, and other shadcn Card subcomponents
All server-side operations should use TanStack Start server functions created with createServerFn instead of direct backend logic
Form handling should use React Hook Form with Zod validation
**/*.{ts,tsx}: Never hard code magic numbers into code; consolidate them to the top of the file or in /src/config/index.ts to keep the code clean
Never allow a file to exceed 1,000 lines of code; split overly large files into smaller modular components to maintain code maintainability
Files:
src/routes/learn/-components/module-accordion-header.tsx
**/routes/**/*.tsx
📄 CodeRabbit inference engine (CLAUDE.md)
**/routes/**/*.tsx: Pages should use the Page component and PageHeader component when possible
Implement error boundaries at the route level using DefaultCatchBoundary
Files:
src/routes/learn/-components/module-accordion-header.tsx
**/{components,routes}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Component styling should follow shadcn/ui patterns with Tailwind CSS v4
Files:
src/routes/learn/-components/module-accordion-header.tsx
🧬 Code graph analysis (1)
src/routes/learn/-components/module-accordion-header.tsx (2)
src/routes/learn/-components/segment-context.tsx (1)
useSegment(58-64)
src/db/schema.ts (1)
progress(232-250)
🔇 Additional comments (1)
src/routes/learn/-components/module-accordion-header.tsx (1)
34-34: LGTM! Clean integration of segment context.
The import and usage of the useSegment hook follows React patterns correctly and enables the desired local state tracking for immediate UI feedback.
Also applies to: 62-63
…ranscripts
Note
Introduces vector support and schema for semantic transcript search.
- docker-compose.yml and GitHub Actions to pgvector/pgvector:pg17
- vector extension and creates app_transcript_chunk with vector(1536) embeddings, indexes, and FK to app_segment
- integer and relaxes select nullability constraints
Written by Cursor Bugbot for commit e22e0f1. This will update automatically on new commits.