…imports
- Separate Docker build stages to copy `package.json` before source, enabling `yarn install` layer-cache reuse on source-only changes
- Consolidate 4 separate `yarn install` calls into a single install with `--mode=skip-build`, avoiding redundant installs in `shared-libs-builder`
- Add a BuildKit cache mount for `.next/cache` to persist the Next.js compilation cache across Docker builds
- Enable `experimental.optimizePackageImports` for 18 barrel-export packages (lucide-react, radix-ui, ai-sdk providers, etc.) to reduce compilation time
- Make Sentry `widenClientFileUpload` and `reactComponentAnnotation` conditional on Sentry credentials being present, removing overhead from OSS builds
- Add `--parallel` to the root `build` and `build:deps` scripts for concurrent builds of independent packages (db, schemas, queryLanguage)
- Add a new `build:web` script for building only the web package and its dependencies

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
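The `--parallel` item relies on db, schemas and queryLanguage not depending on one another. A minimal POSIX sh sketch of that scheduling idea follows; `build_pkg` is a hypothetical stand-in for a real workspace build that only sleeps and echoes, and treating web as the package that waits on all three is an assumption for illustration.

```sh
#!/bin/sh
# Sketch: build independent packages concurrently, then build their dependent.
# build_pkg is a hypothetical stand-in for a real workspace build command.
set -eu

build_pkg() {
  sleep 0.1                 # simulate compile time
  echo "built $1"
}

build_all() {
  pids=""
  # Stage 1: db, schemas and queryLanguage are assumed independent,
  # so all three start in the background at once.
  for pkg in db schemas queryLanguage; do
    build_pkg "$pkg" &
    pids="$pids $!"
  done

  # Wait on each background job individually so no failure is lost.
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  if [ "$failed" -ne 0 ]; then
    echo "a dependency build failed" >&2
    return 1
  fi

  # Stage 2: start web only after all of its dependencies have finished.
  build_pkg web
}

build_all
```

Yarn's `yarn workspaces foreach` command offers `--parallel` and `--topological` options that perform this kind of dependency-aware scheduling across workspaces; the exact script definitions used in this PR are not reproduced here.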
Walkthrough
Reworks the Docker multi-stage build and the workspace install/build flow for cacheable, topologically ordered dependency resolution; enables parallel workspace builds; always enables Next.js `experimental.optimizePackageImports`; and enables the Sentry-related experimental flags in the web config only when Sentry is configured.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed
The --mode=skip-build flag skips postinstall hooks including prisma generate, so the Prisma client types are missing in web-builder and backend-builder stages. Run prisma:generate explicitly after install. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🧹 Nitpick comments (1)
Dockerfile (1)
172: Consider adding a BuildKit cache mount for the backend build.
The web build uses a cache mount for `.next/cache` to enable incremental compilation. The backend TypeScript build might also benefit from caching the `node_modules/.cache` directory or similar, though the impact would be smaller than for Next.js.
♻️ Optional: Add cache mount for backend build
-RUN yarn workspace @sourcebot/backend build
+RUN --mount=type=cache,target=/app/node_modules/.cache \
+    yarn workspace @sourcebot/backend build
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` at line 172, add a BuildKit cache mount to the backend build RUN to reuse TypeScript/yarn caches: update the RUN that executes "yarn workspace @sourcebot/backend build" to use BuildKit's --mount=type=cache flag (e.g. target a yarn/ts cache like /root/.cache or node_modules/.cache and optionally a yarn cache path such as /usr/local/share/.cache/yarn) so the build step caches between runs; ensure the Dockerfile keeps the exact command "yarn workspace @sourcebot/backend build" but prefixed with the appropriate RUN --mount=type=cache,... syntax, and choose cache target paths relevant to your environment.
Add --immutable flag to all yarn install commands in Docker for reproducible builds (prevents lockfile modifications). Combine sequential RUN commands in the runner stage to reduce final image layer count. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
Dockerfile (1)
103-112: Prisma files are currently busting the dependency-install cache.
Because `packages/db/prisma` is copied before `yarn install`, schema/migration edits invalidate the expensive install layer. Move the Prisma copy and `prisma:generate` to a later layer in both builders.
♻️ Suggested Docker layering change
 # Step 1: Install dependencies (cached unless package.json/lock changes).
 COPY package.json yarn.lock* .yarnrc.yml ./
 COPY .yarn ./.yarn
 COPY ./packages/web/package.json ./packages/web/package.json
 COPY ./packages/db/package.json ./packages/db/package.json
-COPY ./packages/db/prisma ./packages/db/prisma
 ...
-RUN yarn install --immutable --mode=skip-build && \
-    yarn workspace @sourcebot/db prisma:generate
+RUN yarn install --immutable --mode=skip-build
+
+# Step 1b: Prisma client generation (only reruns when prisma files change)
+COPY ./packages/db/prisma ./packages/db/prisma
+RUN yarn workspace @sourcebot/db prisma:generate
Apply the same pattern in backend-builder.
Also applies to: 154-162
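To see why the copy order matters, here is a toy model in POSIX sh. It is not BuildKit's actual cache algorithm (which also keys on the instruction text and the parent layer); it only treats the install layer's cache key as a hash of the files copied in before it. The file contents and the helper names `layer_key`, `verdict` and `demo` are made up for illustration, and `sha256sum` (GNU coreutils) is assumed to be available.

```sh
#!/bin/sh
# Toy model of Docker layer caching (NOT BuildKit's real algorithm): the
# install layer is "reused" when the files copied before it are unchanged.
set -eu

# layer_key FILE...: print one sha256 over the names and contents of FILEs.
layer_key() {
  for f in "$@"; do
    printf '%s\n' "$f"
    cat "$f"
  done | sha256sum | cut -d ' ' -f 1
}

# verdict BEFORE AFTER: "reused" if the key is unchanged, else "rebuilt".
verdict() {
  if [ "$1" = "$2" ]; then echo reused; else echo rebuilt; fi
}

demo() (
  # Subshell body keeps the cd and the temp directory contained.
  tmp="$(mktemp -d)"
  trap 'rm -rf "$tmp"' EXIT
  cd "$tmp"
  echo '{"name":"web"}' > package.json
  echo 'lockfile-v1' > yarn.lock
  echo 'model Repo { id Int @id }' > schema.prisma

  manifests_1="$(layer_key package.json yarn.lock)"
  with_prisma_1="$(layer_key package.json yarn.lock schema.prisma)"

  # Simulate a schema edit, e.g. adding a column in a new migration.
  echo 'model Repo { id Int @id name String }' > schema.prisma

  manifests_2="$(layer_key package.json yarn.lock)"
  with_prisma_2="$(layer_key package.json yarn.lock schema.prisma)"

  echo "manifests only: $(verdict "$manifests_1" "$manifests_2")"
  echo "manifests + prisma: $(verdict "$with_prisma_1" "$with_prisma_2")"
)

demo
```

In this model, an install layer that depends only on the manifests survives a schema edit, while one that also copies `packages/db/prisma` is rebuilt, which is the behavior the suggested reordering avoids.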
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 103 - 112, the Dockerfile currently copies packages/db/prisma (and other Prisma files) before the expensive RUN yarn install step, which busts the dependency cache; update both builder stages (including backend-builder) to move the COPY ./packages/db/prisma and any other Prisma-related COPY lines, plus the yarn workspace @sourcebot/db prisma:generate invocation, to a later layer after the yarn install --immutable step so schema/migration edits won't invalidate the install layer; keep the application package.json copies needed for install before RUN yarn install, and only copy Prisma schemas and run prisma:generate after dependencies are installed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Dockerfile`:
- Around line 279-286: Replace the broad git safe.directory wildcard with
explicit trusted paths: change the git config invocation that sets
safe.directory "*" so it instead registers only the specific repo and index
directories (e.g. /data/.sourcebot/repos and /data/.sourcebot/index derived from
DATA_CACHE_DIR) using separate git config --global --add safe.directory <path>
entries; ensure any documentation or startup notes mention ownership
requirements for file:// mounted repos so per-repository configs can be used
when appropriate.
 RUN git config --global safe.directory "*" && \
     mkdir -p /run/postgresql && \
     chown -R postgres:postgres /run/postgresql && \
-    chmod 775 /run/postgresql
-
-# Make app directory accessible to both root and sourcebot user
-RUN chown -R sourcebot /app \
-    && chgrp -R 0 /app \
-    && chmod -R g=u /app
-# Make data directory accessible to both root and sourcebot user
-RUN chown -R sourcebot /data
+    chmod 775 /run/postgresql && \
+    chown -R sourcebot /app && \
+    chgrp -R 0 /app && \
+    chmod -R g=u /app && \
+    chown -R sourcebot /data
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# First, find relevant files
echo "=== Finding relevant files ==="
fd -t f -E node_modules -E .git 'Dockerfile|entrypoint|supervisord|prefix-output' . 2>/dev/null | head -20 || true
echo -e "\n=== Searching for safe.directory and git invocations ==="
rg -n 'safe\.directory|git\s+(config|clone|pull|fetch)' --max-count=20 || true
echo -e "\n=== Searching for user context changes (RUN USER, su-exec, gosu, sudo) ==="
rg -n '^\s*USER\b|su-exec|gosu|sudo\s+' --max-count=20 || true
echo -e "\n=== Checking Dockerfile structure around line 279 ==="
fd -t f Dockerfile | head -1 | xargs -I {} sed -n '270,295p' {} || true
Repository: sourcebot-dev/sourcebot
Length of output: 6516
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== entrypoint.sh content ==="
cat -n ./entrypoint.sh || true
echo -e "\n=== supervisord.conf relevant sections ==="
rg -n 'user|command|git' supervisord.conf -A 3 || true
echo -e "\n=== git.ts to understand git operation directories ==="
head -100 ./packages/backend/src/git.ts || true
echo -e "\n=== Find all git clone/fetch operations in code ==="
rg -n 'git.*clone|git.*fetch' packages/ -A 2 -B 2 | head -60 || true
Repository: sourcebot-dev/sourcebot
Length of output: 20108
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Full supervisord.conf ==="
cat supervisord.conf || true
echo -e "\n=== Check if user is specified in supervisord.conf ==="
rg -n '^\s*(user|group)\s*=' supervisord.conf || echo "No user/group found"
echo -e "\n=== Check git invocations - which directories are accessed ==="
rg -n 'cloneRepository|createGitClientForPath|git.*cwd' packages/backend/src/ -B 3 -A 3 | head -80 || true
echo -e "\n=== Check what DATA_CACHE_DIR is set to ==="
rg -n 'DATA_CACHE_DIR' entrypoint.sh | head -10 || true
echo -e "\n=== Find where repositories are stored ==="
rg -n 'repoPath\s*=' packages/backend/src/ -A 1 | head -30 || true
Repository: sourcebot-dev/sourcebot
Length of output: 8589
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Where is repoPath defined/constructed ==="
rg -n 'repoPath\s*=' packages/backend/src/repoIndexManager.ts -B 3 -A 1 | head -50 || true
echo -e "\n=== Check INDEX_CACHE_DIR constant ==="
rg -n 'INDEX_CACHE_DIR' packages/backend/src/constants.ts -A 2 -B 2 || true
echo -e "\n=== Check if /data is referenced anywhere ==="
rg -n '/data|DATA_CACHE_DIR' packages/backend/src/ --type ts | head -20 || true
echo -e "\n=== Search for default values of DATA_CACHE_DIR ==="
rg -n 'DATA_CACHE_DIR|INDEX_CACHE_DIR' . --type sh --type ts | grep -E '\s?=|default|export' | head -15 || true
Repository: sourcebot-dev/sourcebot
Length of output: 2122
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Search for DATA_CACHE_DIR default value in env setup ==="
rg -n 'DATA_CACHE_DIR' . --type ts --type js -B 2 -A 2 | grep -E 'default|DATA_CACHE_DIR\s*=' | head -20 || true
echo -e "\n=== Check .env or environment files ==="
fd -t f '.env*|environment' . | head -10 || true
echo -e "\n=== Check Dockerfile for DATA_CACHE_DIR or /data setup ==="
sed -n '260,310p' Dockerfile || true
echo -e "\n=== Look for env variable declarations ==="
rg -n 'DATA_CACHE_DIR|/data|/app' Dockerfile | head -20 || true
echo -e "\n=== Check shared env setup ==="
rg -n 'DATA_CACHE_DIR' packages/shared/ -A 2 -B 2 || true
Repository: sourcebot-dev/sourcebot
Length of output: 4319
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Confirm git operations are only in /app and /data/.sourcebot paths ==="
echo "Git clone/fetch operations in code:"
rg -n 'cloneUrl|repoPath|createGitClientForPath' packages/backend/src/git.ts packages/backend/src/repoIndexManager.ts | head -40 || true
echo -e "\n=== Check if git is invoked anywhere else at runtime ==="
rg -n 'git\s+(clone|fetch|config)' packages/ --type ts --type js | grep -v node_modules | grep -v dist | head -30 || true
echo -e "\n=== Verify safe.directory is only set once (global scope) ==="
rg -n 'safe\.directory' . --type ts --type js --type sh | grep -v node_modules || true
echo -e "\n=== Check if there's documentation on the volume mounting issue ==="
rg -n 'dubious ownership|safe\.directory|volume' docs/ || true
Repository: sourcebot-dev/sourcebot
Length of output: 8435
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Verify repoPath construction - where exactly are repos stored ==="
rg -n 'getRepoPath' packages/backend/src/ -A 5 -B 1 | head -50 || true
echo -e "\n=== Check if git might run against /app or other locations ==="
rg -n 'cwd|path.*=|working.*dir' packages/backend/src/git.ts | head -20 || true
echo -e "\n=== Confirm all git.* functions use createGitClientForPath ==="
rg -n 'export.*const.*\(' packages/backend/src/git.ts | head -20 || true
Repository: sourcebot-dev/sourcebot
Length of output: 5486
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Find getRepoPath implementation ==="
rg -n 'export.*getRepoPath|function getRepoPath' packages/shared/src/ -A 15 || true
echo -e "\n=== Check if getRepoPath can return paths outside /data ==="
rg -n 'getRepoPath' packages/shared/src/utils.ts -A 20 -B 5 || true
Repository: sourcebot-dev/sourcebot
Length of output: 2155
Consider narrowing safe.directory to specific paths rather than wildcard.
The global wildcard disables Git's ownership safety across the entire filesystem. Instead, explicitly trust only the paths where Git actually operates: /data/.sourcebot/repos and /data/.sourcebot/index (derived from DATA_CACHE_DIR). For local repositories mounted via file:// URLs, consider using per-repository configuration or documenting the ownership requirements for mounted volumes.
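A minimal POSIX sh sketch of the narrower configuration, assuming `DATA_CACHE_DIR` resolves to `/data/.sourcebot` as the analysis above infers. One caveat to check before adopting it: per the git-config documentation, a plain `safe.directory` path only matches a repository located exactly at that path, so an entry for `/data/.sourcebot/repos` would not by itself cover clones in its subdirectories. The trailing `/*` form used below is documented as matching every repository under the named directory, but to our knowledge requires Git 2.46 or later; older Git would need one entry per repository. The function names are hypothetical.

```sh
#!/bin/sh
# Sketch: trust only Sourcebot's data directories instead of safe.directory "*".
# Assumption: DATA_CACHE_DIR defaults to /data/.sourcebot in the image.
set -eu

# trust_dir PATTERN: add one global safe.directory entry, skipping duplicates.
trust_dir() {
  if git config --global --get-all safe.directory 2>/dev/null | grep -qxF "$1"; then
    return 0                # already trusted; keep the config free of repeats
  fi
  git config --global --add safe.directory "$1"
}

# configure_safe_dirs: trust every repository under the two directories the
# review names. The trailing "/*" needs Git 2.46+ (see lead-in).
configure_safe_dirs() {
  base="${DATA_CACHE_DIR:-/data/.sourcebot}"
  trust_dir "$base/repos/*"
  trust_dir "$base/index/*"
}
```

In the runner stage, calling `configure_safe_dirs` (or the two equivalent `git config --global --add safe.directory ...` commands) would take the place of `git config --global safe.directory "*"` inside the combined RUN.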
Summary
Optimize web package and Docker builds through improved layer caching and Next.js compile-time optimizations:
- Replace 4 separate `yarn workspace install` calls with a single `yarn install --mode=skip-build` for faster resolution
- Add a BuildKit cache mount to persist `.next/cache` across Docker builds for incremental Next.js compilation
- Enable `experimental.optimizePackageImports` for 18 barrel-export packages to reduce compilation time
- Only enable the Sentry options (`widenClientFileUpload`, `reactComponentAnnotation`) when Sentry is configured
- Add the `--parallel` flag to root build scripts for concurrent compilation of independent packages

Test plan
- `yarn build:web` completes successfully locally
- `docker buildx build --check .` validates Dockerfile syntax
- `optimizePackageImports` experiments active

🤖 Generated with Claude Code