
perf(web): optimize build performance#944

Closed
msukkari wants to merge 5 commits into main from msukkari/optimize-web-build


@msukkari
Contributor

@msukkari commented Feb 26, 2026

Summary

Optimize web package and Docker builds through improved layer caching and Next.js compile-time optimizations:

  • Docker layer caching: Restructure build stages to separate dependency installation from source copying, enabling install layer cache reuse when only source changes
  • Consolidated install: Replace 4 separate yarn workspace install calls with a single yarn install --mode=skip-build for faster resolution
  • BuildKit cache mount: Persist .next/cache across Docker builds for incremental Next.js compilation
  • Package imports optimization: Enable experimental.optimizePackageImports for 18 barrel-export packages to reduce compilation time
  • Conditional Sentry features: Only enable expensive Sentry features (widenClientFileUpload, reactComponentAnnotation) when Sentry is configured
  • Parallel builds: Add --parallel flag to root build scripts for concurrent compilation of independent packages
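To make the layering idea concrete, here is a minimal Dockerfile sketch of the manifest-first pattern combined with a `.next/cache` mount. It is not the PR's actual Dockerfile: the base image, the `@sourcebot/web` workspace name, the `/app` paths, and a `.dockerignore` that excludes `node_modules` and `.next` are all assumptions.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: base image, paths, and workspace names are assumptions.
FROM node:20-alpine AS web-builder
WORKDIR /app

# 1. Copy only the manifests so the install layer is reused when just source files change.
COPY package.json yarn.lock .yarnrc.yml ./
COPY .yarn ./.yarn
COPY packages/web/package.json ./packages/web/package.json
COPY packages/db/package.json ./packages/db/package.json
# ...copy every other workspace package.json here as well (assumption: with --immutable,
# a missing workspace manifest would change the lockfile and fail the install)...

# 2. One consolidated install. --mode=skip-build skips install scripts,
#    so the Prisma client has to be generated explicitly afterwards.
RUN yarn install --immutable --mode=skip-build

COPY packages/db/prisma ./packages/db/prisma
RUN yarn workspace @sourcebot/db prisma:generate

# 3. Copy the full source last; edits here do not invalidate the layers above.
COPY packages ./packages

# 4. Keep Next.js's incremental compilation cache between builds (requires BuildKit).
#    The cache mount is not written into the image layer.
RUN --mount=type=cache,target=/app/packages/web/.next/cache \
    yarn workspace @sourcebot/web build
```

Ordering is the design choice here: Docker reuses a layer only while every instruction before it is unchanged, so the instructions whose inputs change least often go first.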

Test plan

  • yarn build:web completes successfully locally
  • docker buildx build --check . validates Dockerfile syntax
  • ✓ Next.js build output shows optimizePackageImports experiments active
  • PR gate CI will validate full Docker build on push

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Optimized Docker build pipeline with richer multi-stage layering for better cache reuse and faster image builds
    • Centralized dependency pre-resolution and consolidated workspace install steps to speed builds
    • Enabled parallel workspace builds and added a workspace-scoped web build entry to reduce total build time
    • Added build-time cache support for web compilation
    • Made certain build-time features conditional based on environment configuration
    • Adjusted runtime image setup for safer non-root execution and volume mounts

…imports

- Separate Docker build stages to copy package.json before source, enabling
  yarn install layer cache reuse on source-only changes
- Consolidate 4 separate yarn install calls into a single install with
  --mode=skip-build, avoiding redundant installs in shared-libs-builder
- Add BuildKit cache mount for .next/cache to persist Next.js compilation
  cache across Docker builds
- Enable experimental.optimizePackageImports for 18 barrel-export packages
  (lucide-react, radix-ui, ai-sdk providers, etc.) to reduce compilation time
- Make Sentry widenClientFileUpload and reactComponentAnnotation conditional
  on Sentry credentials being present, removing overhead from OSS builds
- Add --parallel to root build and build:deps scripts for concurrent builds
  of independent packages (db, schemas, queryLanguage)
- Add new build:web script for building only web package and dependencies

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

@coderabbitai
Contributor

coderabbitai bot commented Feb 26, 2026

Walkthrough

Reworks Docker multi-stage builds and workspace install/build flow for cacheable, topological dependency resolution; enables parallel workspace builds; always enables Next.js experimental.optimizePackageImports and conditionally gates Sentry-related experimental flags in web config.

Changes

  • Docker build & multi-stage flow (Dockerfile)
    Reworked multi-stage build: expanded manifest copies for topological Yarn resolution, added an immutable yarn install pass (--mode=skip-build), consolidated copying of pre-built shared artifacts, added a BuildKit cache mount for the Next.js build, ensured prisma:generate runs during installs, tightened ownership/permissions for non-root runtime, and unified copy semantics for web/backend/runner stages.
  • Workspace build scripts (package.json)
    Enabled --parallel for the main build and build:deps scripts and added a new build:web script to run the web subset in parallel.
  • Next.js experimental config (packages/web/next.config.mjs)
    Always includes experimental.optimizePackageImports (list of packages); gates widenClientFileUpload and reactComponentAnnotation on SENTRY env vars; retains serverActions.allowedOrigins only in development.
  • Changelog note (CHANGELOG.md)
    Added an Unreleased changelog entry documenting Docker/web build optimizations, BuildKit cache usage, optimizePackageImports, and conditional Sentry build-time flags.

Sequence Diagram(s)

mermaid
sequenceDiagram
box rgba(100,149,237,0.5)
participant Dev as Developer/CI
participant Docker as Docker Build
participant BuildKit as BuildKit
end
box rgba(60,179,113,0.5)
participant Shared as shared-libs-builder
participant Web as web-build
participant Backend as backend-build
participant Runner as final-runner
end

Dev->>Docker: Start multi-stage build
Docker->>Shared: copy manifests, pre-resolve deps (yarn --mode=skip-build)
Shared->>BuildKit: produce pre-built shared artifacts
Docker->>Web: copy manifests + pre-built shared, run yarn (cache mount), build Next.js
Web->>BuildKit: use BuildKit cache for Next.js compilation
Docker->>Backend: copy manifests + shared, run prisma:generate, build backend
Shared->>Runner: copy pre-built artifacts
Web->>Runner: copy web build outputs
Backend->>Runner: copy backend build outputs
Runner->>Dev: final image assembled

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'perf(web): optimize build performance' directly and specifically summarizes the main objective of the pull request, which is to optimize build performance through Docker caching, consolidated installs, BuildKit cache mounts, and parallel builds.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


The --mode=skip-build flag skips postinstall hooks including prisma
generate, so the Prisma client types are missing in web-builder and
backend-builder stages. Run prisma:generate explicitly after install.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai bot left a comment


🧹 Nitpick comments (1)
Dockerfile (1)

Line 172: Consider adding a BuildKit cache mount for the backend build.

The web build uses a cache mount for .next/cache to enable incremental compilation. The backend TypeScript build might also benefit from caching the node_modules/.cache directory or similar, though the impact would be smaller than for Next.js.

♻️ Optional: Add cache mount for backend build
-RUN yarn workspace @sourcebot/backend build
+RUN --mount=type=cache,target=/app/node_modules/.cache \
+    yarn workspace @sourcebot/backend build
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` at line 172, Add a BuildKit cache mount to the backend build RUN
to reuse TypeScript/yarn caches: update the RUN that executes "yarn workspace
`@sourcebot/backend` build" to use BuildKit's --mount=type=cache flag (e.g. target
a yarn/ts cache like /root/.cache or node_modules/.cache and optionally a yarn
cache path such as /usr/local/share/.cache/yarn) so the build step caches
between runs; ensure the Dockerfile keeps the exact command "yarn workspace
`@sourcebot/backend` build" but prefixed with the appropriate RUN
--mount=type=cache,... syntax and choose cache target paths relevant to your
environment.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 214528f and d06ca40.

📒 Files selected for processing (1)
  • Dockerfile

msukkari and others added 2 commits February 25, 2026 18:29
Add --immutable flag to all yarn install commands in Docker for
reproducible builds (prevents lockfile modifications). Combine
sequential RUN commands in the runner stage to reduce final image
layer count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
Dockerfile (1)

Lines 103-112: Prisma files are currently busting the dependency-install cache.

Because packages/db/prisma is copied before yarn install, schema/migration edits invalidate the expensive install layer. Move Prisma copy + prisma:generate to a later layer in both builders.

♻️ Suggested Docker layering change
 # Step 1: Install dependencies (cached unless package.json/lock changes).
 COPY package.json yarn.lock* .yarnrc.yml ./
 COPY .yarn ./.yarn
 COPY ./packages/web/package.json ./packages/web/package.json
 COPY ./packages/db/package.json ./packages/db/package.json
-COPY ./packages/db/prisma ./packages/db/prisma
 ...
-RUN yarn install --immutable --mode=skip-build && \
-    yarn workspace @sourcebot/db prisma:generate
+RUN yarn install --immutable --mode=skip-build
+
+# Step 1b: Prisma client generation (only reruns when prisma files change)
+COPY ./packages/db/prisma ./packages/db/prisma
+RUN yarn workspace @sourcebot/db prisma:generate

Apply the same pattern in backend-builder.

Also applies to: lines 154-162

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 103 - 112, The Dockerfile currently copies
packages/db/prisma (and other Prisma files) before the expensive RUN yarn
install step, which busts the dependency cache; update both builder stages
(including backend-builder) to move the COPY ./packages/db/prisma and any other
Prisma-related COPY lines plus the yarn workspace `@sourcebot/db` prisma:generate
invocation to a later layer after the yarn install --immutable step so
schema/migration edits won't invalidate the install layer; keep application
package.json copies needed for install before RUN yarn install, and only copy
Prisma schemas and run prisma:generate after dependencies are installed.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2f9afd and 56f2b45.

📒 Files selected for processing (1)
  • Dockerfile

Comment on lines +279 to +286
 RUN git config --global safe.directory "*" && \
     mkdir -p /run/postgresql && \
     chown -R postgres:postgres /run/postgresql && \
-    chmod 775 /run/postgresql
-
-# Make app directory accessible to both root and sourcebot user
-RUN chown -R sourcebot /app \
-    && chgrp -R 0 /app \
-    && chmod -R g=u /app
-# Make data directory accessible to both root and sourcebot user
-RUN chown -R sourcebot /data
+    chmod 775 /run/postgresql && \
+    chown -R sourcebot /app && \
+    chgrp -R 0 /app && \
+    chmod -R g=u /app && \
+    chown -R sourcebot /data
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# First, find relevant files
echo "=== Finding relevant files ==="
fd -t f -E node_modules -E .git 'Dockerfile|entrypoint|supervisord|prefix-output' . 2>/dev/null | head -20 || true

echo -e "\n=== Searching for safe.directory and git invocations ==="
rg -n 'safe\.directory|git\s+(config|clone|pull|fetch)' --max-count=20 || true

echo -e "\n=== Searching for user context changes (RUN USER, su-exec, gosu, sudo) ==="
rg -n '^\s*USER\b|su-exec|gosu|sudo\s+' --max-count=20 || true

echo -e "\n=== Checking Dockerfile structure around line 279 ==="
fd -t f Dockerfile | head -1 | xargs -I {} sed -n '270,295p' {} || true

Repository: sourcebot-dev/sourcebot

Length of output: 6516


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== entrypoint.sh content ==="
cat -n ./entrypoint.sh || true

echo -e "\n=== supervisord.conf relevant sections ==="
rg -n 'user|command|git' supervisord.conf -A 3 || true

echo -e "\n=== git.ts to understand git operation directories ==="
head -100 ./packages/backend/src/git.ts || true

echo -e "\n=== Find all git clone/fetch operations in code ==="
rg -n 'git.*clone|git.*fetch' packages/ -A 2 -B 2 | head -60 || true

Repository: sourcebot-dev/sourcebot

Length of output: 20108


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Full supervisord.conf ==="
cat supervisord.conf || true

echo -e "\n=== Check if user is specified in supervisord.conf ==="
rg -n '^\s*(user|group)\s*=' supervisord.conf || echo "No user/group found"

echo -e "\n=== Check git invocations - which directories are accessed ==="
rg -n 'cloneRepository|createGitClientForPath|git.*cwd' packages/backend/src/ -B 3 -A 3 | head -80 || true

echo -e "\n=== Check what DATA_CACHE_DIR is set to ==="
rg -n 'DATA_CACHE_DIR' entrypoint.sh | head -10 || true

echo -e "\n=== Find where repositories are stored ==="
rg -n 'repoPath\s*=' packages/backend/src/ -A 1 | head -30 || true

Repository: sourcebot-dev/sourcebot

Length of output: 8589


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Where is repoPath defined/constructed ==="
rg -n 'repoPath\s*=' packages/backend/src/repoIndexManager.ts -B 3 -A 1 | head -50 || true

echo -e "\n=== Check INDEX_CACHE_DIR constant ==="
rg -n 'INDEX_CACHE_DIR' packages/backend/src/constants.ts -A 2 -B 2 || true

echo -e "\n=== Check if /data is referenced anywhere ==="
rg -n '/data|DATA_CACHE_DIR' packages/backend/src/ --type ts | head -20 || true

echo -e "\n=== Search for default values of DATA_CACHE_DIR ==="
rg -n 'DATA_CACHE_DIR|INDEX_CACHE_DIR' . --type sh --type ts | grep -E '\s?=|default|export' | head -15 || true

Repository: sourcebot-dev/sourcebot

Length of output: 2122


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Search for DATA_CACHE_DIR default value in env setup ==="
rg -n 'DATA_CACHE_DIR' . --type ts --type js -B 2 -A 2 | grep -E 'default|DATA_CACHE_DIR\s*=' | head -20 || true

echo -e "\n=== Check .env or environment files ==="
fd -t f '.env*|environment' . | head -10 || true

echo -e "\n=== Check Dockerfile for DATA_CACHE_DIR or /data setup ==="
sed -n '260,310p' Dockerfile || true

echo -e "\n=== Look for env variable declarations ==="
rg -n 'DATA_CACHE_DIR|/data|/app' Dockerfile | head -20 || true

echo -e "\n=== Check shared env setup ==="
rg -n 'DATA_CACHE_DIR' packages/shared/ -A 2 -B 2 || true

Repository: sourcebot-dev/sourcebot

Length of output: 4319


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Confirm git operations are only in /app and /data/.sourcebot paths ==="
echo "Git clone/fetch operations in code:"
rg -n 'cloneUrl|repoPath|createGitClientForPath' packages/backend/src/git.ts packages/backend/src/repoIndexManager.ts | head -40 || true

echo -e "\n=== Check if git is invoked anywhere else at runtime ==="
rg -n 'git\s+(clone|fetch|config)' packages/ --type ts --type js | grep -v node_modules | grep -v dist | head -30 || true

echo -e "\n=== Verify safe.directory is only set once (global scope) ==="
rg -n 'safe\.directory' . --type ts --type js --type sh | grep -v node_modules || true

echo -e "\n=== Check if there's documentation on the volume mounting issue ==="
rg -n 'dubious ownership|safe\.directory|volume' docs/ || true

Repository: sourcebot-dev/sourcebot

Length of output: 8435


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Verify repoPath construction - where exactly are repos stored ==="
rg -n 'getRepoPath' packages/backend/src/ -A 5 -B 1 | head -50 || true

echo -e "\n=== Check if git might run against /app or other locations ==="
rg -n 'cwd|path.*=|working.*dir' packages/backend/src/git.ts | head -20 || true

echo -e "\n=== Confirm all git.* functions use createGitClientForPath ==="
rg -n 'export.*const.*\(' packages/backend/src/git.ts | head -20 || true

Repository: sourcebot-dev/sourcebot

Length of output: 5486


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Find getRepoPath implementation ==="
rg -n 'export.*getRepoPath|function getRepoPath' packages/shared/src/ -A 15 || true

echo -e "\n=== Check if getRepoPath can return paths outside /data ==="
rg -n 'getRepoPath' packages/shared/src/utils.ts -A 20 -B 5 || true

Repository: sourcebot-dev/sourcebot

Length of output: 2155


Consider narrowing safe.directory to specific paths rather than wildcard.

The global wildcard disables Git's ownership safety across the entire filesystem. Instead, explicitly trust only the paths where Git actually operates: /data/.sourcebot/repos and /data/.sourcebot/index (derived from DATA_CACHE_DIR). For local repositories mounted via file:// URLs, consider using per-repository configuration or documenting the ownership requirements for mounted volumes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 279 - 286, Replace the broad git safe.directory
wildcard with explicit trusted paths: change the git config invocation that sets
safe.directory "*" so it instead registers only the specific repo and index
directories (e.g. /data/.sourcebot/repos and /data/.sourcebot/index derived from
DATA_CACHE_DIR) using separate git config --global --add safe.directory <path>
entries; ensure any documentation or startup notes mention ownership
requirements for file:// mounted repos so per-repository configs can be used
when appropriate.
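A sketch of the narrower configuration the review asks for, assuming `DATA_CACHE_DIR` resolves to `/data/.sourcebot` as the analysis above found. One caveat is worth checking before adopting it: Git matches `safe.directory` entries against each repository's own top-level path, so trusting the bare parent `/data/.sourcebot/repos` would not cover clones nested beneath it. The trailing `/*` form used below matches every repository under a directory, but only Git 2.46 and later understand it, so the Git version shipped in the runner image needs to be confirmed.

```dockerfile
# Sketch only: paths assume DATA_CACHE_DIR=/data/.sourcebot.
# The "/*" suffix (Git 2.46+) trusts every repository below the named directory;
# older Git versions will not match nested clones with it.
# The index entry mirrors the review's suggestion and only matters if Git runs there.
RUN git config --global --add safe.directory '/data/.sourcebot/repos/*' && \
    git config --global --add safe.directory '/data/.sourcebot/index/*'
```

Using `--add` for each entry keeps both values, since a plain `git config --global safe.directory <path>` would overwrite the previous one.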

@msukkari closed this Feb 27, 2026