🤖 feat: Add PostHog experiments integration #1179

ThomasK33 · 2025-12-15T16:23:10Z

Add backend-first PostHog feature flag evaluation for remote-controlled experiments, starting with Post-Compaction Context.

Changes

Backend (ExperimentsService)

Evaluate PostHog feature flags via posthog-node
Disk cache (~/.mux/feature_flags.json) with TTL-based refresh
Fail-closed behavior (unknown = disabled)
Skip PostHog calls when telemetry is disabled

Telemetry enrichment (TelemetryService)

setFeatureFlagVariant() adds $feature/<flagKey> to all events
Enables variant breakdown in PostHog analytics

oRPC layer

experiments.getAll: Get all experiment values
experiments.reload: Force refresh from PostHog

Frontend (ExperimentsContext)

Fetch remote experiments on mount
Priority: remote PostHog > local toggle > default
Read-only UI when experiment is remote-controlled

Backend authoritative gating (WorkspaceService)

sendMessage() resolves experiment from PostHog when enabled
list() decides includePostCompaction based on the experiment state when the caller omits it

Type consolidation

ExperimentValueSchema (Zod) is single source of truth
ExperimentValue type derived via z.infer in types.ts

Bug fixes (unrelated)

Fixed backgroundProcessManager exit race condition
Fixed telemetry client Node.js compatibility
Relaxed timing test threshold in authMiddleware

📋 Implementation Plan

PostHog early access, feature flags, and experiments (Mux)

Goals

Run remote-controlled experiments/feature flags from PostHog (starting with Post-Compaction Context).
Keep PostHog interactions backend-first to avoid ad-blocker issues (Mux already forwards telemetry to Node via oRPC).
Ensure telemetry/error events can be analyzed by variant in PostHog (required for experiments + server-side event capture).
Preserve Mux privacy guarantees: no project names, file paths, user prompts, etc. sent to PostHog.

Recommendation (architecture)

✅ Approach A (recommended): Backend-owned flag/experiment evaluation + oRPC exposure

Net new product LoC (est.): ~250–450

Use the existing main-process posthog-node client to:
- evaluate flags/experiment variants
- emit $feature_flag_called exposure events
- attach $feature/<flagKey> properties to telemetry events so PostHog can break down metrics by variant
Expose a small oRPC surface to the renderer for:
- displaying current experiment variant in Settings → Experiments
- gating UI behavior where needed
Keep local experiment toggles only as a dev/override fallback (optional), not the default source of truth.

Why this fits Mux:

Mux already routes telemetry through the backend “to avoid ad-blocker issues.”
The Post-Compaction Context experiment gates backend behavior (attachment injection), so backend must know the assignment anyway.

Alternative B: Renderer uses `posthog-js` for flags/experiments (keep backend telemetry)

Net new product LoC (est.): ~350–650

Pros:

Easiest way to adopt Early Access Feature Management (PostHog notes it is only available in the JavaScript Web SDK).

Cons:

Reintroduces the ad-blocker/network fragility we intentionally avoided for telemetry.
Requires careful identity bridging (distinctId must match backend’s ID or you lose experiment attribution).

Proposed flow (Approach A)

```mermaid
flowchart TD
  A[Renderer] -->|oRPC: telemetry.track| B[Main process TelemetryService]
  A -->|oRPC: experiments.getAll| C[Main process ExperimentsService]

  C -->|getFeatureFlag: post-compaction-context| D[PostHog Decide / Flags]
  C -->|cache variant| C

  B -->|capture events| E[PostHog events]
  B -->|include $feature/post-compaction-context| E
```

Implementation plan

1) Backend: add a PostHog-backed experiments/flags service

Create src/node/services/experimentsService.ts (name TBD) that depends on:
- TelemetryService (for distinctId + access to a PostHog client), or
- a shared PostHogClientService if you want to refactor TelemetryService into a reusable PostHog wrapper.

Core responsibilities:

getDistinctId() (or expose from TelemetryService) – single stable identity used for both:
- flag evaluation
- telemetry capture
getExperimentVariant(experimentId: ExperimentId): Promise<string | boolean | null>
- Map ExperimentId → PostHog feature flag key. (Conveniently, current EXPERIMENT_IDS.* already look like flag keys.)
- Call posthog.getFeatureFlag(flagKey, distinctId).
  - This automatically emits $feature_flag_called (exposure) events when appropriate.
- Cache result in-memory with a TTL (e.g., 5–15 min) to avoid re-fetching on every UI render.
isExperimentEnabled(experimentId: ExperimentId): boolean
- Converts the raw PostHog variant into a boolean gate.
- Suggested mapping for post-compaction-context:
  - "test" / true → enabled
  - "control" / false / null → disabled
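A minimal sketch of the two pure pieces of this service: the variant → boolean mapping and the in-memory TTL cache. The names `variantToEnabled`, `FlagTtlCache`, and `getFresh` are hypothetical, and the injected clock is an assumption made for testability. In this sketch, `getExperimentVariant` would consult `getFresh()` first and only call `posthog.getFeatureFlag(flagKey, distinctId)` on a miss.

```typescript
// Raw flag value as the service stores it (null = no assignment / unknown).
type FlagValue = string | boolean | null;

// Map a raw PostHog variant to a boolean gate. Anything other than
// "test" / true (including unexpected variant names) fails closed.
function variantToEnabled(value: FlagValue): boolean {
  return value === true || value === "test";
}

interface CacheEntry {
  value: FlagValue;
  fetchedAt: number; // epoch milliseconds
}

// Hypothetical in-memory TTL cache keyed by flag key.
class FlagTtlCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(flagKey: string, value: FlagValue): void {
    this.entries.set(flagKey, { value, fetchedAt: this.now() });
  }

  // Returns the cached value while it is fresh; undefined means "refetch".
  getFresh(flagKey: string): FlagValue | undefined {
    const entry = this.entries.get(flagKey);
    if (entry === undefined) return undefined;
    return this.now() - entry.fetchedAt < this.ttlMs ? entry.value : undefined;
  }
}
```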

Offline + startup behavior:

Persist last-known assignments to disk in ~/.mux/feature_flags.json (or inside muxHome near telemetry_id).
On startup:
- load cached values immediately (fast)
- refresh asynchronously in the background
Fail closed: if PostHog is unreachable, default the experiment to control (disabled) unless a cached value exists.
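The startup behavior above can be sketched as two pure functions: a defensive parser for the cache file and a resolver that prefers a fresh remote value, then the cached one, then control. The file shape (`{ version, values }`) and the function names are hypothetical; reading and writing `~/.mux/feature_flags.json` and the background refresh are left out so the logic stays testable.

```typescript
type FlagValue = string | boolean | null;

// Hypothetical on-disk shape; the real file format may differ.
interface FeatureFlagCacheFile {
  version: 1;
  values: Record<string, FlagValue>;
}

// Parse defensively: malformed content yields an empty cache instead of
// throwing during startup.
function parseFlagCache(raw: string | null): Record<string, FlagValue> {
  if (raw === null) return {};
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return {};
    const values = (parsed as Partial<FeatureFlagCacheFile>).values;
    if (typeof values !== "object" || values === null) return {};
    const out: Record<string, FlagValue> = {};
    for (const [key, value] of Object.entries(values)) {
      // Keep only values PostHog could have produced.
      if (typeof value === "string" || typeof value === "boolean" || value === null) {
        out[key] = value;
      }
    }
    return out;
  } catch {
    return {};
  }
}

// remote === undefined means "PostHog unreachable or not fetched yet".
// Without a cached value we fail closed (null = treat as control).
function resolveStartupValue(
  remote: FlagValue | undefined,
  cached: Record<string, FlagValue>,
  flagKey: string,
): FlagValue {
  if (remote !== undefined) return remote;
  return Object.prototype.hasOwnProperty.call(cached, flagKey) ? cached[flagKey] : null;
}
```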

Feature-flag enablement rules:

If telemetry is disabled (MUX_DISABLE_TELEMETRY=1, CI, test, etc.), do not call PostHog for flags.
- Return null / “unknown” from the service.
- Renderer can fall back to local toggles (dev-only) or treat as control.
To test PostHog-driven experiments in an unpackaged/dev Electron build, use the existing env opt-in:
- MUX_ENABLE_TELEMETRY_IN_DEV=1
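The enablement rules above can be expressed as one predicate. This is a hypothetical sketch: the function name, the `isPackaged` input, how `CI` is interpreted, and the precedence between `MUX_DISABLE_TELEMETRY` and the dev opt-in are all assumptions; only the env var names come from the plan. When it returns false, the service would return null ("unknown") without contacting PostHog.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical gate: should ExperimentsService call PostHog for flags at all?
function shouldEvaluateRemoteFlags(env: Env, isPackaged: boolean): boolean {
  // Explicit opt-out always wins in this sketch (assumed precedence).
  if (env.MUX_DISABLE_TELEMETRY === "1") return false;

  // Assumed CI/test detection: a non-empty, non-"false" CI var or NODE_ENV=test.
  const ci = env.CI;
  const isCi = ci !== undefined && ci !== "" && ci !== "false" && ci !== "0";
  if (isCi || env.NODE_ENV === "test") return false;

  // Unpackaged/dev builds only talk to PostHog when explicitly opted in.
  if (!isPackaged) return env.MUX_ENABLE_TELEMETRY_IN_DEV === "1";

  return true;
}
```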

2) Backend: attach experiment/flag info to telemetry events

PostHog’s docs require this when events are captured server-side: without flag information on each event, PostHog cannot attribute metrics to a variant.

Implement one of these (the first is recommended):

Manual property injection (preferred):
- Add $feature/<flagKey> properties to captured events.
- Implementation idea: TelemetryService.getBaseProperties() merges in a stable this.featureFlagProperties map populated by ExperimentsService.
- For the initial experiment:
  - $feature/post-compaction-context: 'control' | 'test' (or boolean) depending on how you configure variants.
Passing sendFeatureFlags: true to posthog.capture():
- Avoid this unless local evaluation is also enabled; otherwise it can add an extra flag request per capture.

Also add:

A tiny “experiment snapshot” helper so that error_occurred (and other critical events) always include the variant, falling back to the cached value when flags have not finished loading.
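A minimal sketch of the manual property injection: a small holder that ExperimentsService updates and TelemetryService merges into every event. The class name, `buildEventProperties`, and the exact signature of `setFeatureFlagVariant()` are assumptions; only the `$feature/<flagKey>` naming comes from the plan. Flag properties are merged last so event payloads cannot accidentally overwrite an assignment.

```typescript
type FlagValue = string | boolean;

// Hypothetical store of current variants, keyed by PostHog flag key.
class FeatureFlagProperties {
  private readonly variants = new Map<string, FlagValue>();

  // null clears the assignment (e.g. flags disabled or unknown).
  setFeatureFlagVariant(flagKey: string, variant: FlagValue | null): void {
    if (variant === null) this.variants.delete(flagKey);
    else this.variants.set(flagKey, variant);
  }

  // PostHog reads server-side variant attribution from `$feature/<flagKey>`.
  toProperties(): Record<string, FlagValue> {
    const props: Record<string, FlagValue> = {};
    this.variants.forEach((value, key) => {
      props[`$feature/${key}`] = value;
    });
    return props;
  }
}

// Base properties, then event-specific ones, then flag properties last.
function buildEventProperties(
  base: Record<string, unknown>,
  flags: FeatureFlagProperties,
  eventProps: Record<string, unknown> = {},
): Record<string, unknown> {
  return { ...base, ...eventProps, ...flags.toProperties() };
}
```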

3) oRPC: expose experiment state to the renderer

Add a new oRPC namespace, e.g. experiments:

experiments.getAll → returns Record<ExperimentId, { value: string | boolean | null; source: 'posthog' | 'cache' | 'disabled' }>
experiments.reload (optional) → forces a refresh, useful for debugging the Settings page.

Update:

src/common/orpc/schemas/api.ts to include the new endpoints.
src/node/orpc/router.ts to wire handlers.
src/node/orpc/context.ts + ServiceContainer to register the new service.
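A sketch of how the `experiments.getAll` handler could assemble its response, including the first-launch case where a refresh is still in flight and the result is `{ value: null, source: 'cache' }`. The real code defines its types with Zod; plain TypeScript keeps this sketch dependency-free, and `ExperimentSnapshot`, `FlagState`, and `getAll` as written here are hypothetical.

```typescript
type ExperimentSource = "posthog" | "cache" | "disabled";

interface ExperimentSnapshot {
  value: string | boolean | null;
  source: ExperimentSource;
}

// Assumed per-flag state kept by the service.
interface FlagState {
  remote?: string | boolean | null; // set once a PostHog fetch succeeded
  cached?: string | boolean | null; // loaded from the disk cache
}

function getAll(
  experimentIds: readonly string[],
  remoteEnabled: boolean,
  state: Map<string, FlagState>,
): Record<string, ExperimentSnapshot> {
  const result: Record<string, ExperimentSnapshot> = {};
  for (const id of experimentIds) {
    const s: FlagState = state.get(id) ?? {};
    if (!remoteEnabled) {
      // Telemetry/flags off: report "disabled" so the renderer can fall back.
      result[id] = { value: null, source: "disabled" };
    } else if (s.remote !== undefined) {
      result[id] = { value: s.remote, source: "posthog" };
    } else {
      // Refresh pending or failed: last-known value, or null on first launch.
      result[id] = { value: s.cached ?? null, source: "cache" };
    }
  }
  return result;
}
```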

4) Frontend: update ExperimentsContext + Settings → Experiments

Target behavior:

In packaged builds (telemetry enabled): show experiments as read-only (variant + short description), since assignment is remote-controlled.
In dev/test or when PostHog flags are disabled/unavailable: fall back to the existing local toggles.

Concrete steps:

Add a new hook (or extend existing):
- useRemoteExperiments() → fetches experiments.getAll once and stores in context.
Update useExperimentValue(EXPERIMENT_IDS.POST_COMPACTION_CONTEXT) to resolve in this order:
1. Remote PostHog assignment (if available)
2. Cached remote assignment (if remote temporarily unavailable)
3. LocalStorage toggle (dev fallback)
4. Default (enabledByDefault)
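The resolution order above maps directly onto a chain of nullish-coalescing operators. This is a hypothetical helper (input names are illustrative) that assumes each tier has already been converted to a boolean. Using `??` rather than `||` matters: an explicit `false` from a higher tier must not fall through to a lower one.

```typescript
interface ResolutionInputs {
  remote?: boolean;       // live PostHog assignment
  cachedRemote?: boolean; // last-known PostHog assignment
  localToggle?: boolean;  // explicit localStorage value (dev fallback)
  enabledByDefault: boolean;
}

function resolveExperimentValue(inputs: ResolutionInputs): boolean {
  return inputs.remote ?? inputs.cachedRemote ?? inputs.localToggle ?? inputs.enabledByDefault;
}
```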

UI changes (ExperimentsSection.tsx):

If remote assignment exists:
- render a disabled Switch (or replace with a badge) and show Variant: control/test.
Else:
- keep the current toggle UI.

5) Wire `Post-Compaction Context` gating to PostHog

Backend gating (authoritative):

In AgentSession (or WorkspaceService.sendMessage), compute:
- postCompactionContextEnabled = experimentsService.isExperimentEnabled(EXPERIMENT_IDS.POST_COMPACTION_CONTEXT)
Use this instead of (or in preference to) options?.experiments?.postCompactionContext.

Frontend gating (UI):

Use the same resolved experiment value to:
- decide whether to request includePostCompaction in workspace.list
- show/hide the PostCompaction UI (Costs tab/sidebar)

Recommended (practical) simplification:

Change workspace.list({ includePostCompaction }) so that when includePostCompaction is omitted, the backend decides based on experiment state.
- This is likely necessary because WorkspaceProvider loads metadata before ExperimentsProvider mounts today.
- It removes a “front-end must know experiment first” dependency and avoids provider-tree churn.
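A sketch of the two backend decisions in this step, following the plan as written here (later commits add `userOverridable` handling on top). Function and parameter names are hypothetical, not the real WorkspaceService API. In `sendMessage()` a known PostHog assignment wins over the renderer's option; in `list()` an omitted `includePostCompaction` lets the backend decide, while an explicit `false` is still honored.

```typescript
// sendMessage(): backend-authoritative gate.
// remoteEnabled is undefined when flags are disabled or not yet known.
function resolvePostCompactionEnabled(
  remoteEnabled: boolean | undefined,
  requestOption: boolean | undefined, // options?.experiments?.postCompactionContext
): boolean {
  if (remoteEnabled !== undefined) return remoteEnabled;
  // No remote signal: use the renderer's value, otherwise fail closed.
  return requestOption ?? false;
}

// list(): omitted includePostCompaction means "backend decides".
// The callback is only invoked when the caller did not pass a value.
function shouldIncludePostCompaction(
  includePostCompaction: boolean | undefined,
  isExperimentEnabled: () => boolean,
): boolean {
  return includePostCompaction ?? isExperimentEnabled();
}
```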

6) Add minimal analytics events for the experiment (optional but high-value)

To get actionable insights beyond “did users click it,” add 1–2 low-cardinality events:

compaction_performed
- properties: had_file_diffs: boolean, diff_count_b2: number
post_compaction_context_injected
- properties: plan_included: boolean, diff_count_b2: number

All properties must remain privacy-safe (counts + booleans only).
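The plan does not define the `_b2` suffix. One plausible reading, used here purely as an assumption, is "bucketed to powers of two": counts are rounded up to 0, 1, 2, 4, 8, ... so the property stays low-cardinality and never reveals exact sizes. `bucketPow2` and its cap are hypothetical.

```typescript
// Round a count up to the next power of two (0 stays 0), capped so the
// number of distinct values stays small.
function bucketPow2(count: number, cap = 1024): number {
  if (!Number.isFinite(count) || count <= 0) return 0;
  let bucket = 1;
  while (bucket < count && bucket < cap) bucket *= 2;
  return bucket;
}

// Example payload for compaction_performed (counts + booleans only).
function compactionProperties(diffCount: number): { had_file_diffs: boolean; diff_count_b2: number } {
  return { had_file_diffs: diffCount > 0, diff_count_b2: bucketPow2(diffCount) };
}
```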

7) Tests

Unit tests for ExperimentsService:
- caching + TTL
- disabled-by-env behavior
- disk cache load/save
Unit test for TelemetryService:
- includes $feature/post-compaction-context when cached/available
Update existing post-compaction tests if behavior changes from “frontend-provided flag” → “backend-derived flag.”

PostHog provisioning (via MCP) ✅

Since you’ve configured the PostHog MCP server, we can create the flag + experiment as part of this integration (in Exec mode) rather than doing it manually in the PostHog UI.

Select the target PostHog project
- posthog_organization-details-get (confirm current org)
- posthog_projects-get (pick projectId)
- posthog_switch-project({ projectId }) (if needed)
Create (or reuse) the feature flag post-compaction-context
- Check for existence:
  - posthog_feature-flag-get-all (search for key)
  - or posthog_feature-flag-get-definition({ flagKey: 'post-compaction-context' })
- If missing:
  - Prefer letting posthog_experiment-create create/update the underlying flag (since experiments want explicit variants).
  - Fallback: posthog_create-feature-flag(...) (boolean-only) and then upgrade to variants via the experiment.
Create (or reuse) the experiment “Post-Compaction Context”
- Check existing experiments:
  - posthog_experiment-get-all (avoid duplicates for the same feature_flag_key)
- (Optional) sanity-check event names we’ll use for metrics:
  - posthog_event-definitions-list (look for error_occurred, stream_completed, message_sent, etc.)
- Create as draft first:
  - posthog_experiment-create({ feature_flag_key: 'post-compaction-context', variants: [{ key: 'control', rollout_percentage: 50 }, { key: 'test', rollout_percentage: 50 }], ... })
  - Suggested primary metric: mean error_occurred
  - Suggested secondary metrics: mean stream_completed, mean message_sent
  - If we implement the optional new events in step 6, add post_compaction_context_injected as a secondary metric (sanity-check feature usage).
Launch / stop the experiment
- Launch after code ships: posthog_experiment-update({ experimentId, data: { launch: true } })
- Stop/conclude: posthog_experiment-update({ experimentId, data: { conclude: 'won' | 'lost' | 'inconclusive' | 'stopped_early' | 'invalid', conclusion_comment } })
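Before calling `posthog_experiment-create`, a small pre-flight check on the variants can catch typos in the payload. The `{ key, rollout_percentage }` shape follows the plan above; `validateVariants` and its rules (a `control` variant, unique keys, rollouts summing to 100) are our own assumptions about what a sane 50/50 setup looks like, not PostHog's validation logic.

```typescript
interface VariantSpec {
  key: string;
  rollout_percentage: number;
}

// Returns a list of problems; an empty list means the payload looks sane.
function validateVariants(variants: readonly VariantSpec[]): string[] {
  const errors: string[] = [];
  const keys = variants.map((v) => v.key);
  if (!keys.includes("control")) errors.push('missing "control" variant');
  if (new Set(keys).size !== keys.length) errors.push("duplicate variant keys");
  const total = variants.reduce((sum, v) => sum + v.rollout_percentage, 0);
  if (total !== 100) errors.push(`rollout percentages sum to ${total}, expected 100`);
  return errors;
}
```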

Manual fallback (if MCP is unavailable)

Create a feature flag key: post-compaction-context.
Create an experiment using that flag key (variants: control vs test).
Choose at least one metric (e.g., error_occurred, stream_completed, or post_compaction_context_injected if implemented).

Notes on Early Access Feature Management

PostHog’s docs state Early Access management is currently only available in the JavaScript Web SDK.

If we want “users opt into betas” inside Mux Settings:

Either adopt posthog-js in the renderer specifically for early access APIs, OR
Implement early-access enrollment via PostHog APIs (will require auth + careful security), OR
Keep Mux’s current local toggle approach for “labs” features and reserve PostHog for experiments.

Given your immediate goal is A/B testing Post-Compaction Context, I’d start with backend feature flags/experiments first.

Generated with mux • Model: anthropic:claude-opus-4-5 • Thinking: high

chatgpt-codex-connector · 2025-12-15T16:23:17Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

ThomasK33 · 2025-12-16T19:34:44Z

@codex review

ThomasK33 · 2025-12-16T20:28:35Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

src/browser/contexts/ExperimentsContext.tsx

ThomasK33 · 2025-12-17T07:53:47Z

@codex review

chatgpt-codex-connector · 2025-12-17T08:06:21Z

Codex Review: Didn't find any major issues. Can't wait for the next one!


Add backend-first PostHog feature flag evaluation for remote-controlled experiments, starting with Post-Compaction Context.

Backend (ExperimentsService):
- Evaluate PostHog feature flags via posthog-node
- Disk cache (~/.mux/feature_flags.json) with TTL-based refresh
- Fail-closed behavior (unknown = disabled)
- Disable calls when telemetry is off

Telemetry enrichment (TelemetryService):
- setFeatureFlagVariant() adds $feature/<flagKey> to all events
- Enables variant breakdown in PostHog analytics

oRPC layer:
- experiments.getAll: Get all experiment values
- experiments.reload: Force refresh from PostHog

Frontend (ExperimentsContext):
- Fetch remote experiments on mount
- Priority: remote PostHog > local toggle > default
- Read-only UI when experiment is remote-controlled

Backend authoritative gating (WorkspaceService):
- sendMessage() resolves experiment from PostHog when enabled
- list() decides includePostCompaction based on experiment

Type consolidation:
- ExperimentValueSchema (Zod) is single source of truth
- ExperimentValue type derived via z.infer in types.ts

Bug fixes (unrelated):
- Fixed backgroundProcessManager exit race condition
- Fixed telemetry client Node.js compatibility
- Relaxed timing test threshold in authMiddleware

Change-Id: I346c924324a5f59cb3349614382dc8a5276e5e1e
Signed-off-by: Thomas Kosiewski <[email protected]>

Allow per-experiment control over whether users can override remote PostHog assignments and whether experiments appear in Settings.

ExperimentDefinition changes:
- userOverridable?: boolean - when true, local toggle takes precedence
- showInSettings?: boolean - when false, hide from Settings UI

Resolution priority (when userOverridable=true):
1. Local localStorage toggle (user's explicit choice)
2. Remote PostHog assignment
3. Default (enabledByDefault)

Implementation:
- experiments.ts: Add new optional fields, set POST_COMPACTION_CONTEXT as userOverridable=true
- ExperimentsContext: hasLocalOverride() helper, updated useExperimentValue() and useAllExperiments() to respect userOverridable
- ExperimentsSection: Filter by showInSettings, enable toggle when canOverride
- WorkspaceService: Respect userOverridable in both list() and sendMessage()

Change-Id: I3afc8514c74151b8b72991aa13ab98296cfd19bb
Signed-off-by: Thomas Kosiewski <[email protected]>
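The two resolution orders described in this commit (overridable vs. remote-controlled) fit in one small function. This is a hypothetical sketch: `ExperimentDefinitionLike` only mirrors the fields named in the commit message, and the inputs are assumed to already be booleans or `undefined` when absent.

```typescript
interface ExperimentDefinitionLike {
  enabledByDefault: boolean;
  userOverridable?: boolean;
}

function resolveWithOverride(
  def: ExperimentDefinitionLike,
  localOverride: boolean | undefined, // explicit localStorage choice, if any
  remote: boolean | undefined,        // PostHog assignment, if known
): boolean {
  if (def.userOverridable) {
    // User's explicit choice > PostHog > default.
    return localOverride ?? remote ?? def.enabledByDefault;
  }
  // Remote-controlled: PostHog > local toggle > default.
  return remote ?? localOverride ?? def.enabledByDefault;
}
```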

- Makefile: MUX_ENABLE_TELEMETRY_IN_DEV=1 now sufficient (no need to also set MUX_DISABLE_TELEMETRY=0)
- ExperimentsSection: hide non-overridable experiments, remove PostHog info line
- Add experiment_overridden telemetry event with experimentId, assignedVariant, userChoice
- Update oRPC schema, payload types, tracking functions, and useTelemetry hook

Signed-off-by: Thomas Kosiewski <[email protected]>

---
_Generated with `mux` • Model: `anthropic:claude-opus-4-5` • Thinking: `high`_

Change-Id: I3582117b82c1025bcfd94d1361bb11c46cb8ff9e

Previously, getSendOptionsFromStorage() (used by resume/creation flows) always passed the localStorage default to the backend, which treated any non-undefined value as an explicit user override for userOverridable experiments.

Fix: isExperimentEnabled() now returns undefined for userOverridable experiments when user hasn't explicitly set a localStorage value. The backend already handles undefined correctly by falling through to PostHog assignment.

Also addresses code review feedback:
- Move telemetryService property declaration to top of ExperimentsService class
- Add comment explaining the "undefined" string check in hasLocalOverride
- Add docstring for getRemoteExperimentEnabled and move near other helpers
- Document MUX_ENABLE_TELEMETRY_IN_DEV env var in Makefile

Change-Id: I15e5360f7461cd62ad347fbb2e32b0e8dc4b873d
Signed-off-by: Thomas Kosiewski <[email protected]>

- useSendMessageOptions now passes undefined unless user explicitly overrides
- add useExperimentOverrideValue() helper
- harden localStorage parsing + add tests

Change-Id: I33d9f1c8d3bba4f083132c4f645c76328327726a
Signed-off-by: Thomas Kosiewski <[email protected]>
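A sketch of the hardened parsing these two commits describe: only an explicitly stored boolean counts as a user override, and everything else (missing key, the literal string `"undefined"` that `localStorage.setItem(key, undefined)` produces, corrupt JSON, non-boolean JSON) yields `undefined` so the backend falls through to the PostHog assignment. `parseLocalOverride` is a hypothetical name, not the actual `hasLocalOverride` implementation.

```typescript
// raw is what localStorage.getItem() returned for the experiment key.
function parseLocalOverride(raw: string | null): boolean | undefined {
  // Missing key, or the string stored when undefined was written.
  if (raw === null || raw === "undefined") return undefined;
  try {
    const parsed: unknown = JSON.parse(raw);
    return typeof parsed === "boolean" ? parsed : undefined;
  } catch {
    return undefined; // corrupt value: treat as "no override"
  }
}
```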

ExperimentsService can return { source: 'cache', value: null } on first launch while PostHog refresh runs in the background. The renderer previously fetched experiments.getAll only once, so remote variants never became visible until manual reload.

Fix:
- Poll experiments.getAll with bounded backoff while any values are pending
- Add a regression test for ExperimentsProvider

Change-Id: If9533ee2ad0430729600275aedcf9b1939ec612d
Signed-off-by: Thomas Kosiewski <[email protected]>
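A sketch of the polling policy as two pure functions: a "pending" check that matches the `{ source: 'cache', value: null }` first-launch case, and a capped exponential backoff that gives up after a fixed number of attempts. The function names, base delay, cap, and attempt limit are assumptions. A React effect could call `experiments.getAll`, check `hasPending`, and schedule the next fetch with `setTimeout(nextPollDelay(attempt))` until it returns `null`.

```typescript
interface SnapshotEntry {
  value: string | boolean | null;
  source: "posthog" | "cache" | "disabled";
}

// Pending = backend has no value yet, but a background refresh may still land.
function hasPending(snapshot: Record<string, SnapshotEntry>): boolean {
  return Object.values(snapshot).some((e) => e.source === "cache" && e.value === null);
}

// Capped exponential backoff; null means "stop polling".
function nextPollDelay(
  attempt: number,
  baseDelayMs = 500,
  maxDelayMs = 8000,
  maxAttempts = 8,
): number | null {
  if (attempt >= maxAttempts) return null;
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}
```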

ThomasK33 force-pushed the posthog-feature-flags branch 7 times, most recently from 31b073d to b9f4947 on December 16, 2025 18:03

This comment was marked as resolved.


ThomasK33 force-pushed the posthog-feature-flags branch from 1dd8596 to eacd94f on December 16, 2025 20:31

chatgpt-codex-connector bot reviewed Dec 16, 2025


src/browser/contexts/ExperimentsContext.tsx (resolved)

ThomasK33 added 6 commits December 17, 2025 09:30

ThomasK33 force-pushed the posthog-feature-flags branch from 0c6b512 to e171a6e on December 17, 2025 08:34

ThomasK33 added this pull request to the merge queue Dec 17, 2025

Merged via the queue into main with commit cc478e5 Dec 17, 2025
20 checks passed

ThomasK33 deleted the posthog-feature-flags branch December 17, 2025 09:08
