@ThomasK33
Member

Add backend-first PostHog feature flag evaluation for remote-controlled experiments, starting with Post-Compaction Context.

Changes

Backend (ExperimentsService)

  • Evaluate PostHog feature flags via posthog-node
  • Disk cache (~/.mux/feature_flags.json) with TTL-based refresh
  • Fail-closed behavior (unknown = disabled)
  • Disable calls when telemetry is off

Telemetry enrichment (TelemetryService)

  • setFeatureFlagVariant() adds $feature/<flagKey> to all events
  • Enables variant breakdown in PostHog analytics

oRPC layer

  • experiments.getAll: Get all experiment values
  • experiments.reload: Force refresh from PostHog

Frontend (ExperimentsContext)

  • Fetch remote experiments on mount
  • Priority: remote PostHog > local toggle > default
  • Read-only UI when experiment is remote-controlled

Backend authoritative gating (WorkspaceService)

  • sendMessage() resolves experiment from PostHog when enabled
  • list() decides includePostCompaction based on experiment

Type consolidation

  • ExperimentValueSchema (Zod) is single source of truth
  • ExperimentValue type derived via z.infer in types.ts

Bug fixes (unrelated)

  • Fixed backgroundProcessManager exit race condition
  • Fixed telemetry client Node.js compatibility
  • Relaxed timing test threshold in authMiddleware

📋 Implementation Plan

PostHog early access, feature flags, and experiments (Mux)

Goals

  1. Run remote-controlled experiments/feature flags from PostHog (starting with Post-Compaction Context).
  2. Keep PostHog interactions backend-first to avoid ad-blocker issues (Mux already forwards telemetry to Node via oRPC).
  3. Ensure telemetry/error events can be analyzed by variant in PostHog (required for experiments + server-side event capture).
  4. Preserve Mux privacy guarantees: no project names, file paths, user prompts, etc. sent to PostHog.

Recommendation (architecture)

✅ Approach A (recommended): Backend-owned flag/experiment evaluation + oRPC exposure

Net new product LoC (est.): ~250–450

  • Use the existing main-process posthog-node client to:
    • evaluate flags/experiment variants
    • emit $feature_flag_called exposure events
    • attach $feature/<flagKey> properties to telemetry events so PostHog can break down metrics by variant
  • Expose a small oRPC surface to the renderer for:
    • displaying current experiment variant in Settings → Experiments
    • gating UI behavior where needed
  • Keep local experiment toggles only as a dev/override fallback (optional), not the default source of truth.

Why this fits Mux:

  • Mux already routes telemetry through the backend “to avoid ad-blocker issues.”
  • The Post-Compaction Context experiment gates backend behavior (attachment injection), so backend must know the assignment anyway.

Alternative B: Renderer uses posthog-js for flags/experiments (keep backend telemetry)

Net new product LoC (est.): ~350–650

Pros:

  • Easiest way to adopt Early Access Feature Management (PostHog notes it’s JS-web-SDK-only).

Cons:

  • Reintroduces the ad-blocker/network fragility we intentionally avoided for telemetry.
  • Requires careful identity bridging (distinctId must match backend’s ID or you lose experiment attribution).

Proposed flow (Approach A)

flowchart TD
  A[Renderer] -->|oRPC: telemetry.track| B[Main process TelemetryService]
  A -->|oRPC: experiments.getAll| C[Main process ExperimentsService]

  C -->|getFeatureFlag: post-compaction-context| D[PostHog Decide / Flags]
  C -->|cache variant| C

  B -->|capture events| E[PostHog events]
  B -->|include $feature/post-compaction-context| E

Implementation plan

1) Backend: add a PostHog-backed experiments/flags service

  • Create src/node/services/experimentsService.ts (name TBD) that depends on:
    • TelemetryService (for distinctId + access to a PostHog client), or
    • a shared PostHogClientService if you want to refactor TelemetryService into a reusable PostHog wrapper.

Core responsibilities:

  • getDistinctId() (or expose from TelemetryService) – single stable identity used for both:
    • flag evaluation
    • telemetry capture
  • getExperimentVariant(experimentId: ExperimentId): Promise<string | boolean | null>
    • Map ExperimentId → PostHog feature flag key. (Conveniently, current EXPERIMENT_IDS.* already look like flag keys.)
    • Call posthog.getFeatureFlag(flagKey, distinctId).
      • This automatically emits $feature_flag_called (exposure) events when appropriate.
    • Cache result in-memory with a TTL (e.g., 5–15 min) to avoid re-fetching on every UI render.
  • isExperimentEnabled(experimentId: ExperimentId): boolean
    • Converts the raw PostHog variant into a boolean gate.
    • Suggested mapping for post-compaction-context:
      • "test" / true → enabled
      • "control" / false / null → disabled
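A minimal sketch of the caching and variant-to-boolean mapping described above. The factory name `createVariantCache`, the `FetchVariant` callback, and the injectable clock are illustrative assumptions rather than the code in this PR; the callback stands in for a call such as posthog.getFeatureFlag(flagKey, distinctId).

```typescript
type Variant = string | boolean | null;

interface CacheEntry {
  value: Variant;
  fetchedAtMs: number;
}

// Hypothetical fetcher: in the plan this would wrap posthog.getFeatureFlag(flagKey, distinctId).
type FetchVariant = (flagKey: string) => Promise<Variant | undefined>;

// Maps a raw variant to a gate: "test" / true enable, everything else disables.
function variantToEnabled(variant: Variant): boolean {
  return variant === true || variant === "test";
}

function createVariantCache(
  fetchVariant: FetchVariant,
  ttlMs: number,
  now: () => number = () => Date.now(),
) {
  const entries = new Map<string, CacheEntry>();

  // Returns the cached value only while it is younger than the TTL.
  function readFresh(flagKey: string): Variant | undefined {
    const entry = entries.get(flagKey);
    if (entry === undefined) return undefined;
    return now() - entry.fetchedAtMs < ttlMs ? entry.value : undefined;
  }

  function store(flagKey: string, value: Variant): void {
    entries.set(flagKey, { value, fetchedAtMs: now() });
  }

  async function getVariant(flagKey: string): Promise<Variant> {
    const fresh = readFresh(flagKey);
    if (fresh !== undefined) return fresh;
    try {
      const value = (await fetchVariant(flagKey)) ?? null;
      store(flagKey, value);
      return value;
    } catch {
      // Fetch failed: reuse a stale value if present, otherwise report unknown.
      return entries.get(flagKey)?.value ?? null;
    }
  }

  // Synchronous gate based on whatever is cached (stale values included).
  function isEnabled(flagKey: string): boolean {
    return variantToEnabled(entries.get(flagKey)?.value ?? null);
  }

  return { readFresh, store, getVariant, isEnabled };
}
```

Keeping the boolean gate synchronous means UI renders and message sends never wait on the network; only `getVariant` touches the fetcher.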

Offline + startup behavior:

  • Persist last-known assignments to disk in ~/.mux/feature_flags.json (or inside muxHome near telemetry_id).
  • On startup:
    • load cached values immediately (fast)
    • refresh asynchronously in the background
  • Fail closed: if PostHog is unreachable, default experiment to control (disabled) unless cached value exists.
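The startup and fail-closed rules above could be sketched as follows. The flat `flagKey → variant` file layout and the function names `parseCachedFlags` and `resolveVariant` are assumptions made for illustration; the actual layout of feature_flags.json is not specified here.

```typescript
type Variant = string | boolean | null;
type Source = "posthog" | "cache" | "disabled";

interface Resolved {
  value: Variant;
  source: Source;
}

// Parses the cache file's contents (assumed layout: { [flagKey]: variant }).
// Any malformed input yields an empty cache instead of throwing.
function parseCachedFlags(json: string): Record<string, Variant> {
  const out: Record<string, Variant> = {};
  try {
    const parsed: unknown = JSON.parse(json);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return out;
    const record = parsed as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const value = record[key];
      if (typeof value === "string" || typeof value === "boolean" || value === null) {
        out[key] = value;
      }
    }
  } catch {
    // Corrupt file: behave as if no cache exists.
  }
  return out;
}

// `fresh` is undefined when PostHog could not be reached.
function resolveVariant(
  flagKey: string,
  fresh: Variant | undefined,
  cached: Record<string, Variant>,
): Resolved {
  if (fresh !== undefined) return { value: fresh, source: "posthog" };
  if (Object.prototype.hasOwnProperty.call(cached, flagKey)) {
    return { value: cached[flagKey], source: "cache" };
  }
  // Fail closed: no remote answer and no cached value means unknown (treated as control).
  return { value: null, source: "cache" };
}
```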

Feature-flag enablement rules:

  • If telemetry is disabled (MUX_DISABLE_TELEMETRY=1, CI, test, etc.), do not call PostHog for flags.
    • Return null / “unknown” from the service.
    • Renderer can fall back to local toggles (dev-only) or treat as control.
  • To test PostHog-driven experiments in an unpackaged/dev Electron build, use the existing env opt-in:
    • MUX_ENABLE_TELEMETRY_IN_DEV=1
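One way to express the enablement rules above as a pure function. The precedence order, the `isPackaged` parameter, and the way CI and test runs are detected (`CI` set to a non-empty value, `NODE_ENV=test`) are assumptions for this sketch; only the two MUX_* variable names come from the plan.

```typescript
type Env = Record<string, string | undefined>;

// Decides whether the backend may call PostHog for flags at all.
// Assumed precedence: explicit disable > CI/test > packaged build > dev opt-in.
function remoteFlagsAllowed(env: Env, isPackaged: boolean): boolean {
  if (env["MUX_DISABLE_TELEMETRY"] === "1") return false;

  const ci = env["CI"];
  if (ci !== undefined && ci !== "" && ci !== "0" && ci !== "false") return false;
  if (env["NODE_ENV"] === "test") return false;

  if (isPackaged) return true;

  // Unpackaged/dev builds stay off unless the developer opts in.
  return env["MUX_ENABLE_TELEMETRY_IN_DEV"] === "1";
}
```

When this returns false, the service would report null ("unknown") without making a network call.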

2) Backend: attach experiment/flag info to telemetry events

PostHog’s docs explicitly require events captured server-side to carry flag information themselves; otherwise they cannot be broken down by variant.

Implement one of these (recommend #1):

  1. Manual property injection (preferred):

    • Add $feature/<flagKey> properties to captured events.
    • Implementation idea: TelemetryService.getBaseProperties() merges in a stable this.featureFlagProperties map populated by ExperimentsService.
    • For the initial experiment:
      • $feature/post-compaction-context: 'control' | 'test' (or boolean) depending on how you configure variants.
  2. sendFeatureFlags: true on posthog.capture()

    • Avoid this unless you also enable local evaluation; otherwise it can add an extra request per capture.

Also add:

  • A tiny “experiment snapshot” helper so that error_occurred (and other critical events) always include the variant, even if flags aren’t fully loaded yet (use cached value).
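A rough sketch of manual property injection (option 1). The method name `setFeatureFlagVariant` appears in the change summary, but its signature here is a guess, and `createFeatureFlagProperties` and `enrich` are hypothetical helpers standing in for the merge inside TelemetryService.getBaseProperties().

```typescript
type FlagVariant = string | boolean;
type EventProperties = Record<string, string | number | boolean>;

function createFeatureFlagProperties() {
  // Keys are stored already prefixed, e.g. "$feature/post-compaction-context".
  const flagProperties: Record<string, FlagVariant> = {};

  return {
    // Passing null clears the property (e.g. when the assignment is unknown).
    setFeatureFlagVariant(flagKey: string, variant: FlagVariant | null): void {
      const property = `$feature/${flagKey}`;
      if (variant === null) {
        delete flagProperties[property];
      } else {
        flagProperties[property] = variant;
      }
    },

    // Returns a new object; the caller's base properties are not mutated.
    enrich(base: EventProperties): EventProperties {
      return { ...base, ...flagProperties };
    },
  };
}
```

Because the map is populated from the cached assignment, events such as error_occurred can carry the variant even before the first background refresh completes.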

3) oRPC: expose experiment state to the renderer

Add a new oRPC namespace, e.g. experiments:

  • experiments.getAll → returns Record<ExperimentId, { value: string | boolean | null; source: 'posthog' | 'cache' | 'disabled' }>
  • experiments.reload (optional) → forces a refresh, useful for debugging the Settings page.
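To make the response shape concrete, here is a sketch of how a handler might assemble the experiments.getAll result. `buildGetAllResponse` and its `lookup` callback are hypothetical; only the `{ value, source }` shape is taken from the plan.

```typescript
interface ExperimentValue {
  value: string | boolean | null;
  source: "posthog" | "cache" | "disabled";
}

function buildGetAllResponse(
  experimentIds: readonly string[],
  flagsEnabled: boolean,
  lookup: (id: string) => ExperimentValue | undefined,
): Record<string, ExperimentValue> {
  const result: Record<string, ExperimentValue> = {};
  for (const id of experimentIds) {
    if (!flagsEnabled) {
      // Telemetry off: never consult PostHog, report unknown.
      result[id] = { value: null, source: "disabled" };
    } else {
      // Nothing resolved yet: report a pending cache entry.
      result[id] = lookup(id) ?? { value: null, source: "cache" };
    }
  }
  return result;
}
```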

Update:

  • src/common/orpc/schemas/api.ts to include the new endpoints.
  • src/node/orpc/router.ts to wire handlers.
  • src/node/orpc/context.ts + ServiceContainer to register the new service.

4) Frontend: update ExperimentsContext + Settings → Experiments

Target behavior:

  • In packaged builds (telemetry enabled): show experiments as read-only (variant + short description), since assignment is remote-controlled.
  • In dev/test or when PostHog flags are disabled/unavailable: fall back to the existing local toggles.

Concrete steps:

  • Add a new hook (or extend existing):
    • useRemoteExperiments() → fetches experiments.getAll once and stores in context.
  • Update useExperimentValue(EXPERIMENT_IDS.POST_COMPACTION_CONTEXT) to resolve in this order:
    1. Remote PostHog assignment (if available)
    2. Cached remote assignment (if remote temporarily unavailable)
    3. LocalStorage toggle (dev fallback)
    4. Default (enabledByDefault)
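The four-step resolution order above might look like this as a pure function. The `ResolutionInputs` shape and the function name are illustrative assumptions; the "test" / true mapping follows the one suggested for post-compaction-context.

```typescript
type RemoteVariant = string | boolean | null | undefined;

interface ResolutionInputs {
  remote: RemoteVariant; // live PostHog assignment, if available
  cachedRemote: RemoteVariant; // last-known remote assignment
  localToggle: boolean | undefined; // localStorage toggle (dev fallback)
  enabledByDefault: boolean;
}

function variantToEnabled(variant: string | boolean): boolean {
  return variant === true || variant === "test";
}

function resolveExperimentEnabled(inputs: ResolutionInputs): boolean {
  // 1. Remote PostHog assignment
  if (inputs.remote !== null && inputs.remote !== undefined) {
    return variantToEnabled(inputs.remote);
  }
  // 2. Cached remote assignment
  if (inputs.cachedRemote !== null && inputs.cachedRemote !== undefined) {
    return variantToEnabled(inputs.cachedRemote);
  }
  // 3. LocalStorage toggle
  if (inputs.localToggle !== undefined) return inputs.localToggle;
  // 4. Default
  return inputs.enabledByDefault;
}
```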

UI changes (ExperimentsSection.tsx):

  • If remote assignment exists:
    • render a disabled Switch (or replace with a badge) and show Variant: control/test.
  • Else:
    • keep the current toggle UI.

5) Wire Post-Compaction Context gating to PostHog

Backend gating (authoritative):

  • In AgentSession (or WorkspaceService.sendMessage), compute:
    • postCompactionContextEnabled = experimentsService.isExperimentEnabled(EXPERIMENT_IDS.POST_COMPACTION_CONTEXT)
  • Use this instead of (or in preference to) options?.experiments?.postCompactionContext.

Frontend gating (UI):

  • Use the same resolved experiment value to:
    • decide whether to request includePostCompaction in workspace.list
    • show/hide the PostCompaction UI (Costs tab/sidebar)

Recommended (practical) simplification:

  • Change workspace.list({ includePostCompaction }) so that when includePostCompaction is omitted, the backend decides based on experiment state.
    • This is likely necessary because WorkspaceProvider loads metadata before ExperimentsProvider mounts today.
    • It removes a “front-end must know experiment first” dependency and avoids provider-tree churn.
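The "backend decides when omitted" rule reduces to a one-line default. This is a sketch with a hypothetical helper name, not the WorkspaceService code itself.

```typescript
// An explicit request from the caller wins; when the option is omitted,
// the backend falls back to its own experiment state.
function shouldIncludePostCompaction(
  requested: boolean | undefined,
  experimentEnabled: boolean,
): boolean {
  return requested ?? experimentEnabled;
}
```

Using `??` rather than `||` matters here: an explicit `false` from the caller must not be overridden by an enabled experiment.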

6) Add minimal analytics events for the experiment (optional but high-value)

To get actionable insights beyond “did users click it,” add 1–2 low-cardinality events:

  • compaction_performed
    • properties: had_file_diffs: boolean, diff_count_b2: number
  • post_compaction_context_injected
    • properties: plan_included: boolean, diff_count_b2: number

All properties must remain privacy-safe (counts + booleans only).
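Assuming the `_b2` suffix means the count is bucketed to a power of two before being sent (an interpretation, not something stated in this plan), a helper could look like the sketch below. The name `bucketBase2` and the choice to round up are both hypothetical.

```typescript
// Rounds a non-negative count up to the nearest power of two so that
// exact values are never reported. Zero and invalid inputs map to 0.
function bucketBase2(count: number): number {
  if (!isFinite(count) || count <= 0) return 0;
  let bucket = 1;
  while (bucket < count) {
    bucket *= 2;
  }
  return bucket;
}
```

Bucketing keeps the property low-cardinality, which suits both the privacy constraint and PostHog breakdowns.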

7) Tests

  • Unit tests for ExperimentsService:
    • caching + TTL
    • disabled-by-env behavior
    • disk cache load/save
  • Unit test for TelemetryService:
    • includes $feature/post-compaction-context when cached/available
  • Update existing post-compaction tests if behavior changes from “frontend-provided flag” → “backend-derived flag.”

PostHog provisioning (via MCP) ✅

Since you’ve configured the PostHog MCP server, we can create the flag + experiment as part of this integration (in Exec mode) rather than doing it manually in the PostHog UI.

  1. Select the target PostHog project

    • posthog_organization-details-get (confirm current org)
    • posthog_projects-get (pick projectId)
    • posthog_switch-project({ projectId }) (if needed)
  2. Create (or reuse) the feature flag post-compaction-context

    • Check for existence:
      • posthog_feature-flag-get-all (search for key)
      • or posthog_feature-flag-get-definition({ flagKey: 'post-compaction-context' })
    • If missing:
      • Prefer letting posthog_experiment-create create/update the underlying flag (since experiments want explicit variants).
      • Fallback: posthog_create-feature-flag(...) (boolean-only) and then upgrade to variants via the experiment.
  3. Create (or reuse) the experiment “Post-Compaction Context”

    • Check existing experiments:
      • posthog_experiment-get-all (avoid duplicates for the same feature_flag_key)
    • (Optional) sanity-check event names we’ll use for metrics:
      • posthog_event-definitions-list (look for error_occurred, stream_completed, message_sent, etc.)
    • Create as draft first:
      • posthog_experiment-create({ feature_flag_key: 'post-compaction-context', variants: [{ key: 'control', rollout_percentage: 50 }, { key: 'test', rollout_percentage: 50 }], ... })
      • Suggested primary metric: mean error_occurred
      • Suggested secondary metrics: mean stream_completed, mean message_sent
      • If we implement the optional new events in step 6, add post_compaction_context_injected as a secondary metric (sanity-check feature usage).
  4. Launch / stop the experiment

    • Launch after code ships: posthog_experiment-update({ experimentId, data: { launch: true } })
    • Stop/conclude: posthog_experiment-update({ experimentId, data: { conclude: 'won' | 'lost' | 'inconclusive' | 'stopped_early' | 'invalid', conclusion_comment } })

Manual fallback (if MCP is unavailable)

  • Create a feature flag key: post-compaction-context.
  • Create an experiment using that flag key (variants: control vs test).
  • Choose at least one metric (e.g., error_occurred, stream_completed, or post_compaction_context_injected if implemented).

Notes on Early Access Feature Management

PostHog’s docs state Early Access management is currently only available in the JavaScript Web SDK.

If we want “users opt into betas” inside Mux Settings:

  • Either adopt posthog-js in the renderer specifically for early access APIs, OR
  • Implement early-access enrollment via PostHog APIs (will require auth + careful security), OR
  • Keep Mux’s current local toggle approach for “labs” features and reserve PostHog for experiments.

Given your immediate goal is A/B testing Post-Compaction Context, I’d start with backend feature flags/experiments first.


Generated with mux • Model: anthropic:claude-opus-4-5 • Thinking: high

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

@ThomasK33 ThomasK33 force-pushed the posthog-feature-flags branch 7 times, most recently from 31b073d to b9f4947 Compare December 16, 2025 18:03
@ThomasK33
Member Author

@codex review

chatgpt-codex-connector[bot]

This comment was marked as resolved.

@ThomasK33
Member Author

@codex review

@ThomasK33 ThomasK33 force-pushed the posthog-feature-flags branch from 1dd8596 to eacd94f Compare December 16, 2025 20:31

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!

Add backend-first PostHog feature flag evaluation for remote-controlled
experiments, starting with Post-Compaction Context.

Backend (ExperimentsService):
- Evaluate PostHog feature flags via posthog-node
- Disk cache (~/.mux/feature_flags.json) with TTL-based refresh
- Fail-closed behavior (unknown = disabled)
- Disable calls when telemetry is off

Telemetry enrichment (TelemetryService):
- setFeatureFlagVariant() adds $feature/<flagKey> to all events
- Enables variant breakdown in PostHog analytics

oRPC layer:
- experiments.getAll: Get all experiment values
- experiments.reload: Force refresh from PostHog

Frontend (ExperimentsContext):
- Fetch remote experiments on mount
- Priority: remote PostHog > local toggle > default
- Read-only UI when experiment is remote-controlled

Backend authoritative gating (WorkspaceService):
- sendMessage() resolves experiment from PostHog when enabled
- list() decides includePostCompaction based on experiment

Type consolidation:
- ExperimentValueSchema (Zod) is single source of truth
- ExperimentValue type derived via z.infer in types.ts

Bug fixes (unrelated):
- Fixed backgroundProcessManager exit race condition
- Fixed telemetry client Node.js compatibility
- Relaxed timing test threshold in authMiddleware

Change-Id: I346c924324a5f59cb3349614382dc8a5276e5e1e
Signed-off-by: Thomas Kosiewski <[email protected]>
Allow per-experiment control over whether users can override remote
PostHog assignments and whether experiments appear in Settings.

ExperimentDefinition changes:
- userOverridable?: boolean - when true, local toggle takes precedence
- showInSettings?: boolean - when false, hide from Settings UI

Resolution priority (when userOverridable=true):
1. Local localStorage toggle (user's explicit choice)
2. Remote PostHog assignment
3. Default (enabledByDefault)

Implementation:
- experiments.ts: Add new optional fields, set POST_COMPACTION_CONTEXT
  as userOverridable=true
- ExperimentsContext: hasLocalOverride() helper, updated useExperimentValue()
  and useAllExperiments() to respect userOverridable
- ExperimentsSection: Filter by showInSettings, enable toggle when canOverride
- WorkspaceService: Respect userOverridable in both list() and sendMessage()

Change-Id: I3afc8514c74151b8b72991aa13ab98296cfd19bb
Signed-off-by: Thomas Kosiewski <[email protected]>
- Makefile: MUX_ENABLE_TELEMETRY_IN_DEV=1 now sufficient (no need to also set MUX_DISABLE_TELEMETRY=0)
- ExperimentsSection: hide non-overridable experiments, remove PostHog info line
- Add experiment_overridden telemetry event with experimentId, assignedVariant, userChoice
- Update oRPC schema, payload types, tracking functions, and useTelemetry hook

Signed-off-by: Thomas Kosiewski <[email protected]>

---
_Generated with `mux` • Model: `anthropic:claude-opus-4-5` • Thinking: `high`_

Change-Id: I3582117b82c1025bcfd94d1361bb11c46cb8ff9e
Previously, getSendOptionsFromStorage() (used by resume/creation flows) always
passed the localStorage default to the backend, which treated any non-undefined
value as an explicit user override for userOverridable experiments.

Fix: isExperimentEnabled() now returns undefined for userOverridable experiments
when user hasn't explicitly set a localStorage value. The backend already handles
undefined correctly by falling through to PostHog assignment.

Also addresses code review feedback:
- Move telemetryService property declaration to top of ExperimentsService class
- Add comment explaining the "undefined" string check in hasLocalOverride
- Add docstring for getRemoteExperimentEnabled and move near other helpers
- Document MUX_ENABLE_TELEMETRY_IN_DEV env var in Makefile

Change-Id: I15e5360f7461cd62ad347fbb2e32b0e8dc4b873d
Signed-off-by: Thomas Kosiewski <[email protected]>
- useSendMessageOptions now passes undefined unless user explicitly overrides
- add useExperimentOverrideValue() helper
- harden localStorage parsing + add tests

Change-Id: I33d9f1c8d3bba4f083132c4f645c76328327726a
Signed-off-by: Thomas Kosiewski <[email protected]>
ExperimentsService can return { source: 'cache', value: null } on first launch while
PostHog refresh runs in the background.

The renderer previously fetched experiments.getAll only once, so remote variants
never became visible until manual reload.

Fix:
- Poll experiments.getAll with bounded backoff while any values are pending
- Add a regression test for ExperimentsProvider

Change-Id: If9533ee2ad0430729600275aedcf9b1939ec612d
Signed-off-by: Thomas Kosiewski <[email protected]>
@ThomasK33 ThomasK33 force-pushed the posthog-feature-flags branch from 0c6b512 to e171a6e Compare December 17, 2025 08:34
@ThomasK33 ThomasK33 added this pull request to the merge queue Dec 17, 2025
Merged via the queue into main with commit cc478e5 Dec 17, 2025
20 checks passed
@ThomasK33 ThomasK33 deleted the posthog-feature-flags branch December 17, 2025 09:08