Skip to content

feat: Phase 3.5 — cookie import, QA testing, team retro (v0.3.1) (#29)#1

Merged
stoll merged 1 commit intostoll:mainfrom
garrytan:main
Mar 13, 2026
Merged

feat: Phase 3.5 — cookie import, QA testing, team retro (v0.3.1) (#29)#1
stoll merged 1 commit intostoll:mainfrom
garrytan:main

Conversation

@stoll
Copy link
Copy Markdown
Owner

@stoll stoll commented Mar 13, 2026

  • Phase 2: Enhanced browser — dialog handling, upload, state checks, snapshots
  • CircularBuffer O(1) ring buffer for console/network/dialog (was O(n) array+shift)
  • Async buffer flush with Bun.write() (was appendFileSync)
  • Dialog auto-accept/dismiss with buffer + prompt text support
  • File upload command (upload <file...>)
  • Element state checks (is visible/hidden/enabled/disabled/checked/editable/focused)
  • Annotated screenshots with ref labels overlaid (-a flag)
  • Snapshot diffing against previous snapshot (-D flag)
  • Cursor-interactive element scan for non-ARIA clickables (-C flag)
  • Snapshot scoping depth limit (-d N flag)
  • Health check with page.evaluate + 2s timeout
  • Playwright error wrapping — actionable messages for AI agents
  • Fix useragent — context recreation preserves cookies/storage/URLs
  • wait --networkidle / --load / --domcontentloaded flags
  • console --errors filter (error + warning only)
  • cookie-import with auto-fill domain from page URL
  • 166 integration tests (was ~63)
  • Phase 2: Rewrite SKILL.md as QA playbook + command reference

Reorient SKILL.md files from raw command reference to QA-first playbook with 10 workflow patterns (test user flows, verify deployments, dogfood features, responsive layouts, file upload, forms, dialogs, compare pages). Compact command reference tables at the bottom.

  • Phase 3: /qa skill — systematic QA testing with health scores

New /qa skill for systematic web app QA testing. Three modes:

  • full: 5-10 documented issues with screenshots and repro steps
  • quick: 30-second smoke test with health score
  • regression: compare against saved baseline

Includes issue taxonomy (7 categories, 4 severity levels), structured report template, health score rubric (weighted across 7 categories), framework detection guidance (Next.js, Rails, WordPress, SPA).

Also adds browse/bin/find-browse (DRY binary discovery using git rev-parse), .gstack/ to .gitignore, and updated TODO roadmap.

  • Bump to v0.3.0 — Phase 2 + Phase 3 changelog

  • feat: cookie-import-browser — Chromium cookie decryption module + tests

Pure logic module for reading and decrypting cookies from macOS Chromium browsers (Comet, Chrome, Arc, Brave, Edge). Supports v10 AES-128-CBC encryption with macOS Keychain access, PBKDF2 key derivation, and per-browser key caching. 18 unit tests with encrypted cookie fixtures.

  • feat: cookie picker web UI + route handler

Two-panel dark-theme picker served from the browse server. Left panel shows source browser domains with search and import buttons. Right panel shows imported domains with trash buttons. No cookie values exposed. 6 API endpoints, importedDomains Set tracking, inline clearCookies.

  • feat: wire cookie-import-browser into browse server

Add cookie-picker route dispatch (no auth, localhost-only), add cookie-import-browser to WRITE_COMMANDS and CHAIN_WRITE, add serverPort property to BrowserManager, add write command with two modes (picker UI vs --domain direct import), update CLI help text.

type no longer echoes text (reports character count), cookie redacts value with ****, header redacts Authorization/Cookie/X-API-Key/X-Auth-Token, storage set drops value, forms redacts password fields. Prevents secrets from persisting in LLM transcripts. 7 new tests.

Credit: fredluz (PR garrytan#21)

Add validateOutputPath() for screenshot/pdf/responsive (restricts to /tmp and cwd) and validateReadPath() for eval (blocks .. sequences and absolute paths outside safe dirs). 7 new tests.

Credit: Jah-yee (PR garrytan#26)

Setup now verifies Playwright can launch Chromium, and auto-installs it via bunx playwright install chromium if missing. Exits non-zero if build or Chromium launch fails.

Credit: AkbarDevop (PR garrytan#22)

  • security: fix path validation bypass, CORS restriction, cookie-import path check
  • startsWith('/tmp') matched '/tmpevil' — now requires trailing slash
  • CORS Access-Control-Allow-Origin changed from * to http://127.0.0.1:
  • cookie-import now validates file paths (was missing validateReadPath)
  • 3 new tests for prefix collision and cookie-import path traversal
  • fix: address review informational issues + add regression tests
  • Add cookie-import to CHAIN_WRITE set for chain command routing
  • Add path validation to snapshot -a -o output path
  • Fix package.json version to match 0.3.1
  • Use crypto.randomUUID() for temp DB paths (unpredictable filenames)
  • Add regression tests for chain cookie-import and snapshot path validation
  • docs: add /qa, /setup-browser-cookies to README + update BROWSER.md
  • Add /qa and /setup-browser-cookies to skills table, install/update/uninstall blurbs
  • Add dedicated README sections for both new skills with usage examples
  • Update demo workflow to show cookie import → QA → browse flow
  • Update BROWSER.md: cookie import commands, new source files, test count (203)
  • Update skill count from 6 to 8
  • feat: team-aware /retro v2.0 — per-person praise and growth opportunities
  • Identify current user via git config, orient narrative as "you" vs teammates
  • Add per-author metrics: commits, LOC, focus areas, commit type mix, sessions
  • New "Your Week" section with personal deep-dive for whoever runs the command
  • New "Team Breakdown" with per-person praise and growth opportunities
  • Track AI-assisted commits via Co-Authored-By trailers
  • Personal + team shipping streaks
  • Tone: praise like a 1:1, growth like investment advice, never compare negatively
  • docs: add Conductor parallel sessions section to README

* Phase 2: Enhanced browser — dialog handling, upload, state checks, snapshots

- CircularBuffer O(1) ring buffer for console/network/dialog (was O(n) array+shift)
- Async buffer flush with Bun.write() (was appendFileSync)
- Dialog auto-accept/dismiss with buffer + prompt text support
- File upload command (upload <sel> <file...>)
- Element state checks (is visible/hidden/enabled/disabled/checked/editable/focused)
- Annotated screenshots with ref labels overlaid (-a flag)
- Snapshot diffing against previous snapshot (-D flag)
- Cursor-interactive element scan for non-ARIA clickables (-C flag)
- Snapshot scoping depth limit (-d N flag)
- Health check with page.evaluate + 2s timeout
- Playwright error wrapping — actionable messages for AI agents
- Fix useragent — context recreation preserves cookies/storage/URLs
- wait --networkidle / --load / --domcontentloaded flags
- console --errors filter (error + warning only)
- cookie-import <json-file> with auto-fill domain from page URL
- 166 integration tests (was ~63)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Phase 2: Rewrite SKILL.md as QA playbook + command reference

Reorient SKILL.md files from raw command reference to QA-first playbook
with 10 workflow patterns (test user flows, verify deployments, dogfood
features, responsive layouts, file upload, forms, dialogs, compare pages).
Compact command reference tables at the bottom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Phase 3: /qa skill — systematic QA testing with health scores

New /qa skill for systematic web app QA testing. Three modes:
- full: 5-10 documented issues with screenshots and repro steps
- quick: 30-second smoke test with health score
- regression: compare against saved baseline

Includes issue taxonomy (7 categories, 4 severity levels), structured
report template, health score rubric (weighted across 7 categories),
framework detection guidance (Next.js, Rails, WordPress, SPA).

Also adds browse/bin/find-browse (DRY binary discovery using git
rev-parse), .gstack/ to .gitignore, and updated TODO roadmap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Bump to v0.3.0 — Phase 2 + Phase 3 changelog

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: cookie-import-browser — Chromium cookie decryption module + tests

Pure logic module for reading and decrypting cookies from macOS Chromium
browsers (Comet, Chrome, Arc, Brave, Edge). Supports v10 AES-128-CBC
encryption with macOS Keychain access, PBKDF2 key derivation, and
per-browser key caching. 18 unit tests with encrypted cookie fixtures.

* feat: cookie picker web UI + route handler

Two-panel dark-theme picker served from the browse server. Left panel
shows source browser domains with search and import buttons. Right panel
shows imported domains with trash buttons. No cookie values exposed.
6 API endpoints, importedDomains Set tracking, inline clearCookies.

* feat: wire cookie-import-browser into browse server

Add cookie-picker route dispatch (no auth, localhost-only), add
cookie-import-browser to WRITE_COMMANDS and CHAIN_WRITE, add serverPort
property to BrowserManager, add write command with two modes (picker UI
vs --domain direct import), update CLI help text.

* chore: /setup-browser-cookies skill + docs (Phase 3.5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.3.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: redact sensitive values from command output (PR #21)

type no longer echoes text (reports character count), cookie redacts
value with ****, header redacts Authorization/Cookie/X-API-Key/X-Auth-Token,
storage set drops value, forms redacts password fields. Prevents secrets
from persisting in LLM transcripts. 7 new tests.

Credit: fredluz (PR #21)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: path traversal prevention for screenshot/pdf/eval (PR #26)

Add validateOutputPath() for screenshot/pdf/responsive (restricts to
/tmp and cwd) and validateReadPath() for eval (blocks .. sequences and
absolute paths outside safe dirs). 7 new tests.

Credit: Jah-yee (PR #26)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: auto-install Playwright Chromium in setup (PR #22)

Setup now verifies Playwright can launch Chromium, and auto-installs
it via `bunx playwright install chromium` if missing. Exits non-zero
if build or Chromium launch fails.

Credit: AkbarDevop (PR #22)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: fix path validation bypass, CORS restriction, cookie-import path check

- startsWith('/tmp') matched '/tmpevil' — now requires trailing slash
- CORS Access-Control-Allow-Origin changed from * to http://127.0.0.1:<port>
- cookie-import now validates file paths (was missing validateReadPath)
- 3 new tests for prefix collision and cookie-import path traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review informational issues + add regression tests

- Add cookie-import to CHAIN_WRITE set for chain command routing
- Add path validation to snapshot -a -o output path
- Fix package.json version to match 0.3.1
- Use crypto.randomUUID() for temp DB paths (unpredictable filenames)
- Add regression tests for chain cookie-import and snapshot path validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add /qa, /setup-browser-cookies to README + update BROWSER.md

- Add /qa and /setup-browser-cookies to skills table, install/update/uninstall blurbs
- Add dedicated README sections for both new skills with usage examples
- Update demo workflow to show cookie import → QA → browse flow
- Update BROWSER.md: cookie import commands, new source files, test count (203)
- Update skill count from 6 to 8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: team-aware /retro v2.0 — per-person praise and growth opportunities

- Identify current user via git config, orient narrative as "you" vs teammates
- Add per-author metrics: commits, LOC, focus areas, commit type mix, sessions
- New "Your Week" section with personal deep-dive for whoever runs the command
- New "Team Breakdown" with per-person praise and growth opportunities
- Track AI-assisted commits via Co-Authored-By trailers
- Personal + team shipping streaks
- Tone: praise like a 1:1, growth like investment advice, never compare negatively

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add Conductor parallel sessions section to README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@stoll stoll merged commit e971973 into stoll:main Mar 13, 2026
stoll pushed a commit that referenced this pull request Mar 28, 2026
…ytan#384)

* feat: /cso v2 — infrastructure-first security audit

Rewrite /cso from code-centric OWASP scanning to infrastructure-first
attack surface analysis. 15 phases covering secrets archaeology, dependency
supply chain, CI/CD pipeline security, webhook verification, LLM/AI
security, skill supply chain scanning, plus OWASP Top 10, STRIDE, and
data classification.

Key design decisions from eng review + Codex adversarial review:
- Soft gate stack detection (prioritize, don't skip)
- Error on conflicting scope flags (never silently ignore)
- Permission gate before scanning ~/.claude/skills/
- Graceful degradation when audit tools aren't installed
- Finding fingerprints for cross-run trend tracking
- Variant analysis: one verified vuln triggers codebase-wide search
- Dual confidence modes: daily (8/10 gate) vs comprehensive (2/10)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: /cso v2 acknowledgements — 10 projects that informed the design

Credits: Sentry (confidence gating), Trail of Bits (mental model + variant
analysis), Shannon/Keygraph (active verification validation), afiqiqmal
(framework detection + LLM security), Snyk ToxicSkills (skill supply chain),
Miessler PAI (incident playbooks), McGo (report format), Claude Code
Security Pack (modular validation), Anthropic CCS (500+ zero-days), and
@gus_argon (v1 blind spot identification).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: /cso v2 E2E tests — full audit, diff mode, infra scope

Three E2E test cases with planted vulnerabilities:
- cso-full-audit: hardcoded API key + .env tracked by git
- cso-diff-mode: webhook without signature verification on feature branch
- cso-infra-scope: unpinned GitHub Action + Dockerfile without USER

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /cso E2E tests — correct logCost and recordE2E signatures

logCost requires (label, result), recordE2E requires (collector, name,
suite, result). Fixed all 3 test cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /cso infra E2E test — increase timeout to 360s

The infra scope test runs Agent sub-tasks for parallel finding
verification which can take longer than 240s. Increased maxTurns
from 25 to 60 and timeout from 240s to 360s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /cso infra E2E test — sharper prompt to prevent exploration waste

The agent was burning 30+ turns exploring a 3-file repo (18 Glob calls,
Explore subagent, 4 SKILL.md reads) before starting the audit. Two Agent
verification subagents then ate ~100s, causing the 240s timeout.

Fix: tell the agent the repo is tiny, list the exact files, skip the
preamble, remove Agent from allowed tools, reduce maxTurns 60→30.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.11.6.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Codex adversarial findings in /cso v2

Six fixes from Codex adversarial review:

1. Phase 2: Use `git log -G` (regex) instead of `-S` (literal) for
   patterns with alternation (ghp_|gho_|github_pat_, etc.)

2. Phase 12 exclusion garrytan#5: Add exception so CI/CD pipeline findings
   from Phase 4 are never auto-discarded when --infra is active

3. Phase 12 exclusion garrytan#6: Add exception that unpinned actions and
   missing CODEOWNERS are concrete risks, not "missing hardening"

4. Phase 12 exclusion garrytan#15: Add exception that SKILL.md files are
   executable prompt code, not documentation — Phase 8 findings
   in SKILL.md must not be excluded

5. Phase 12 exclusion #1: Add exception that LLM cost/spend
   amplification from Phase 7 is financial risk, not DoS

6. E2E tests: Add exitReason === 'success' assertion to all 3 tests;
   move finalizeEvalCollector to file-level afterAll

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants