
feat: token-based client anonymization layer (MRPC 1.6 / SBM AI compliance)#475

Draft
Copilot wants to merge 4 commits into main from copilot/implement-client-anonymization-strategy

Conversation


Copilot AI commented Mar 8, 2026

Client PII, PHI, and legal strategy currently reach cloud AI providers in the clear. This adds a local intercept layer that tokenizes sensitive entities before transmission and reverses the mapping after the response—so providers never see underlying identities.

ClientAnonymizationService (Cyrano/src/services/client-anonymization.ts)

  • anonymize(text, sessionId?) — extracts entities via regex (person, org, location, date, money, email, phone, SSN, account numbers, statutes, case citations) and replaces them with deterministic session-scoped tokens (PERSON_1, COMPANY_2, DATE_3, etc.)
  • deanonymize(text, sessionId) — reverses substitution locally using the in-memory token map; the mapping never leaves the process
  • assessRiskCategory(text) — classifies content per MRPC 1.6 / SBM AI guidance:
    • Cat 1 – generic/safe
    • Cat 2 – strategy-rich (settlement, litigation, privilege, negotiation keywords)
    • Cat 3 – hard PII present (SSN, account number, DOB phrases)
  • Session management — in-memory, 4-hour TTL with automatic eviction; destroySession() for explicit cleanup; multi-turn support via shared session ID
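The token-map mechanics described above can be sketched as follows — an illustrative TypeScript reduction, not the actual Cyrano implementation. Only a single entity pattern (email) is shown for brevity, and all names (`TokenMap`, `sessions`) are assumptions:

```typescript
// Session-scoped token map: original value -> deterministic token.
// The mapping lives only in process memory and never leaves it.
type TokenMap = Map<string, string>;

const sessions = new Map<string, TokenMap>();

function anonymize(text: string, sessionId: string): string {
  const map = sessions.get(sessionId) ?? new Map<string, string>();
  sessions.set(sessionId, map);
  let counter = map.size;
  // One pattern shown for brevity; the real service layers patterns for
  // person, org, location, date, money, phone, SSN, statutes, etc.
  const emailRe = /[\w.+-]+@[\w-]+\.[\w.]+/g;
  return text.replace(emailRe, (match) => {
    // Deterministic within a session: the same entity always maps
    // to the same token, enabling coherent multi-turn conversations.
    if (!map.has(match)) map.set(match, `EMAIL_${++counter}`);
    return map.get(match)!;
  });
}

function deanonymize(text: string, sessionId: string): string {
  const map = sessions.get(sessionId);
  if (!map) return text;
  let out = text;
  // Reverse every substitution locally; no network involved.
  for (const [original, token] of map) {
    out = out.split(token).join(original);
  }
  return out;
}
```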

AIService.call() integration (Cyrano/src/services/ai-service.ts)

New fields on AICallOptions:

// opt-in per call
await aiService.call('anthropic', prompt, {
  anonymize: true,
  anonymizationSessionId: existingSessionId, // optional, for multi-turn
});
  • Category 3 prompts throw before any network I/O — identifiable PII is hard-blocked from all cloud providers
  • Anonymization/de-anonymization is transparent to callers; the returned string always contains the original values
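A minimal standalone sketch of the pre-network hard block. The `mockCall` helper and its error message are assumptions; the PR only specifies that Category 3 prompts throw before any network I/O:

```typescript
// Stand-in for AIService.call(): an SSN pattern triggers the Category 3
// hard block before any provider request would be made. The real checks
// are broader (account numbers, DOB phrases, etc.).
const SSN_RE = /\b\d{3}-\d{2}-\d{4}\b/;

async function mockCall(
  prompt: string,
  opts: { anonymize?: boolean },
): Promise<string> {
  if (opts.anonymize && SSN_RE.test(prompt)) {
    // Thrown before any network I/O: no bytes leave the process.
    throw new Error('Category 3 content blocked from cloud providers');
  }
  return `response to: ${prompt}`; // stand-in for the provider call
}
```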

Tests (Cyrano/tests/services/client-anonymization.test.ts)

26 unit tests covering entity replacement, within-session determinism, round-trip fidelity, risk category classification, session lifecycle, and TTL eviction.

Original prompt

This section details the original issue you should resolve

<issue_title>Client Confidentiality, Strategy, and Anonymization</issue_title>
<issue_description>Are we taking sufficient steps to safeguard client information, PII, PHI, and legal strategy? Consider the following questions and recommendations:

What Counts as Defensible Anonymization for the State Bar of Michigan

From a cautious, SBM‑compliant angle, you want something closer to HIPAA‑style de‑identification than informal anonymization:

HIPAA’s expert‑determination standard requires that an expert conclude there is a “very small risk” that the information could be used, alone or with reasonably available data, to identify an individual.

Applied to legal facts, that means systematically stripping or tokenizing: names, identifiers, unique dates, specific dollar figures, highly specific locations, rare job titles, and any combination that makes the fact pattern obviously about a particular client/matter.

Best practice that’s emerging: a technical anonymization layer that intercepts text before it hits the AI, replaces all sensitive entities with deterministic tokens (e.g., CLIENT_A, COMPANY_X, DATE_1), and then reverses the mapping locally after the AI responds. In that workflow, the provider never sees the underlying identities at all.
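The three-step intercept workflow described above might look like this end to end, with a stubbed provider. Token names mirror the examples in the text (COMPANY_X style); the helper name and the single company pattern are illustrative assumptions:

```typescript
// 1. Intercept and tokenize, 2. send only tokens, 3. reverse locally.
async function askAnonymized(
  text: string,
  provider: (prompt: string) => Promise<string>,
): Promise<string> {
  const map = new Map<string, string>(); // token -> original, kept local
  let i = 0;
  // Step 1: replace sensitive entities with deterministic tokens
  // (one company-name pattern shown for brevity).
  const tokenized = text.replace(/\b[A-Z][a-z]+ (?:Inc|LLC|Corp)\b/g, (m) => {
    const token = `COMPANY_${String.fromCharCode(65 + i++)}`; // COMPANY_A, ...
    map.set(token, m);
    return token;
  });
  // Step 2: only the tokenized text ever reaches the provider.
  const reply = await provider(tokenized);
  // Step 3: reverse the mapping locally after the response.
  let restored = reply;
  for (const [token, original] of map) {
    restored = restored.split(token).join(original);
  }
  return restored;
}
```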

Even with strong anonymization, what you still must check:

Even if you’re comfortable that the text is non‑identifiable:

Are you still exposing confidential legal strategy or work product?

Privilege and MRPC 1.6 cover more than just “who the client is.” Sharing unique litigation strategy, settlement posture, or internal risk assessments with a third‑party AI can still be a confidentiality issue, even if the person is anonymized.

What do the provider’s terms say?

Michigan’s AI FAQ stresses that many AI tools “utilize the information entered to learn,” so inputs may be stored and regurgitated. If the provider keeps and reuses your “anonymized” data, that can still prejudice clients or expose work product, even if they can’t easily attach a name.

Could a subpoena or breach hurt your client anyway?

Ethics and practice‑management guidance points out that if opposing counsel could subpoena the AI provider for “all prompts related to industry X mergers in 2025,” you might still be uncomfortable, even without explicit names, because your strategies or deal structures could be reconstructed.

So anonymization is necessary but not sufficient; you still need a risk assessment of what you’re actually revealing and under what contractual/security regime.

Practical, conservative rule set

If you want a clean, defensible line for your own practice:

Category 1 – Truly anonymized & generic:

• Matter is converted via a robust token‑based system or equivalent, operated locally.
• No combination of facts would reasonably allow identification of the client or matter by an outsider.
• You are not transmitting unique strategy or internal mental impressions, just generic drafting/structuring questions.
→ Using mainstream AI tools here is relatively low risk, though you should still prefer providers with no‑training and strong security terms.

Category 2 – Anonymized but fact‑specific or strategy‑rich:

• You’ve stripped names, but the fact pattern is unusual, high‑profile, or includes distinctive dollar amounts/timing; or you’re discussing concrete litigation/negotiation strategy.
→ Treat as still confidential; only use if (a) provider is on an enterprise, no‑training, contractually locked‑down tier, and (b) you’d be comfortable explaining the use to the client and to a judge.

Category 3 – Identifiable or sensitive by context:

• The story is obviously about a particular person or organization (local public figure, single major employer in a small town, etc.), or involves particularly sensitive categories (health, crime, immigration, harassment with rare facts).
→ Do not send to general‑purpose cloud AI at all; use local/self‑hosted models or specialized legal AI tools with SOC2‑grade assurances and explicit no‑retention/no‑training language.
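The three-category triage above could be approximated in code like this. The keyword lists and patterns are illustrative placeholders, not the service's actual rules:

```typescript
type RiskCategory = 1 | 2 | 3;

// Conservative ordering: check the most restrictive category first.
function assessRiskCategory(text: string): RiskCategory {
  // Category 3: hard identifiers present (SSN shown; real checks
  // would also cover account numbers, DOB phrases, and context clues).
  if (/\b\d{3}-\d{2}-\d{4}\b/.test(text)) return 3;
  // Category 2: strategy-rich language, confidential even with names stripped.
  if (/\b(settlement|litigation|privilege|negotiation)\b/i.test(text)) return 2;
  // Category 1: generic drafting/structuring questions.
  return 1;
}
```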

How are we currently “anonymizing” and “de-anonymizing” client information for legal research and writing, client communications, etc. (manual edits, search/replace, a script, or something like a proxy layer)? Whatever it is, it should be able to pass a HIPAA‑style “very small risk” test and align with the recent privilege cases, even though users are unlikely to be covered entities under HIPAA.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits March 8, 2026 23:03
Co-authored-by: MightyPrytanis <219587333+MightyPrytanis@users.noreply.github.com>
Co-authored-by: MightyPrytanis <219587333+MightyPrytanis@users.noreply.github.com>
Copilot AI changed the title [WIP] Update client information anonymization procedures feat: token-based client anonymization layer (MRPC 1.6 / SBM AI compliance) Mar 8, 2026


Development

Successfully merging this pull request may close these issues.

Client Confidentiality, Strategy, and Anonymization
