
Refactor diarization configuration and unify transcription provider interfaces with networking fixes#89

Merged
AnkushMalaker merged 2 commits into main from
speaker-service-networking-fix
Sep 11, 2025

Conversation


@AnkushMalaker AnkushMalaker commented Sep 9, 2025

Summary by CodeRabbit

  • New Features

    • Centralized, persistently stored diarization settings with validation and environment-aware loading/saving.
    • Optional diarization support threaded through transcription and streaming; enabled for Deepgram where available.
    • Unified speaker recognition flow to a single diarization/identification endpoint for consistent behavior.
    • Added WebSocket /v1/ proxy for speaker service, enabling Deepgram-compatible connections.
  • Refactor

    • Consolidated diarization configuration management into a dedicated module for consistency and reuse.
  • Chores

    • Improved service startup behavior based on environment (CPU/GPU, HTTP/HTTPS).
    • Added dotenv dependency and network alias for speaker service.


coderabbitai bot commented Sep 9, 2025

Walkthrough

Centralizes diarization settings into a new config module, refactors system controller to use it, updates transcription providers to support a diarize flag, adjusts transcription and speaker recognition flows to use configuration, adds WebSocket proxying and network aliasing for speaker service, introduces specialized service startup logic, and adds dotenv dependency.

Changes

  • Config module (new): backends/advanced/src/advanced_omi_backend/config.py
    New centralized diarization settings: defaults, path resolution, load/save JSON with template fallback, module-level cache, logging.
  • System controller refactor: backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
    Removes inline config; imports loader/saver from config.py; get/save endpoints are now file-backed with stricter validation and updated response shapes.
  • Transcription providers interface: backends/advanced/src/advanced_omi_backend/transcription_providers.py
    Adds diarize support across providers: new/updated method signatures (batch/streaming), kwargs channel; Deepgram wires diarize, Parakeet warns/ignores.
  • Transcription integration: backends/advanced/src/advanced_omi_backend/transcription.py
    Reads diarization_source from config; conditionally sets the diarize flag for Deepgram providers; passes diarize to provider transcribe.
  • Speaker recognition client: backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py
    Switches to the config import; unifies the endpoint to /v1/diarize-identify-match; sends transcript_data with parameters; removes provider-specific config blocks; updates logs.
  • Speaker-recognition deployment: extras/speaker-recognition/nginx.conf.template, extras/speaker-recognition/docker-compose.yml
    Nginx adds a /v1/ WebSocket proxy and extra headers; docker-compose adds the default network alias speaker-service for the GPU service.
  • Orchestration/CLI: services.py, setup-requirements.txt
    services.py gains specialized startup for speaker-recognition based on .env (COMPUTE_MODE, REACT_UI_HTTPS), preserves other service behavior, and keeps --build handling; setup-requirements adds the dotenv dependency.
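For orientation, the load path described for the new config module can be sketched as follows. The function name matches the summary above; the default paths, template location, and fallback defaults here are assumptions for illustration only:

```python
import json
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)

# Module-level cache, as the summary describes; exact names are assumptions.
_diarization_settings = None

DEFAULT_SETTINGS = {"diarization_source": "pyannote", "similarity_threshold": 0.15}


def load_diarization_settings_from_file(
    config_path: Path = Path("data/diarization_settings.json"),
    template_path: Path = Path("templates/diarization_settings.json"),
) -> dict:
    """Load settings from JSON; fall back to the template, then to defaults."""
    global _diarization_settings
    if _diarization_settings is not None:
        return _diarization_settings

    # If no config exists yet, seed it from the template when available.
    if not config_path.exists() and template_path.exists():
        try:
            config_path.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(template_path, config_path)
            logger.info("Created diarization config from template at %s", config_path)
        except OSError:
            logger.exception("Could not copy template to %s", config_path)

    if config_path.exists():
        try:
            with open(config_path) as f:
                _diarization_settings = json.load(f)
                return _diarization_settings
        except (OSError, json.JSONDecodeError):
            logger.exception("Error loading diarization settings from %s", config_path)

    # Last resort: in-memory defaults.
    _diarization_settings = dict(DEFAULT_SETTINGS)
    return _diarization_settings
```

The module-level cache means repeat callers avoid file I/O; a process restart (or an explicit save) refreshes it.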

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client as API Caller
  participant System as SystemController
  participant Config as config.py
  participant FS as Filesystem

  rect rgba(230,240,255,0.5)
  Client->>System: GET /diarization-settings
  System->>Config: load_diarization_settings_from_file()
  Config->>FS: read JSON or copy from template
  FS-->>Config: settings or fallback
  Config-->>System: settings
  System-->>Client: {status: success, settings}
  end

  rect rgba(240,230,255,0.5)
  Client->>System: POST /diarization-settings (partial updates)
  System->>System: validate & merge
  System->>Config: save_diarization_settings_to_file(merged)
  Config->>FS: write JSON
  FS-->>Config: ok/fail
  Config-->>System: True/False
  System-->>Client: {status: success/partial, settings}
  end
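The POST flow in the diagram (validate, merge, save) boils down to a merge-and-validate step. A hedged sketch as a plain helper, leaving out the HTTP layer (the real controller returns JSONResponse objects and persists via the config module):

```python
def merge_and_validate(current: dict, updates: dict):
    """Merge partial updates into current settings and cross-validate.

    Returns (merged, None) on success or (None, error_message) on failure.
    Field names follow the PR; the validation set here is illustrative.
    """
    merged = {**current, **updates}

    # Normalize and validate diarization_source.
    src = str(merged.get("diarization_source", "pyannote")).lower()
    if src not in {"pyannote", "deepgram"}:
        return None, f"Invalid diarization_source: {src!r}"
    merged["diarization_source"] = src

    # Cross-field check: min_speakers must not exceed max_speakers.
    mn, mx = merged.get("min_speakers"), merged.get("max_speakers")
    if isinstance(mn, int) and isinstance(mx, int) and mn > mx:
        return None, "min_speakers cannot exceed max_speakers"

    return merged, None
```

Only after this helper succeeds would the controller hand the merged dict to the file-backed saver.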
sequenceDiagram
  autonumber
  participant App as Transcription (caller)
  participant Cfg as config.py
  participant Prov as TranscriptionProvider
  note over App,Cfg: Determine diarize flag
  App->>Cfg: load_diarization_settings_from_file()
  Cfg-->>App: {diarization_source}
  alt provider=Deepgram & source=deepgram
    App->>Prov: transcribe(audio, sr, diarize=true)
    Prov-->>App: transcript (with diarization if supported)
  else other cases
    App->>Prov: transcribe(audio, sr, diarize=false)
    Prov-->>App: transcript
  end
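The alt branching above reduces to a small predicate; a sketch (the provider naming convention is an assumption):

```python
def should_diarize(provider_name: str, diarization_source: str) -> bool:
    """Enable diarization only when the active provider is Deepgram
    and the configured diarization_source also selects Deepgram."""
    return provider_name.lower() == "deepgram" and diarization_source == "deepgram"
```

Every other combination falls through to `diarize=False`, matching the else branch in the diagram.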
sequenceDiagram
  autonumber
  participant SRC as SpeakerRecognitionClient
  participant Cfg as config.py
  participant Svc as Speaker Service (/v1/diarize-identify-match)

  SRC->>Cfg: load_diarization_settings_from_file()
  Cfg-->>SRC: settings
  alt diarization_source = deepgram
    note right of SRC: Fallback identification over transcript segments
    SRC->>Svc: POST transcript_data, user_id, similarity_threshold, min_duration
  else diarization_source = pyannote
    SRC->>Svc: POST transcript_data, similarity_threshold, min_duration, collar, min_duration_off, [min/max_speakers]
  end
  Svc-->>SRC: diarize-identify-match result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Better init system #74 — Also modifies diarization config provisioning (templates/init), overlapping with this PR’s centralized config handling.
  • Openmemory mcp #69 — Touches advanced backend configuration pathways while adding memory subsystem features; potential intersections with service/env orchestration.

Poem

In cables and configs I happily hop,
A whiskered dev with a carrot-top.
Diarize flags thread through the night,
Providers listen—left and right.
WebSockets hum, NGINX sings,
New paths align on gentle springs.
Ship it—thump-thump—my lucky rings! 🥕🐇

Pre-merge checks (1 warning, 2 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 76.92%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Description Check (❓ Inconclusive): The description only mentions networking fixes for the speaker service and omits the major refactoring around configuration management, controller logic, transcription modules, and service startup, so it is too generic to convey the PR's full context. Expand it to summarize all significant changes (the new diarization config module, controller and client updates, transcription provider interface enhancements, and networking adjustments) so reviewers get a complete overview.
  • Title Check (❓ Inconclusive): The title packs multiple substantial changes into one long multi-topic sentence, diluting focus and not prioritizing a single primary change. Consider shortening it to the main intent, e.g. "Centralize diarization configuration and unify transcription interfaces", and handling networking adjustments separately.

@AnkushMalaker (Collaborator, Author) commented:

@coderabbitai review


coderabbitai bot commented Sep 9, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
extras/speaker-recognition/docker-compose.yml (2)

3-31: Prevent unintended startup of the base service when using profiles.

Service speaker-service (the anchor) has no profile; docker compose --profile {cpu|gpu} up -d will also start this base service, causing port conflicts and wrong routing.

Low-impact fix without restructuring to x- anchors:

 services:
   # Base Speaker Recognition Service Configuration
   speaker-service:
     &base-speaker-service
+    profiles: ["base"]  # not selected by cpu/gpu flows

Alternatively convert the base block to an x-speaker-service extension and remove the active speaker-service service.


27-31: Healthcheck should target localhost, not a network alias.

Inside the container, curl against speaker-service may resolve to another container (or fail if alias missing). Use loopback.

-      test: ["CMD", "curl", "-f", "http://speaker-service:8085/health"]
+      test: ["CMD", "curl", "-f", "http://localhost:8085/health"]
services.py (1)

163-170: Don't time out long-running compose operations.

up -d without --build can still build/pull and exceed 120s. Remove timeout for up, keep it for lightweight commands.

-            result = subprocess.run(
+            # No timeout for 'up' to allow pulls/builds; keep timeout for others
+            timeout_val = None if 'up' in cmd else 120
+            result = subprocess.run(
                 cmd,
                 cwd=service_path,
                 capture_output=True,
                 text=True,
                 check=False,
-                timeout=120  # 2 minute timeout for service status checks
+                timeout=timeout_val
             )
backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (2)

175-186: Remove dependency on controllers.system_controller; use the new config module.

Refactor leftovers will break if _diarization_settings moved.

-                    from advanced_omi_backend.controllers.system_controller import _diarization_settings
+                    from advanced_omi_backend.config import load_diarization_settings_from_file as _load_cfg
+                    _diarization_settings = _load_cfg()

249-260: Same refactor for identify_speakers().

-                    from advanced_omi_backend.controllers.system_controller import _diarization_settings
+                    from advanced_omi_backend.config import load_diarization_settings_from_file as _load_cfg
+                    _diarization_settings = _load_cfg()
backends/advanced/src/advanced_omi_backend/transcription.py (1)

284-286: Double-trigger risk: memory processing is queued here and also on conversation close.

Top-of-file comment says memory is handled “exclusively by conversation closure,” yet this call re-introduces it, likely causing duplicate processing.

Remove the call:

-            # Queue memory processing now that transcription is complete
-            await self._queue_memory_processing()
+            # Memory processing is triggered by conversation closure to avoid duplicates
🧹 Nitpick comments (17)
services.py (3)

81-84: Accept common truthy values for REACT_UI_HTTPS.

Handle "1/yes/true" robustly.

             https_enabled = env_values.get('REACT_UI_HTTPS', 'false')
+            https_enabled = str(https_enabled).strip().lower() in ('1', 'true', 'yes', 'on')

And branch on if https_enabled:.
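The suggested coercion generalizes into a tiny helper (a sketch; the actual services.py may simply inline the expression):

```python
def env_flag(value, default=False):
    """Interpret common truthy strings from .env values as booleans."""
    if value is None:
        return default
    return str(value).strip().lower() in ("1", "true", "yes", "on")
```

Usage would then be `if env_flag(env_values.get('REACT_UI_HTTPS')):`, which handles "1", "yes", "TRUE", and trailing whitespace uniformly.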


111-112: Place --build before services to avoid CLI parsing edge cases.

Safer ordering with compose.

-    if command == 'up' and build:
-        cmd.append('--build')
+    if command == 'up' and build:
+        # insert right after the 'up' token so '--build' precedes any service names
+        cmd.insert(cmd.index('up') + 1, '--build')

11-11: Remove unused import.

os isn’t used.

-import os
extras/speaker-recognition/nginx.conf.template (1)

97-113: Good WebSocket proxy for /v1; consider passing subprotocol when required.

If clients use Sec-WebSocket-Protocol (e.g., provider-specific subprotocols), forward it.

         location /v1/ {
             proxy_pass http://speaker_service/v1/;
             proxy_http_version 1.1;
             proxy_set_header Upgrade $http_upgrade;
             proxy_set_header Connection $connection_upgrade;
+            proxy_set_header Sec-WebSocket-Protocol $http_sec_websocket_protocol;
backends/advanced/src/advanced_omi_backend/config.py (2)

60-79: Narrow exceptions and log stack traces.

Use specific exceptions where possible and logger.exception for diagnostics.

-        if template_path.exists():
-            try:
+        if template_path.exists():
+            try:
                 # Ensure parent directory exists
                 config_path.parent.mkdir(parents=True, exist_ok=True)
                 shutil.copy(template_path, config_path)
                 logger.info(f"Created diarization config from template at {config_path}")
-            except Exception as e:
-                logger.warning(f"Could not copy template to {config_path}: {e}")
+            except (OSError, IOError):
+                logger.exception(f"Could not copy template to {config_path}")
-    if config_path.exists():
-        try:
+    if config_path.exists():
+        try:
             with open(config_path, 'r') as f:
                 _diarization_settings = json.load(f)
                 logger.info(f"Loaded diarization settings from {config_path}")
                 return _diarization_settings
-        except Exception as e:
-            logger.error(f"Error loading diarization settings from {config_path}: {e}")
+        except (OSError, IOError, json.JSONDecodeError):
+            logger.exception(f"Error loading diarization settings from {config_path}")

92-107: Same for save path; bubble precise errors.

Also keep stack traces.

-    try:
+    try:
         # Ensure parent directory exists
         config_path.parent.mkdir(parents=True, exist_ok=True)
         
         # Write settings to file
         with open(config_path, 'w') as f:
             json.dump(settings, f, indent=2)
         
         # Update cache
         _diarization_settings = settings
         
         logger.info(f"Saved diarization settings to {config_path}")
         return True
-    except Exception as e:
-        logger.error(f"Error saving diarization settings to {config_path}: {e}")
+    except (OSError, IOError, TypeError, ValueError):
+        logger.exception(f"Error saving diarization settings to {config_path}")
         return False
backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (1)

136-141: Use logger.exception for full stack traces.

Improves debuggability.

-        except Exception as e:
-            logger.error(f"Error during diarize-identify-match: {e}")
+        except Exception:
+            logger.exception("Error during diarize-identify-match")
             return {}
backends/advanced/src/advanced_omi_backend/transcription.py (2)

16-16: Avoid per-call config file reads; cache diarization source (hot path).

load_diarization_settings_from_file() is invoked on every transcription, adding file I/O to a hot path. Cache with a short TTL or inject via constructor.

Apply this inline change to use a cached helper:

-            config = load_diarization_settings_from_file()
-            diarization_source = config.get("diarization_source", "pyannote")
+            diarization_source = self._get_diarization_source()

Add this helper to the class (outside the shown range):

import time  # needed at module scope for the TTL check

def _get_diarization_source(self) -> str:
    try:
        ttl_s = 5.0
        now = time.time()
        if not hasattr(self, "_diar_src_cache") or (now - getattr(self, "_diar_src_cache_ts", 0)) > ttl_s:
            cfg = load_diarization_settings_from_file()
            self._diar_src_cache = cfg.get("diarization_source", "pyannote")
            self._diar_src_cache_ts = now
        return self._diar_src_cache
    except Exception:
        return "pyannote"

121-133: Diarization toggle: add explicit “disabled” log for mismatches; guard by capability.

When config requests Deepgram diarization but the active provider isn’t Deepgram, log why diarization is off to aid ops/debugging. Optionally move capability knowledge to providers.

Minimal change:

             if should_diarize:
                 logger.info(f"Requesting diarization from {self.provider.name} (diarization_source=deepgram)")
+            else:
+                logger.debug(
+                    "Diarization not requested (source=%s, provider=%s)",
+                    diarization_source, self.provider.name
+                )
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (2)

792-796: Normalize diarization_source to lowercase; accept canonical values only.

Prevents surprising 400s for "Deepgram"/"PYANNOTE".

-            elif key == "diarization_source":
-                if not isinstance(value, str) or value not in ["pyannote", "deepgram"]:
+            elif key == "diarization_source":
+                if not isinstance(value, str) or value.lower() not in {"pyannote", "deepgram"}:
                     return JSONResponse(
                         status_code=400, content={"error": f"Invalid value for {key}: must be 'pyannote' or 'deepgram'"}
                     )
+                # Normalize
+                settings[key] = value.lower()

808-814: Persist atomically; adjust success payload shape stays the same.

If save uses plain write, a crash can corrupt the file. Prefer atomic write (temp file + replace). If the helper already does this, ignore.

No code change requested here, just a heads-up.
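For reference, an atomic JSON write along these lines might look like the following (a sketch, not the repository's helper):

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then atomically replace.

    os.replace is atomic on POSIX (and on Windows for same-volume paths),
    so concurrent readers never observe a half-written file.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on any failure
        raise
```

Creating the temp file in the target directory (not /tmp) matters: `os.replace` is only atomic within a single filesystem.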

backends/advanced/src/advanced_omi_backend/transcription_providers.py (6)

48-56: Base API extension is fine; avoid signature drift across hierarchies.

You now have Base.transcribe(**kwargs) and Batch.transcribe(diarize: bool). Prefer one canonical signature to avoid type checker confusion and future drift (see lines 127–136 diff below).


127-136: Unify abstract method signature with Base to prevent divergence.

Match Base’s **kwargs to keep subclass contracts consistent.

-    @abc.abstractmethod
-    async def transcribe(self, audio_data: bytes, sample_rate: int, diarize: bool = False) -> dict:
-        """Transcribe audio data.
-        
-        Args:
-            audio_data: Raw audio bytes
-            sample_rate: Audio sample rate
-            diarize: Whether to enable speaker diarization (provider-dependent)
-        """
-        pass
+    @abc.abstractmethod
+    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+        """Transcribe audio data (provider-specific kwargs like diarize may be supported)."""
+        pass

312-322: Streaming: consider WebSocket keepalive and frame size tuning.

To improve resilience on long sessions/big interim payloads:

  • Set ping_interval to keep the connection alive.
  • Increase max_size if large messages are expected.

Example:

-            websocket = await websockets.connect(
-                ws_url,
-                extra_headers={"Authorization": f"Token {self.api_key}"}
-            )
+            websocket = await websockets.connect(
+                ws_url,
+                extra_headers={"Authorization": f"Token {self.api_key}"},
+                ping_interval=20,
+                max_size=4 * 1024 * 1024,
+            )

Also applies to: 329-329


471-474: Silence ARG002 (unused args) in streaming transcribe stub.

Rename to underscores.

-    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+    async def transcribe(self, _audio_data: bytes, _sample_rate: int, **_kwargs) -> dict:

501-501: Silence ARG002 for kwargs in Parakeet batch transcribe.

-    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+    async def transcribe(self, audio_data: bytes, sample_rate: int, **_kwargs) -> dict:

748-752: Silence ARG002 (unused args) in Parakeet streaming transcribe stub.

-    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+    async def transcribe(self, _audio_data: bytes, _sample_rate: int, **_kwargs) -> dict:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c79db94 and eb05aeb.

📒 Files selected for processing (9)
  • backends/advanced/src/advanced_omi_backend/config.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (3 hunks)
  • backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/transcription.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/transcription_providers.py (9 hunks)
  • extras/speaker-recognition/docker-compose.yml (1 hunks)
  • extras/speaker-recognition/nginx.conf.template (1 hunks)
  • services.py (2 hunks)
  • setup-requirements.txt (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backends/advanced/src/advanced_omi_backend/config.py

67-67: Do not catch blind exception: Exception

(BLE001)


77-77: Do not catch blind exception: Exception

(BLE001)


78-78: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


104-104: Consider moving this statement to an else block

(TRY300)


105-105: Do not catch blind exception: Exception

(BLE001)


106-106: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/transcription_providers.py

471-471: Unused method argument: audio_data

(ARG002)


471-471: Unused method argument: sample_rate

(ARG002)


471-471: Unused method argument: kwargs

(ARG002)


501-501: Unused method argument: kwargs

(ARG002)


748-748: Unused method argument: audio_data

(ARG002)


748-748: Unused method argument: sample_rate

(ARG002)


748-748: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: speaker-recognition-tests
  • GitHub Check: integration-tests
🔇 Additional comments (8)
extras/speaker-recognition/docker-compose.yml (2)

101-102: Verify the Nginx config path: docker-compose mounts nginx.conf, but repo ships nginx.conf.template.

This likely 404s at container start unless another process materializes nginx.conf.

If the file isn’t generated, either rename the file or mount the template path:

-      - ./nginx.conf:/etc/nginx/nginx.conf:ro
+      - ./nginx.conf.template:/etc/nginx/nginx.conf:ro

87-91: Web UI healthcheck may be wrong when HTTPS is disabled.

It always probes https://localhost:${REACT_UI_PORT}. If REACT_UI_HTTPS=false, this will fail.

Consider switching protocol based on REACT_UI_HTTPS or probing the internal port the app actually listens on.

extras/speaker-recognition/nginx.conf.template (1)

48-50: Verify Web UI upstream port.

Upstream is pinned to 5175. Ensure the web-ui actually listens on that port for both HTTP/HTTPS modes; otherwise requests will 502.

If the port is driven by REACT_UI_PORT, consider templating the nginx config at container start (envsubst) to keep them in sync.

backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (1)

66-69: Good: centralized config import avoids controller coupling.

This change reduces circular-import risk.

backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (1)

22-22: Config module import change looks good.

backends/advanced/src/advanced_omi_backend/transcription_providers.py (3)

150-157: Deepgram diarization wiring LGTM.

The diarize flag is correctly threaded into query params and remains optional. Timeout scaling by duration is a nice touch.

Also applies to: 164-164
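For context, threading the flag into the query string might look roughly like this; `diarize` and `smart_format` are documented Deepgram /v1/listen parameters, while the helper itself is illustrative, not the provider's actual code:

```python
from urllib.parse import urlencode


def build_deepgram_params(model: str = "nova-2", diarize: bool = False, **extra) -> str:
    """Build a Deepgram /v1/listen query string; diarize stays optional."""
    params = {"model": model, "smart_format": "true"}
    if diarize:
        params["diarize"] = "true"  # only sent when explicitly requested
    params.update(extra)
    return urlencode(params)
```

Keeping `diarize` absent (rather than `diarize=false`) when not requested avoids sending a no-op parameter.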


572-582: Clear warning on unsupported diarize is good UX.

No changes requested.


94-101: No start_stream calls found—verify external callers pass diarize
No in-repo call sites reference start_stream; ensure any consumers (e.g. orchestration or service layers) include the new diarize argument when invoking this method.

Comment on lines +803 to 806
# Get current settings and merge with new values
current_settings = load_diarization_settings_from_file()
current_settings.update(settings)


🛠️ Refactor suggestion

Validate min_speakers ≤ max_speakers after merge.

Prevents impossible configs reaching runtime.

         current_settings = load_diarization_settings_from_file()
         current_settings.update(settings)
+        # Cross-field validation
+        min_s = current_settings.get("min_speakers")
+        max_s = current_settings.get("max_speakers")
+        if isinstance(min_s, int) and isinstance(max_s, int) and min_s > max_s:
+            return JSONResponse(
+                status_code=400,
+                content={"error": "min_speakers cannot exceed max_speakers"}
+            )
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
around lines 803 to 806, after merging current_settings.update(settings)
validate that the resulting min_speakers and max_speakers are consistent: coerce
or parse them to integers if necessary, check that both keys exist (or use
defaults) and that min_speakers <= max_speakers; if the check fails,
raise/return a clear client error (e.g., HTTP 400 / HTTPException) with a
message indicating the constraint violation and do not proceed to save or apply
the settings. Ensure validation runs immediately after the merge and before any
persistence or runtime usage.

Comment on lines 817 to 823
 # Even if file save fails, we've updated the in-memory settings
 logger.warning("Settings updated in memory but file save failed")
 return {
     "message": "Settings updated (file save failed)",
-    "settings": _diarization_settings,
+    "settings": current_settings,
     "status": "partial"
 }

🛠️ Refactor suggestion

“Partial” branch message is misleading; no in-memory state exists anymore.

Since settings are file-backed (no process-global cache), claiming “updated in memory” is inaccurate. Return an error status.

-            return {
-                "message": "Settings updated (file save failed)",
-                "settings": current_settings,
-                "status": "partial"
-            }
+            return {
+                "message": "Failed to persist diarization settings to file",
+                "settings": current_settings,  # Echo attempted values for UX
+                "status": "error"
+            }
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
around lines 817 to 823, the response incorrectly claims settings were "updated
in memory" despite there being no process-global cache; change the returned
message and status to indicate a failure instead of a partial success, e.g. log
the file save error, return an error HTTP status payload (message like "Settings
update failed (file save failed)" and status "error"), and avoid claiming or
returning an in-memory settings object — instead return the previous persisted
settings or omit "settings" from the error response as appropriate.

Comment on lines +89 to 96
 form_data.add_field("transcript_data", json.dumps(transcript_data))
 form_data.add_field("user_id", "1")  # TODO: Implement proper user mapping
-form_data.add_field("speaker_confidence_threshold", str(config.get("similarity_threshold", 0.15)))
+form_data.add_field("similarity_threshold", str(config.get("similarity_threshold", 0.15)))
 form_data.add_field("min_duration", str(config.get("min_duration", 0.5)))

-# Use /v1/listen endpoint
-endpoint = "/v1/listen"
+# Use /v1/diarize-identify-match endpoint as fallback
+endpoint = "/v1/diarize-identify-match"


⚠️ Potential issue

Honor the user_id parameter; avoid hard-coding "1".

Currently ignores the method argument.

-                        form_data.add_field("transcript_data", json.dumps(transcript_data))
-                        form_data.add_field("user_id", "1")  # TODO: Implement proper user mapping
+                        form_data.add_field("transcript_data", json.dumps(transcript_data))
+                        form_data.add_field("user_id", str(user_id) if user_id is not None else "1")  # TODO: map real users
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py
around lines 89 to 96, the code hard-codes form_data.add_field("user_id", "1")
and ignores the function/method user_id argument; replace the hard-coded value
with the passed-in user_id (coerce to string), fallback to a sensible default
only if the argument is None/empty, and remove the TODO comment; ensure any
necessary validation (int or non-empty) is performed before adding the field so
the endpoint receives the correct user identity.
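The fallback described in the prompt can be factored into a small helper. This is a sketch only: resolve_user_id is a hypothetical name, and "non-empty after string coercion" is one reasonable reading of the validation the prompt asks for.

```python
def resolve_user_id(user_id) -> str:
    """Coerce the caller-supplied user_id to a non-empty string,
    falling back to "1" only when nothing usable was passed."""
    if user_id is None:
        return "1"
    text = str(user_id).strip()
    return text if text else "1"

# The form field then carries the caller's identity instead of a constant:
# form_data.add_field("user_id", resolve_user_id(user_id))
```

With this helper both branches (Deepgram and pyannote) share the same fallback behavior instead of duplicating the hard-coded "1".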

Comment on lines +101 to 117
# Send existing transcript for diarization and speaker matching
form_data.add_field("transcript_data", json.dumps(transcript_data))
form_data.add_field("user_id", "1") # TODO: Implement proper user mapping
form_data.add_field("speaker_confidence_threshold", str(config.get("similarity_threshold", 0.15)))
form_data.add_field("similarity_threshold", str(config.get("similarity_threshold", 0.15)))

# Add pyannote diarization parameters
form_data.add_field("min_duration", str(config.get("min_duration", 0.5)))
form_data.add_field("collar", str(config.get("collar", 2.0)))
form_data.add_field("min_duration_off", str(config.get("min_duration_off", 1.5)))
if config.get("min_speakers"):
form_data.add_field("min_speakers", str(config.get("min_speakers")))
if config.get("max_speakers"):
form_data.add_field("max_speakers", str(config.get("max_speakers")))

# Use /v1/listen endpoint (now supports both providers)
endpoint = "/v1/listen"
# Use /v1/diarize-identify-match endpoint for backend integration
endpoint = "/v1/diarize-identify-match"

⚠️ Potential issue

Same hard-coded user_id bug in the pyannote branch; the diarization parameter pass-through itself looks good.

-                        form_data.add_field("transcript_data", json.dumps(transcript_data))
-                        form_data.add_field("user_id", "1")  # TODO: Implement proper user mapping
+                        form_data.add_field("transcript_data", json.dumps(transcript_data))
+                        form_data.add_field("user_id", str(user_id) if user_id is not None else "1")  # TODO: map real users
📝 Committable suggestion


Suggested change
# Send existing transcript for diarization and speaker matching
form_data.add_field("transcript_data", json.dumps(transcript_data))
form_data.add_field("user_id", "1") # TODO: Implement proper user mapping
form_data.add_field("speaker_confidence_threshold", str(config.get("similarity_threshold", 0.15)))
form_data.add_field("similarity_threshold", str(config.get("similarity_threshold", 0.15)))
# Add pyannote diarization parameters
form_data.add_field("min_duration", str(config.get("min_duration", 0.5)))
form_data.add_field("collar", str(config.get("collar", 2.0)))
form_data.add_field("min_duration_off", str(config.get("min_duration_off", 1.5)))
if config.get("min_speakers"):
form_data.add_field("min_speakers", str(config.get("min_speakers")))
if config.get("max_speakers"):
form_data.add_field("max_speakers", str(config.get("max_speakers")))
# Use /v1/listen endpoint (now supports both providers)
endpoint = "/v1/listen"
# Use /v1/diarize-identify-match endpoint for backend integration
endpoint = "/v1/diarize-identify-match"
# Send existing transcript for diarization and speaker matching
form_data.add_field("transcript_data", json.dumps(transcript_data))
form_data.add_field("user_id", str(user_id) if user_id is not None else "1") # TODO: map real users
form_data.add_field("similarity_threshold", str(config.get("similarity_threshold", 0.15)))

Comment on lines +47 to +50
networks:
default:
aliases:
- speaker-service

🛠️ Refactor suggestion

Add same network alias for CPU variant to keep Nginx upstream working in CPU mode.

Only GPU has alias speaker-service; CPU will fail DNS lookups from Nginx.

Apply:

   speaker-service-cpu:
     <<: *base-speaker-service
     profiles: ["cpu"]
+    networks:
+      default:
+        aliases:
+          - speaker-service
📝 Committable suggestion


Suggested change
networks:
default:
aliases:
- speaker-service
speaker-service-cpu:
<<: *base-speaker-service
profiles: ["cpu"]
networks:
default:
aliases:
- speaker-service
🤖 Prompt for AI Agents
In extras/speaker-recognition/docker-compose.yml around lines 47 to 50, the
docker network aliases only add "speaker-service" for the GPU variant so Nginx
upstream works in GPU mode but CPU mode lacks that alias and will fail DNS;
update the CPU service network configuration to include the same network alias
"speaker-service" (i.e., add aliases: - speaker-service under the CPU service's
networks.default section) so both CPU and GPU variants publish the same network
alias.

Comment on lines +75 to +100
if service_name == 'speaker-recognition' and command == 'up':
# Read configuration to determine how to start
env_file = service_path / '.env'
if env_file.exists():
env_values = dotenv_values(env_file)
compute_mode = env_values.get('COMPUTE_MODE', 'cpu')
https_enabled = env_values.get('REACT_UI_HTTPS', 'false')

if https_enabled.lower() == 'true':
# HTTPS mode: start with profile for all services (includes nginx)
if compute_mode == 'gpu':
cmd.extend(['--profile', 'gpu'])
else:
cmd.extend(['--profile', 'cpu'])
cmd.extend(['up', '-d'])
else:
# HTTP mode: start specific services with profile (no nginx)
if compute_mode == 'gpu':
cmd.extend(['--profile', 'gpu'])
else:
cmd.extend(['--profile', 'cpu'])
cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
else:
# Fallback: just start base service
cmd.extend(['up', '-d'])
else:

⚠️ Potential issue

Limit HTTPS mode to only the intended services; avoid starting the unprofiled base service.

up -d with a profile also brings up all services without profiles (your base speaker-service). Start explicit services instead.

-                if compute_mode == 'gpu':
-                    cmd.extend(['--profile', 'gpu'])
-                else:
-                    cmd.extend(['--profile', 'cpu'])
-                cmd.extend(['up', '-d'])
+                variant = 'speaker-service-gpu' if compute_mode.lower() == 'gpu' else 'speaker-service-cpu'
+                profile = 'gpu' if compute_mode.lower() == 'gpu' else 'cpu'
+                cmd.extend(['--profile', profile, 'up', '-d', variant, 'web-ui', 'nginx'])
🤖 Prompt for AI Agents
In services.py around lines 75-100, the HTTPS branch currently calls `up -d`
with a profile which will also start the unprofiled base `speaker-service`;
change this to start explicit services only. When https_enabled is true, extend
the command with the profile flag as now, but replace the generic `up -d` with
`up -d <service-names>` — include the profile-specific speaker-service
(speaker-service-gpu or speaker-service-cpu), the web-ui, and the
reverse-proxy/nginx service used for HTTPS (or any other services the profile
should bring up) so the unprofiled base `speaker-service` is not started. Ensure
selection is conditional on compute_mode and that service names match your
compose file.

Comment on lines +91 to +99
# HTTP mode: start specific services with profile (no nginx)
if compute_mode == 'gpu':
cmd.extend(['--profile', 'gpu'])
else:
cmd.extend(['--profile', 'cpu'])
cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
else:
# Fallback: just start base service
cmd.extend(['up', '-d'])

🛠️ Refactor suggestion

HTTP mode: same explicit targeting for determinism.

Mirror the HTTPS change to avoid surprises and speed up startup.

-                if compute_mode == 'gpu':
-                    cmd.extend(['--profile', 'gpu'])
-                else:
-                    cmd.extend(['--profile', 'cpu'])
-                cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
+                variant = 'speaker-service-gpu' if compute_mode.lower() == 'gpu' else 'speaker-service-cpu'
+                profile = 'gpu' if compute_mode.lower() == 'gpu' else 'cpu'
+                cmd.extend(['--profile', profile, 'up', '-d', variant, 'web-ui'])
📝 Committable suggestion


Suggested change
# HTTP mode: start specific services with profile (no nginx)
if compute_mode == 'gpu':
cmd.extend(['--profile', 'gpu'])
else:
cmd.extend(['--profile', 'cpu'])
cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
else:
# Fallback: just start base service
cmd.extend(['up', '-d'])
# HTTP mode: start specific services with profile (no nginx)
variant = 'speaker-service-gpu' if compute_mode.lower() == 'gpu' else 'speaker-service-cpu'
profile = 'gpu' if compute_mode.lower() == 'gpu' else 'cpu'
cmd.extend(['--profile', profile, 'up', '-d', variant, 'web-ui'])
else:
# Fallback: just start base service
cmd.extend(['up', '-d'])
🤖 Prompt for AI Agents
In services.py around lines 91 to 99, the HTTP-mode fallback uses a generic "up
-d" which can start extra services unpredictably; update the fallback to
explicitly target the same service set used in the HTTP branch by extending cmd
with ['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else
'speaker-service-cpu', 'web-ui'] (and preserve any profile flags already set) so
startup is deterministic and mirrors the HTTPS change.
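Both suggested diffs converge on the same shape: pick the profile and variant once, then name the target services explicitly so the unprofiled base service never starts. A standalone sketch under those assumptions (build_compose_cmd is a hypothetical helper; the service names match the compose file discussed above):

```python
def build_compose_cmd(compute_mode: str, https_enabled: str) -> list[str]:
    """Build an explicit `docker compose` command: always target named
    services so `up -d` cannot drag in the unprofiled base service."""
    profile = "gpu" if compute_mode.lower() == "gpu" else "cpu"
    variant = f"speaker-service-{profile}"
    cmd = ["docker", "compose", "--profile", profile, "up", "-d", variant, "web-ui"]
    if https_enabled.lower() == "true":
        cmd.append("nginx")  # HTTPS mode also needs the reverse proxy
    return cmd
```

Collapsing the HTTP and HTTPS branches into one function also removes the duplicated profile-selection if/else from both code paths.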

Comment on lines +2 to +3
rich>=13.0.0
dotenv

⚠️ Potential issue

Fix package name: use python-dotenv (current import will fail).

services.py imports `from dotenv import dotenv_values`, and that import is provided by the python-dotenv package, not a package named dotenv. This will break at runtime.

Apply:

 rich>=13.0.0
-dotenv
+python-dotenv>=1.0.1
📝 Committable suggestion


Suggested change
rich>=13.0.0
dotenv
# setup-requirements.txt (lines 2–3)
rich>=13.0.0
python-dotenv>=1.0.1
🤖 Prompt for AI Agents
In setup-requirements.txt around lines 2 to 3, the package listed as "dotenv" is
incorrect for the import used in services.py (from dotenv import dotenv_values);
replace "dotenv" with the correct PyPI package name "python-dotenv" (optionally
pin a minimum version, e.g. python-dotenv>=0.21.0) so the import resolves at
runtime and CI installs the correct package.
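For context, dotenv_values essentially parses KEY=VALUE pairs from a .env file into a dict, which is how services.py reads COMPUTE_MODE and REACT_UI_HTTPS. A rough stdlib approximation (parse_env is illustrative only, not a substitute for installing python-dotenv, and it skips features like variable interpolation and export prefixes):

```python
def parse_env(text: str) -> dict[str, str]:
    """Rough approximation of python-dotenv's dotenv_values():
    one KEY=VALUE per line, '#' comments and blank lines ignored,
    surrounding quotes stripped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values

env = parse_env("COMPUTE_MODE=gpu\n# comment\nREACT_UI_HTTPS='true'\n")
```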

@AnkushMalaker AnkushMalaker changed the title from "Speaker service networking fix" to "Refactor diarization configuration and unify transcription provider interfaces with networking fixes" Sep 9, 2025
@AnkushMalaker AnkushMalaker merged commit c0f951e into main Sep 11, 2025
12 checks passed
This was referenced Sep 12, 2025
@AnkushMalaker AnkushMalaker deleted the speaker-service-networking-fix branch September 19, 2025 15:24