
Refactor diarization configuration and unify transcription provider interfaces with networking fixes#89

Merged
AnkushMalaker merged 2 commits into main from
speaker-service-networking-fix
Sep 11, 2025

Conversation


@AnkushMalaker AnkushMalaker commented Sep 9, 2025

Summary by CodeRabbit

  • New Features

    • Centralized, persistently stored diarization settings with validation and environment-aware loading/saving.
    • Optional diarization support threaded through transcription and streaming; enabled for Deepgram where available.
    • Unified speaker recognition flow to a single diarization/identification endpoint for consistent behavior.
    • Added WebSocket /v1/ proxy for speaker service, enabling Deepgram-compatible connections.
  • Refactor

    • Consolidated diarization configuration management into a dedicated module for consistency and reuse.
  • Chores

    • Improved service startup behavior based on environment (CPU/GPU, HTTP/HTTPS).
    • Added dotenv dependency and network alias for speaker service.


coderabbitai bot commented Sep 9, 2025

Walkthrough

Centralizes diarization settings into a new config module, refactors system controller to use it, updates transcription providers to support a diarize flag, adjusts transcription and speaker recognition flows to use configuration, adds WebSocket proxying and network aliasing for speaker service, introduces specialized service startup logic, and adds dotenv dependency.

Changes

  • Config module (new): backends/advanced/src/advanced_omi_backend/config.py
    New centralized diarization settings: defaults, path resolution, load/save JSON with template fallback, module-level cache, logging.
  • System controller refactor: backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
    Removes inline config; imports loader/saver from config.py; get/save endpoints are now file-backed with stricter validation and updated response shapes.
  • Transcription providers interface: backends/advanced/src/advanced_omi_backend/transcription_providers.py
    Adds diarize support across providers: new/updated method signatures (batch/streaming), kwargs channel; Deepgram wires diarize, Parakeet warns/ignores.
  • Transcription integration: backends/advanced/src/advanced_omi_backend/transcription.py
    Reads diarization_source from config; conditionally sets the diarize flag for Deepgram providers; passes diarize to provider transcribe.
  • Speaker recognition client: backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py
    Switches to the config import; unifies the endpoint to /v1/diarize-identify-match; sends transcript_data with parameters; removes provider-specific config blocks; updates logs.
  • Speaker-recognition deployment: extras/speaker-recognition/nginx.conf.template, extras/speaker-recognition/docker-compose.yml
    Nginx adds a /v1/ WebSocket proxy and extra headers; docker-compose adds the default network alias speaker-service for the GPU service.
  • Orchestration/CLI: services.py, setup-requirements.txt
    services.py gains specialized startup for speaker-recognition based on .env (COMPUTE_MODE, REACT_UI_HTTPS), preserves other service behavior, and keeps --build handling; setup-requirements adds the dotenv dependency.
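For orientation, the load path described for the new config module can be sketched as follows. The function name matches the summary above; the default paths, template location, and fallback defaults here are assumptions for illustration only:

```python
import json
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)

# Module-level cache, as the summary describes; exact names are assumptions.
_diarization_settings = None

DEFAULT_SETTINGS = {"diarization_source": "pyannote", "similarity_threshold": 0.15}


def load_diarization_settings_from_file(
    config_path: Path = Path("data/diarization_settings.json"),
    template_path: Path = Path("templates/diarization_settings.json"),
) -> dict:
    """Load settings from JSON; fall back to the template, then to defaults."""
    global _diarization_settings
    if _diarization_settings is not None:
        return _diarization_settings

    # If no config exists yet, seed it from the template when available.
    if not config_path.exists() and template_path.exists():
        try:
            config_path.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(template_path, config_path)
            logger.info("Created diarization config from template at %s", config_path)
        except OSError:
            logger.exception("Could not copy template to %s", config_path)

    if config_path.exists():
        try:
            with open(config_path) as f:
                _diarization_settings = json.load(f)
                return _diarization_settings
        except (OSError, json.JSONDecodeError):
            logger.exception("Error loading diarization settings from %s", config_path)

    # Last resort: in-memory defaults.
    _diarization_settings = dict(DEFAULT_SETTINGS)
    return _diarization_settings
```

The module-level cache means repeat callers avoid file I/O; a process restart (or an explicit save) refreshes it.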

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client as API Caller
  participant System as SystemController
  participant Config as config.py
  participant FS as Filesystem

  rect rgba(230,240,255,0.5)
  Client->>System: GET /diarization-settings
  System->>Config: load_diarization_settings_from_file()
  Config->>FS: read JSON or copy from template
  FS-->>Config: settings or fallback
  Config-->>System: settings
  System-->>Client: {status: success, settings}
  end

  rect rgba(240,230,255,0.5)
  Client->>System: POST /diarization-settings (partial updates)
  System->>System: validate & merge
  System->>Config: save_diarization_settings_to_file(merged)
  Config->>FS: write JSON
  FS-->>Config: ok/fail
  Config-->>System: True/False
  System-->>Client: {status: success/partial, settings}
  end
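The POST flow in the diagram (validate, merge, save) boils down to a merge-and-validate step. A hedged sketch as a plain helper, leaving out the HTTP layer (the real controller returns JSONResponse objects and persists via the config module):

```python
def merge_and_validate(current: dict, updates: dict):
    """Merge partial updates into current settings and cross-validate.

    Returns (merged, None) on success or (None, error_message) on failure.
    Field names follow the PR; the validation set here is illustrative.
    """
    merged = {**current, **updates}

    # Normalize and validate diarization_source.
    src = str(merged.get("diarization_source", "pyannote")).lower()
    if src not in {"pyannote", "deepgram"}:
        return None, f"Invalid diarization_source: {src!r}"
    merged["diarization_source"] = src

    # Cross-field check: min_speakers must not exceed max_speakers.
    mn, mx = merged.get("min_speakers"), merged.get("max_speakers")
    if isinstance(mn, int) and isinstance(mx, int) and mn > mx:
        return None, "min_speakers cannot exceed max_speakers"

    return merged, None
```

Only after this helper succeeds would the controller hand the merged dict to the file-backed saver.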
sequenceDiagram
  autonumber
  participant App as Transcription (caller)
  participant Cfg as config.py
  participant Prov as TranscriptionProvider
  note over App,Cfg: Determine diarize flag
  App->>Cfg: load_diarization_settings_from_file()
  Cfg-->>App: {diarization_source}
  alt provider=Deepgram & source=deepgram
    App->>Prov: transcribe(audio, sr, diarize=true)
    Prov-->>App: transcript (with diarization if supported)
  else other cases
    App->>Prov: transcribe(audio, sr, diarize=false)
    Prov-->>App: transcript
  end
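The alt branching above reduces to a small predicate; a sketch (the provider naming convention is an assumption):

```python
def should_diarize(provider_name: str, diarization_source: str) -> bool:
    """Enable diarization only when the active provider is Deepgram
    and the configured diarization_source also selects Deepgram."""
    return provider_name.lower() == "deepgram" and diarization_source == "deepgram"
```

Every other combination falls through to `diarize=False`, matching the else branch in the diagram.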
sequenceDiagram
  autonumber
  participant SRC as SpeakerRecognitionClient
  participant Cfg as config.py
  participant Svc as Speaker Service (/v1/diarize-identify-match)

  SRC->>Cfg: load_diarization_settings_from_file()
  Cfg-->>SRC: settings
  alt diarization_source = deepgram
    note right of SRC: Fallback identification over transcript segments
    SRC->>Svc: POST transcript_data, user_id, similarity_threshold, min_duration
  else diarization_source = pyannote
    SRC->>Svc: POST transcript_data, similarity_threshold, min_duration, collar, min_duration_off, [min/max_speakers]
  end
  Svc-->>SRC: diarize-identify-match result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Better init system #74 — Also modifies diarization config provisioning (templates/init), overlapping with this PR’s centralized config handling.
  • Openmemory mcp #69 — Touches advanced backend configuration pathways while adding memory subsystem features; potential intersections with service/env orchestration.

Poem

In cables and configs I happily hop,
A whiskered dev with a carrot-top.
Diarize flags thread through the night,
Providers listen—left and right.
WebSockets hum, NGINX sings,
New paths align on gentle springs.
Ship it—thump-thump—my lucky rings! 🥕🐇

Pre-merge checks (1 warning, 2 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 76.92%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Description Check (❓ Inconclusive): The description only mentions networking fixes for the speaker service and omits the major refactoring around configuration management, controller logic, transcription modules, and service startup, so it is too generic to convey the PR's full context. Expand it to summarize all significant changes (the new diarization config module, controller and client updates, transcription provider interface enhancements, and networking adjustments) so reviewers get a complete overview.
  • Title Check (❓ Inconclusive): The title packs multiple substantial changes into one long multi-topic sentence, diluting focus and not prioritizing a single primary change. Consider shortening it to the main intent, e.g. "Centralize diarization configuration and unify transcription interfaces", and handling networking adjustments separately.

@AnkushMalaker (Collaborator, Author) commented:

@coderabbitai review


coderabbitai bot commented Sep 9, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
extras/speaker-recognition/docker-compose.yml (2)

3-31: Prevent unintended startup of the base service when using profiles.

Service speaker-service (the anchor) has no profile; docker compose --profile {cpu|gpu} up -d will also start this base service, causing port conflicts and wrong routing.

Low-impact fix without restructuring to x- anchors:

 services:
   # Base Speaker Recognition Service Configuration
   speaker-service:
     &base-speaker-service
+    profiles: ["base"]  # not selected by cpu/gpu flows

Alternatively convert the base block to an x-speaker-service extension and remove the active speaker-service service.


27-31: Healthcheck should target localhost, not a network alias.

Inside the container, curl against speaker-service may resolve to another container (or fail if alias missing). Use loopback.

-      test: ["CMD", "curl", "-f", "http://speaker-service:8085/health"]
+      test: ["CMD", "curl", "-f", "http://localhost:8085/health"]
services.py (1)

163-170: Don't time out long-running compose operations.

up -d without --build can still build/pull and exceed 120s. Remove timeout for up, keep it for lightweight commands.

-            result = subprocess.run(
+            # No timeout for 'up' to allow pulls/builds; keep timeout for others
+            timeout_val = None if 'up' in cmd else 120
+            result = subprocess.run(
                 cmd,
                 cwd=service_path,
                 capture_output=True,
                 text=True,
                 check=False,
-                timeout=120  # 2 minute timeout for service status checks
+                timeout=timeout_val
             )
backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (2)

175-186: Remove dependency on controllers.system_controller; use the new config module.

Refactor leftovers will break if _diarization_settings moved.

-                    from advanced_omi_backend.controllers.system_controller import _diarization_settings
+                    from advanced_omi_backend.config import load_diarization_settings_from_file as _load_cfg
+                    _diarization_settings = _load_cfg()

249-260: Same refactor for identify_speakers().

-                    from advanced_omi_backend.controllers.system_controller import _diarization_settings
+                    from advanced_omi_backend.config import load_diarization_settings_from_file as _load_cfg
+                    _diarization_settings = _load_cfg()
backends/advanced/src/advanced_omi_backend/transcription.py (1)

284-286: Double-trigger risk: memory processing is queued here and also on conversation close.

Top-of-file comment says memory is handled “exclusively by conversation closure,” yet this call re-introduces it, likely causing duplicate processing.

Remove the call:

-            # Queue memory processing now that transcription is complete
-            await self._queue_memory_processing()
+            # Memory processing is triggered by conversation closure to avoid duplicates
🧹 Nitpick comments (17)
services.py (3)

81-84: Accept common truthy values for REACT_UI_HTTPS.

Handle "1/yes/true" robustly.

             https_enabled = env_values.get('REACT_UI_HTTPS', 'false')
+            https_enabled = str(https_enabled).strip().lower() in ('1', 'true', 'yes', 'on')

And branch on if https_enabled:.
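The suggested coercion generalizes into a tiny helper (a sketch; the actual services.py may simply inline the expression):

```python
def env_flag(value, default=False):
    """Interpret common truthy strings from .env values as booleans."""
    if value is None:
        return default
    return str(value).strip().lower() in ("1", "true", "yes", "on")
```

Usage would then be `if env_flag(env_values.get('REACT_UI_HTTPS')):`, which handles "1", "yes", "TRUE", and trailing whitespace uniformly.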


111-112: Place --build before services to avoid CLI parsing edge cases.

Safer ordering with compose.

-    if command == 'up' and build:
-        cmd.append('--build')
+    if command == 'up' and build:
+        # insert right after the 'up' token so '--build' precedes any service names
+        cmd.insert(cmd.index('up') + 1, '--build')

11-11: Remove unused import.

os isn’t used.

-import os
extras/speaker-recognition/nginx.conf.template (1)

97-113: Good WebSocket proxy for /v1; consider passing subprotocol when required.

If clients use Sec-WebSocket-Protocol (e.g., provider-specific subprotocols), forward it.

         location /v1/ {
             proxy_pass http://speaker_service/v1/;
             proxy_http_version 1.1;
             proxy_set_header Upgrade $http_upgrade;
             proxy_set_header Connection $connection_upgrade;
+            proxy_set_header Sec-WebSocket-Protocol $http_sec_websocket_protocol;
backends/advanced/src/advanced_omi_backend/config.py (2)

60-79: Narrow exceptions and log stack traces.

Use specific exceptions where possible and logger.exception for diagnostics.

-        if template_path.exists():
-            try:
+        if template_path.exists():
+            try:
                 # Ensure parent directory exists
                 config_path.parent.mkdir(parents=True, exist_ok=True)
                 shutil.copy(template_path, config_path)
                 logger.info(f"Created diarization config from template at {config_path}")
-            except Exception as e:
-                logger.warning(f"Could not copy template to {config_path}: {e}")
+            except (OSError, IOError):
+                logger.exception(f"Could not copy template to {config_path}")
-    if config_path.exists():
-        try:
+    if config_path.exists():
+        try:
             with open(config_path, 'r') as f:
                 _diarization_settings = json.load(f)
                 logger.info(f"Loaded diarization settings from {config_path}")
                 return _diarization_settings
-        except Exception as e:
-            logger.error(f"Error loading diarization settings from {config_path}: {e}")
+        except (OSError, IOError, json.JSONDecodeError):
+            logger.exception(f"Error loading diarization settings from {config_path}")

92-107: Same for save path; bubble precise errors.

Also keep stack traces.

-    try:
+    try:
         # Ensure parent directory exists
         config_path.parent.mkdir(parents=True, exist_ok=True)
         
         # Write settings to file
         with open(config_path, 'w') as f:
             json.dump(settings, f, indent=2)
         
         # Update cache
         _diarization_settings = settings
         
         logger.info(f"Saved diarization settings to {config_path}")
         return True
-    except Exception as e:
-        logger.error(f"Error saving diarization settings to {config_path}: {e}")
+    except (OSError, IOError, TypeError, ValueError):
+        logger.exception(f"Error saving diarization settings to {config_path}")
         return False
backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (1)

136-141: Use logger.exception for full stack traces.

Improves debuggability.

-        except Exception as e:
-            logger.error(f"Error during diarize-identify-match: {e}")
+        except Exception:
+            logger.exception("Error during diarize-identify-match")
             return {}
backends/advanced/src/advanced_omi_backend/transcription.py (2)

16-16: Avoid per-call config file reads; cache diarization source (hot path).

load_diarization_settings_from_file() is invoked on every transcription, adding file I/O to a hot path. Cache with a short TTL or inject via constructor.

Apply this inline change to use a cached helper:

-            config = load_diarization_settings_from_file()
-            diarization_source = config.get("diarization_source", "pyannote")
+            diarization_source = self._get_diarization_source()

Add this helper to the class (outside the shown range):

import time  # needed at module scope for the TTL check

def _get_diarization_source(self) -> str:
    try:
        ttl_s = 5.0
        now = time.time()
        if not hasattr(self, "_diar_src_cache") or (now - getattr(self, "_diar_src_cache_ts", 0)) > ttl_s:
            cfg = load_diarization_settings_from_file()
            self._diar_src_cache = cfg.get("diarization_source", "pyannote")
            self._diar_src_cache_ts = now
        return self._diar_src_cache
    except Exception:
        return "pyannote"

121-133: Diarization toggle: add explicit “disabled” log for mismatches; guard by capability.

When config requests Deepgram diarization but the active provider isn’t Deepgram, log why diarization is off to aid ops/debugging. Optionally move capability knowledge to providers.

Minimal change:

             if should_diarize:
                 logger.info(f"Requesting diarization from {self.provider.name} (diarization_source=deepgram)")
+            else:
+                logger.debug(
+                    "Diarization not requested (source=%s, provider=%s)",
+                    diarization_source, self.provider.name
+                )
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (2)

792-796: Normalize diarization_source to lowercase; accept canonical values only.

Prevents surprising 400s for "Deepgram"/"PYANNOTE".

-            elif key == "diarization_source":
-                if not isinstance(value, str) or value not in ["pyannote", "deepgram"]:
+            elif key == "diarization_source":
+                if not isinstance(value, str) or value.lower() not in {"pyannote", "deepgram"}:
                     return JSONResponse(
                         status_code=400, content={"error": f"Invalid value for {key}: must be 'pyannote' or 'deepgram'"}
                     )
+                # Normalize
+                settings[key] = value.lower()

808-814: Persist atomically; adjust success payload shape stays the same.

If save uses plain write, a crash can corrupt the file. Prefer atomic write (temp file + replace). If the helper already does this, ignore.

No code change requested here, just a heads-up.
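For reference, an atomic JSON write along these lines might look like the following (a sketch, not the repository's helper):

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then atomically replace.

    os.replace is atomic on POSIX (and on Windows for same-volume paths),
    so concurrent readers never observe a half-written file.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on any failure
        raise
```

Creating the temp file in the target directory (not /tmp) matters: `os.replace` is only atomic within a single filesystem.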

backends/advanced/src/advanced_omi_backend/transcription_providers.py (6)

48-56: Base API extension is fine; avoid signature drift across hierarchies.

You now have Base.transcribe(**kwargs) and Batch.transcribe(diarize: bool). Prefer one canonical signature to avoid type checker confusion and future drift (see lines 127–136 diff below).


127-136: Unify abstract method signature with Base to prevent divergence.

Match Base’s **kwargs to keep subclass contracts consistent.

-    @abc.abstractmethod
-    async def transcribe(self, audio_data: bytes, sample_rate: int, diarize: bool = False) -> dict:
-        """Transcribe audio data.
-        
-        Args:
-            audio_data: Raw audio bytes
-            sample_rate: Audio sample rate
-            diarize: Whether to enable speaker diarization (provider-dependent)
-        """
-        pass
+    @abc.abstractmethod
+    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+        """Transcribe audio data (provider-specific kwargs like diarize may be supported)."""
+        pass

312-322: Streaming: consider WebSocket keepalive and frame size tuning.

To improve resilience on long sessions/big interim payloads:

  • Set ping_interval to keep the connection alive.
  • Increase max_size if large messages are expected.

Example:

-            websocket = await websockets.connect(
-                ws_url,
-                extra_headers={"Authorization": f"Token {self.api_key}"}
-            )
+            websocket = await websockets.connect(
+                ws_url,
+                extra_headers={"Authorization": f"Token {self.api_key}"},
+                ping_interval=20,
+                max_size=4 * 1024 * 1024,
+            )

Also applies to: 329-329


471-474: Silence ARG002 (unused args) in streaming transcribe stub.

Rename to underscores.

-    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+    async def transcribe(self, _audio_data: bytes, _sample_rate: int, **_kwargs) -> dict:

501-501: Silence ARG002 for kwargs in Parakeet batch transcribe.

-    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+    async def transcribe(self, audio_data: bytes, sample_rate: int, **_kwargs) -> dict:

748-752: Silence ARG002 (unused args) in Parakeet streaming transcribe stub.

-    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+    async def transcribe(self, _audio_data: bytes, _sample_rate: int, **_kwargs) -> dict:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c79db94 and eb05aeb.

📒 Files selected for processing (9)
  • backends/advanced/src/advanced_omi_backend/config.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (3 hunks)
  • backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/transcription.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/transcription_providers.py (9 hunks)
  • extras/speaker-recognition/docker-compose.yml (1 hunks)
  • extras/speaker-recognition/nginx.conf.template (1 hunks)
  • services.py (2 hunks)
  • setup-requirements.txt (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backends/advanced/src/advanced_omi_backend/config.py

67-67: Do not catch blind exception: Exception

(BLE001)


77-77: Do not catch blind exception: Exception

(BLE001)


78-78: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


104-104: Consider moving this statement to an else block

(TRY300)


105-105: Do not catch blind exception: Exception

(BLE001)


106-106: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/transcription_providers.py

471-471: Unused method argument: audio_data

(ARG002)


471-471: Unused method argument: sample_rate

(ARG002)


471-471: Unused method argument: kwargs

(ARG002)


501-501: Unused method argument: kwargs

(ARG002)


748-748: Unused method argument: audio_data

(ARG002)


748-748: Unused method argument: sample_rate

(ARG002)


748-748: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: speaker-recognition-tests
  • GitHub Check: integration-tests
🔇 Additional comments (8)
extras/speaker-recognition/docker-compose.yml (2)

101-102: Verify the Nginx config path: docker-compose mounts nginx.conf, but repo ships nginx.conf.template.

This likely 404s at container start unless another process materializes nginx.conf.

If the file isn’t generated, either rename the file or mount the template path:

-      - ./nginx.conf:/etc/nginx/nginx.conf:ro
+      - ./nginx.conf.template:/etc/nginx/nginx.conf:ro

87-91: Web UI healthcheck may be wrong when HTTPS is disabled.

It always probes https://localhost:${REACT_UI_PORT}. If REACT_UI_HTTPS=false, this will fail.

Consider switching protocol based on REACT_UI_HTTPS or probing the internal port the app actually listens on.

extras/speaker-recognition/nginx.conf.template (1)

48-50: Verify Web UI upstream port.

Upstream is pinned to 5175. Ensure the web-ui actually listens on that port for both HTTP/HTTPS modes; otherwise requests will 502.

If the port is driven by REACT_UI_PORT, consider templating the nginx config at container start (envsubst) to keep them in sync.

backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (1)

66-69: Good: centralized config import avoids controller coupling.

This change reduces circular-import risk.

backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (1)

22-22: Config module import change looks good.

backends/advanced/src/advanced_omi_backend/transcription_providers.py (3)

150-157: Deepgram diarization wiring LGTM.

The diarize flag is correctly threaded into query params and remains optional. Timeout scaling by duration is a nice touch.

Also applies to: 164-164
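For context, threading the flag into the query string might look roughly like this; `diarize` and `smart_format` are documented Deepgram /v1/listen parameters, while the helper itself is illustrative, not the provider's actual code:

```python
from urllib.parse import urlencode


def build_deepgram_params(model: str = "nova-2", diarize: bool = False, **extra) -> str:
    """Build a Deepgram /v1/listen query string; diarize stays optional."""
    params = {"model": model, "smart_format": "true"}
    if diarize:
        params["diarize"] = "true"  # only sent when explicitly requested
    params.update(extra)
    return urlencode(params)
```

Keeping `diarize` absent (rather than `diarize=false`) when not requested avoids sending a no-op parameter.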


572-582: Clear warning on unsupported diarize is good UX.

No changes requested.


94-101: No start_stream calls found—verify external callers pass diarize
No in-repo call sites reference start_stream; ensure any consumers (e.g. orchestration or service layers) include the new diarize argument when invoking this method.

Comment on lines +803 to 806
# Get current settings and merge with new values
current_settings = load_diarization_settings_from_file()
current_settings.update(settings)


🛠️ Refactor suggestion

Validate min_speakers ≤ max_speakers after merge.

Prevents impossible configs reaching runtime.

         current_settings = load_diarization_settings_from_file()
         current_settings.update(settings)
+        # Cross-field validation
+        min_s = current_settings.get("min_speakers")
+        max_s = current_settings.get("max_speakers")
+        if isinstance(min_s, int) and isinstance(max_s, int) and min_s > max_s:
+            return JSONResponse(
+                status_code=400,
+                content={"error": "min_speakers cannot exceed max_speakers"}
+            )
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
around lines 803 to 806, after merging current_settings.update(settings)
validate that the resulting min_speakers and max_speakers are consistent: coerce
or parse them to integers if necessary, check that both keys exist (or use
defaults) and that min_speakers <= max_speakers; if the check fails,
raise/return a clear client error (e.g., HTTP 400 / HTTPException) with a
message indicating the constraint violation and do not proceed to save or apply
the settings. Ensure validation runs immediately after the merge and before any
persistence or runtime usage.

Comment on lines 817 to 823
 # Even if file save fails, we've updated the in-memory settings
 logger.warning("Settings updated in memory but file save failed")
 return {
     "message": "Settings updated (file save failed)",
-    "settings": _diarization_settings,
+    "settings": current_settings,
     "status": "partial"
 }

🛠️ Refactor suggestion

“Partial” branch message is misleading; no in-memory state exists anymore.

Since settings are file-backed (no process-global cache), claiming “updated in memory” is inaccurate. Return an error status.

-            return {
-                "message": "Settings updated (file save failed)",
-                "settings": current_settings,
-                "status": "partial"
-            }
+            return {
+                "message": "Failed to persist diarization settings to file",
+                "settings": current_settings,  # Echo attempted values for UX
+                "status": "error"
+            }
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
around lines 817 to 823, the response incorrectly claims settings were "updated
in memory" despite there being no process-global cache; change the returned
message and status to indicate a failure instead of a partial success, e.g. log
the file save error, return an error HTTP status payload (message like "Settings
update failed (file save failed)" and status "error"), and avoid claiming or
returning an in-memory settings object — instead return the previous persisted
settings or omit "settings" from the error response as appropriate.

Comment on lines +89 to 96
 form_data.add_field("transcript_data", json.dumps(transcript_data))
 form_data.add_field("user_id", "1")  # TODO: Implement proper user mapping
-form_data.add_field("speaker_confidence_threshold", str(config.get("similarity_threshold", 0.15)))
+form_data.add_field("similarity_threshold", str(config.get("similarity_threshold", 0.15)))
 form_data.add_field("min_duration", str(config.get("min_duration", 0.5)))

-# Use /v1/listen endpoint
-endpoint = "/v1/listen"
+# Use /v1/diarize-identify-match endpoint as fallback
+endpoint = "/v1/diarize-identify-match"


⚠️ Potential issue

Honor the user_id parameter; avoid hard-coding "1".

Currently ignores the method argument.

-                        form_data.add_field("transcript_data", json.dumps(transcript_data))
-                        form_data.add_field("user_id", "1")  # TODO: Implement proper user mapping
+                        form_data.add_field("transcript_data", json.dumps(transcript_data))
+                        form_data.add_field("user_id", str(user_id) if user_id is not None else "1")  # TODO: map real users
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py
around lines 89 to 96, the code hard-codes form_data.add_field("user_id", "1")
and ignores the function/method user_id argument; replace the hard-coded value
with the passed-in user_id (coerce to string), fallback to a sensible default
only if the argument is None/empty, and remove the TODO comment; ensure any
necessary validation (int or non-empty) is performed before adding the field so
the endpoint receives the correct user identity.
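The fallback described in the prompt can be factored into a small helper. This is a sketch only: resolve_user_id is a hypothetical name, and "non-empty after string coercion" is one reasonable reading of the validation the prompt asks for.

```python
def resolve_user_id(user_id) -> str:
    """Coerce the caller-supplied user_id to a non-empty string,
    falling back to "1" only when nothing usable was passed."""
    if user_id is None:
        return "1"
    text = str(user_id).strip()
    return text if text else "1"

# The form field then carries the caller's identity instead of a constant:
# form_data.add_field("user_id", resolve_user_id(user_id))
```

With this helper both branches (Deepgram and pyannote) share the same fallback behavior instead of duplicating the hard-coded "1".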

Comment on lines +101 to 117
# Send existing transcript for diarization and speaker matching
form_data.add_field("transcript_data", json.dumps(transcript_data))
form_data.add_field("user_id", "1") # TODO: Implement proper user mapping
form_data.add_field("speaker_confidence_threshold", str(config.get("similarity_threshold", 0.15)))
form_data.add_field("similarity_threshold", str(config.get("similarity_threshold", 0.15)))

# Add pyannote diarization parameters
form_data.add_field("min_duration", str(config.get("min_duration", 0.5)))
form_data.add_field("collar", str(config.get("collar", 2.0)))
form_data.add_field("min_duration_off", str(config.get("min_duration_off", 1.5)))
if config.get("min_speakers"):
form_data.add_field("min_speakers", str(config.get("min_speakers")))
if config.get("max_speakers"):
form_data.add_field("max_speakers", str(config.get("max_speakers")))

# Use /v1/listen endpoint (now supports both providers)
endpoint = "/v1/listen"
# Use /v1/diarize-identify-match endpoint for backend integration
endpoint = "/v1/diarize-identify-match"

⚠️ Potential issue

Same hard-coded user_id bug in the pyannote branch; the diarization parameter pass-through itself looks good.

-                        form_data.add_field("transcript_data", json.dumps(transcript_data))
-                        form_data.add_field("user_id", "1")  # TODO: Implement proper user mapping
+                        form_data.add_field("transcript_data", json.dumps(transcript_data))
+                        form_data.add_field("user_id", str(user_id) if user_id is not None else "1")  # TODO: map real users
📝 Committable suggestion


Suggested change
# Send existing transcript for diarization and speaker matching
form_data.add_field("transcript_data", json.dumps(transcript_data))
form_data.add_field("user_id", "1") # TODO: Implement proper user mapping
form_data.add_field("speaker_confidence_threshold", str(config.get("similarity_threshold", 0.15)))
form_data.add_field("similarity_threshold", str(config.get("similarity_threshold", 0.15)))
# Add pyannote diarization parameters
form_data.add_field("min_duration", str(config.get("min_duration", 0.5)))
form_data.add_field("collar", str(config.get("collar", 2.0)))
form_data.add_field("min_duration_off", str(config.get("min_duration_off", 1.5)))
if config.get("min_speakers"):
form_data.add_field("min_speakers", str(config.get("min_speakers")))
if config.get("max_speakers"):
form_data.add_field("max_speakers", str(config.get("max_speakers")))
# Use /v1/listen endpoint (now supports both providers)
endpoint = "/v1/listen"
# Use /v1/diarize-identify-match endpoint for backend integration
endpoint = "/v1/diarize-identify-match"
# Send existing transcript for diarization and speaker matching
form_data.add_field("transcript_data", json.dumps(transcript_data))
form_data.add_field("user_id", str(user_id) if user_id is not None else "1") # TODO: map real users
form_data.add_field("similarity_threshold", str(config.get("similarity_threshold", 0.15)))

Comment on lines +47 to +50
networks:
default:
aliases:
- speaker-service

🛠️ Refactor suggestion

Add same network alias for CPU variant to keep Nginx upstream working in CPU mode.

Only GPU has alias speaker-service; CPU will fail DNS lookups from Nginx.

Apply:

   speaker-service-cpu:
     <<: *base-speaker-service
     profiles: ["cpu"]
+    networks:
+      default:
+        aliases:
+          - speaker-service
📝 Committable suggestion


Suggested change
networks:
default:
aliases:
- speaker-service
speaker-service-cpu:
<<: *base-speaker-service
profiles: ["cpu"]
networks:
default:
aliases:
- speaker-service
🤖 Prompt for AI Agents
In extras/speaker-recognition/docker-compose.yml around lines 47 to 50, the
docker network aliases only add "speaker-service" for the GPU variant so Nginx
upstream works in GPU mode but CPU mode lacks that alias and will fail DNS;
update the CPU service network configuration to include the same network alias
"speaker-service" (i.e., add aliases: - speaker-service under the CPU service's
networks.default section) so both CPU and GPU variants publish the same network
alias.

Comment on lines +75 to +100
if service_name == 'speaker-recognition' and command == 'up':
# Read configuration to determine how to start
env_file = service_path / '.env'
if env_file.exists():
env_values = dotenv_values(env_file)
compute_mode = env_values.get('COMPUTE_MODE', 'cpu')
https_enabled = env_values.get('REACT_UI_HTTPS', 'false')

if https_enabled.lower() == 'true':
# HTTPS mode: start with profile for all services (includes nginx)
if compute_mode == 'gpu':
cmd.extend(['--profile', 'gpu'])
else:
cmd.extend(['--profile', 'cpu'])
cmd.extend(['up', '-d'])
else:
# HTTP mode: start specific services with profile (no nginx)
if compute_mode == 'gpu':
cmd.extend(['--profile', 'gpu'])
else:
cmd.extend(['--profile', 'cpu'])
cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
else:
# Fallback: just start base service
cmd.extend(['up', '-d'])
else:

⚠️ Potential issue

Limit HTTPS mode to only the intended services; avoid starting the unprofiled base service.

up -d with a profile also brings up all services without profiles (your base speaker-service). Start explicit services instead.

-                if compute_mode == 'gpu':
-                    cmd.extend(['--profile', 'gpu'])
-                else:
-                    cmd.extend(['--profile', 'cpu'])
-                cmd.extend(['up', '-d'])
+                variant = 'speaker-service-gpu' if compute_mode.lower() == 'gpu' else 'speaker-service-cpu'
+                profile = 'gpu' if compute_mode.lower() == 'gpu' else 'cpu'
+                cmd.extend(['--profile', profile, 'up', '-d', variant, 'web-ui', 'nginx'])
🤖 Prompt for AI Agents
In services.py around lines 75-100, the HTTPS branch currently calls `up -d`
with a profile which will also start the unprofiled base `speaker-service`;
change this to start explicit services only. When https_enabled is true, extend
the command with the profile flag as now, but replace the generic `up -d` with
`up -d <service-names>` — include the profile-specific speaker-service
(speaker-service-gpu or speaker-service-cpu), the web-ui, and the
reverse-proxy/nginx service used for HTTPS (or any other services the profile
should bring up) so the unprofiled base `speaker-service` is not started. Ensure
selection is conditional on compute_mode and that service names match your
compose file.

Comment on lines +91 to +99
# HTTP mode: start specific services with profile (no nginx)
if compute_mode == 'gpu':
cmd.extend(['--profile', 'gpu'])
else:
cmd.extend(['--profile', 'cpu'])
cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
else:
# Fallback: just start base service
cmd.extend(['up', '-d'])

🛠️ Refactor suggestion

HTTP mode: same explicit targeting for determinism.

Mirror the HTTPS change to avoid surprises and speed up startup.

-                if compute_mode == 'gpu':
-                    cmd.extend(['--profile', 'gpu'])
-                else:
-                    cmd.extend(['--profile', 'cpu'])
-                cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
+                variant = 'speaker-service-gpu' if compute_mode.lower() == 'gpu' else 'speaker-service-cpu'
+                profile = 'gpu' if compute_mode.lower() == 'gpu' else 'cpu'
+                cmd.extend(['--profile', profile, 'up', '-d', variant, 'web-ui'])
📝 Committable suggestion


Suggested change
# HTTP mode: start specific services with profile (no nginx)
if compute_mode == 'gpu':
cmd.extend(['--profile', 'gpu'])
else:
cmd.extend(['--profile', 'cpu'])
cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
else:
# Fallback: just start base service
cmd.extend(['up', '-d'])
# HTTP mode: start specific services with profile (no nginx)
variant = 'speaker-service-gpu' if compute_mode.lower() == 'gpu' else 'speaker-service-cpu'
profile = 'gpu' if compute_mode.lower() == 'gpu' else 'cpu'
cmd.extend(['--profile', profile, 'up', '-d', variant, 'web-ui'])
else:
# Fallback: just start base service
cmd.extend(['up', '-d'])
🤖 Prompt for AI Agents
In services.py around lines 91 to 99, the HTTP-mode fallback uses a generic "up
-d" which can start extra services unpredictably; update the fallback to
explicitly target the same service set used in the HTTP branch by extending cmd
with ['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else
'speaker-service-cpu', 'web-ui'] (and preserve any profile flags already set) so
startup is deterministic and mirrors the HTTPS change.
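Both suggested diffs converge on the same shape: pick the profile and variant once, then name the target services explicitly so the unprofiled base service never starts. A standalone sketch under those assumptions (build_compose_cmd is a hypothetical helper; the service names match the compose file discussed above):

```python
def build_compose_cmd(compute_mode: str, https_enabled: str) -> list[str]:
    """Build an explicit `docker compose` command: always target named
    services so `up -d` cannot drag in the unprofiled base service."""
    profile = "gpu" if compute_mode.lower() == "gpu" else "cpu"
    variant = f"speaker-service-{profile}"
    cmd = ["docker", "compose", "--profile", profile, "up", "-d", variant, "web-ui"]
    if https_enabled.lower() == "true":
        cmd.append("nginx")  # HTTPS mode also needs the reverse proxy
    return cmd
```

Collapsing the HTTP and HTTPS branches into one function also removes the duplicated profile-selection if/else from both code paths.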

Comment on lines +2 to +3
rich>=13.0.0
dotenv

⚠️ Potential issue

Fix package name: use python-dotenv (current import will fail).

services.py imports `from dotenv import dotenv_values`, and that import is provided by the python-dotenv package, not a package named dotenv. This will break at runtime.

Apply:

 rich>=13.0.0
-dotenv
+python-dotenv>=1.0.1
📝 Committable suggestion


Suggested change
rich>=13.0.0
dotenv
# setup-requirements.txt (lines 2–3)
rich>=13.0.0
python-dotenv>=1.0.1
🤖 Prompt for AI Agents
In setup-requirements.txt around lines 2 to 3, the package listed as "dotenv" is
incorrect for the import used in services.py (from dotenv import dotenv_values);
replace "dotenv" with the correct PyPI package name "python-dotenv" (optionally
pin a minimum version, e.g. python-dotenv>=0.21.0) so the import resolves at
runtime and CI installs the correct package.
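For context, dotenv_values essentially parses KEY=VALUE pairs from a .env file into a dict, which is how services.py reads COMPUTE_MODE and REACT_UI_HTTPS. A rough stdlib approximation (parse_env is illustrative only, not a substitute for installing python-dotenv, and it skips features like variable interpolation and export prefixes):

```python
def parse_env(text: str) -> dict[str, str]:
    """Rough approximation of python-dotenv's dotenv_values():
    one KEY=VALUE per line, '#' comments and blank lines ignored,
    surrounding quotes stripped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values

env = parse_env("COMPUTE_MODE=gpu\n# comment\nREACT_UI_HTTPS='true'\n")
```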

@AnkushMalaker AnkushMalaker changed the title from "Speaker service networking fix" to "Refactor diarization configuration and unify transcription provider interfaces with networking fixes" Sep 9, 2025
@AnkushMalaker AnkushMalaker merged commit c0f951e into main Sep 11, 2025
12 checks passed
This was referenced Sep 12, 2025
@AnkushMalaker AnkushMalaker deleted the speaker-service-networking-fix branch September 19, 2025 15:24