Refactor diarization configuration and unify transcription provider interfaces with networking fixes #89
Conversation
Walkthrough

Centralizes diarization settings into a new config module, refactors the system controller to use it, updates transcription providers to support a `diarize` flag, adjusts transcription and speaker-recognition flows to read from configuration, adds WebSocket proxying and network aliasing for the speaker service, introduces specialized service-startup logic, and adds a dotenv dependency.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client as API Caller
    participant System as SystemController
    participant Config as config.py
    participant FS as Filesystem
    rect rgba(230,240,255,0.5)
        Client->>System: GET /diarization-settings
        System->>Config: load_diarization_settings_from_file()
        Config->>FS: read JSON or copy from template
        FS-->>Config: settings or fallback
        Config-->>System: settings
        System-->>Client: {status: success, settings}
    end
    rect rgba(240,230,255,0.5)
        Client->>System: POST /diarization-settings (partial updates)
        System->>System: validate & merge
        System->>Config: save_diarization_settings_to_file(merged)
        Config->>FS: write JSON
        FS-->>Config: ok/fail
        Config-->>System: True/False
        System-->>Client: {status: success/partial, settings}
    end
```

```mermaid
sequenceDiagram
    autonumber
    participant App as Transcription (caller)
    participant Cfg as config.py
    participant Prov as TranscriptionProvider
    note over App,Cfg: Determine diarize flag
    App->>Cfg: load_diarization_settings_from_file()
    Cfg-->>App: {diarization_source}
    alt provider=Deepgram & source=deepgram
        App->>Prov: transcribe(audio, sr, diarize=true)
        Prov-->>App: transcript (with diarization if supported)
    else other cases
        App->>Prov: transcribe(audio, sr, diarize=false)
        Prov-->>App: transcript
    end
```

```mermaid
sequenceDiagram
    autonumber
    participant SRC as SpeakerRecognitionClient
    participant Cfg as config.py
    participant Svc as Speaker Service (/v1/diarize-identify-match)
    SRC->>Cfg: load_diarization_settings_from_file()
    Cfg-->>SRC: settings
    alt diarization_source = deepgram
        note right of SRC: Fallback identification over transcript segments
        SRC->>Svc: POST transcript_data, user_id, similarity_threshold, min_duration
    else diarization_source = pyannote
        SRC->>Svc: POST transcript_data, similarity_threshold, min_duration, collar, min_duration_off, [min/max_speakers]
    end
    Svc-->>SRC: diarize-identify-match result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ❌ 1 warning, 2 inconclusive
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
extras/speaker-recognition/docker-compose.yml (2)
3-31: Prevent unintended startup of the base service when using profiles.

The `speaker-service` anchor has no profile; `docker compose --profile {cpu|gpu} up -d` will also start this base service, causing port conflicts and wrong routing. Low-impact fix without restructuring to x-anchors:

```diff
 services:
   # Base Speaker Recognition Service Configuration
   speaker-service: &base-speaker-service
+    profiles: ["base"]  # not selected by cpu/gpu flows
```

Alternatively, convert the base block to an `x-speaker-service` extension and remove the active `speaker-service` service.
27-31: Healthcheck should target localhost, not a network alias.

Inside the container, curl against `speaker-service` may resolve to another container (or fail if the alias is missing). Use loopback.

```diff
-      test: ["CMD", "curl", "-f", "http://speaker-service:8085/health"]
+      test: ["CMD", "curl", "-f", "http://localhost:8085/health"]
```

services.py (1)
163-170: Don't time out long-running compose operations.

`up -d` without `--build` can still build or pull images and exceed 120 s. Remove the timeout for `up`; keep it for lightweight commands.

```diff
-        result = subprocess.run(
+        # No timeout for 'up' to allow pulls/builds; keep timeout for others
+        timeout_val = None if 'up' in cmd else 120
+        result = subprocess.run(
             cmd,
             cwd=service_path,
             capture_output=True,
             text=True,
             check=False,
-            timeout=120  # 2 minute timeout for service status checks
+            timeout=timeout_val
         )
```

backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (2)
175-186: Remove the dependency on controllers.system_controller; use the new config module.

Refactor leftovers will break if `_diarization_settings` is moved.

```diff
-        from advanced_omi_backend.controllers.system_controller import _diarization_settings
+        from advanced_omi_backend.config import load_diarization_settings_from_file as _load_cfg
+        _diarization_settings = _load_cfg()
```
249-260: Same refactor for identify_speakers().

```diff
-        from advanced_omi_backend.controllers.system_controller import _diarization_settings
+        from advanced_omi_backend.config import load_diarization_settings_from_file as _load_cfg
+        _diarization_settings = _load_cfg()
```

backends/advanced/src/advanced_omi_backend/transcription.py (1)
284-286: Double-trigger risk: memory processing is queued here and also on conversation close.

The top-of-file comment says memory is handled "exclusively by conversation closure," yet this call re-introduces it, likely causing duplicate processing. Remove the call:

```diff
-        # Queue memory processing now that transcription is complete
-        await self._queue_memory_processing()
+        # Memory processing is triggered by conversation closure to avoid duplicates
```
🧹 Nitpick comments (17)
services.py (3)
81-84: Accept common truthy values for REACT_UI_HTTPS.

Handle "1/yes/true" robustly.

```diff
-        https_enabled = env_values.get('REACT_UI_HTTPS', 'false')
+        https_enabled = env_values.get('REACT_UI_HTTPS', 'false')
+        https_enabled = str(https_enabled).strip().lower() in ('1', 'true', 'yes', 'on')
```

And branch on `if https_enabled:`.
111-112: Place `--build` before service names to avoid CLI parsing edge cases.

Safer ordering with compose:

```diff
-    if command == 'up' and build:
-        cmd.append('--build')
+    if command == 'up' and build:
+        cmd.insert(cmd.index('up') + 1, '--build')  # keep flags before any service names
```
11-11: Remove unused import.

`os` isn't used.

```diff
-import os
```

extras/speaker-recognition/nginx.conf.template (1)
97-113: Good WebSocket proxy for /v1; consider passing the subprotocol when required.

If clients use `Sec-WebSocket-Protocol` (e.g., provider-specific subprotocols), forward it:

```diff
 location /v1/ {
     proxy_pass http://speaker_service/v1/;
     proxy_http_version 1.1;
     proxy_set_header Upgrade $http_upgrade;
     proxy_set_header Connection $connection_upgrade;
+    proxy_set_header Sec-WebSocket-Protocol $http_sec_websocket_protocol;
```

backends/advanced/src/advanced_omi_backend/config.py (2)
60-79: Narrow exceptions and log stack traces.

Use specific exceptions where possible and `logger.exception` for diagnostics (note `IOError` is an alias of `OSError` in Python 3, so `OSError` alone suffices).

```diff
-    if template_path.exists():
-        try:
+    if template_path.exists():
+        try:
             # Ensure parent directory exists
             config_path.parent.mkdir(parents=True, exist_ok=True)
             shutil.copy(template_path, config_path)
             logger.info(f"Created diarization config from template at {config_path}")
-        except Exception as e:
-            logger.warning(f"Could not copy template to {config_path}: {e}")
+        except OSError:
+            logger.exception(f"Could not copy template to {config_path}")
```

```diff
-    if config_path.exists():
-        try:
+    if config_path.exists():
+        try:
             with open(config_path, 'r') as f:
                 _diarization_settings = json.load(f)
             logger.info(f"Loaded diarization settings from {config_path}")
             return _diarization_settings
-        except Exception as e:
-            logger.error(f"Error loading diarization settings from {config_path}: {e}")
+        except (OSError, json.JSONDecodeError):
+            logger.exception(f"Error loading diarization settings from {config_path}")
```
92-107: Same for the save path; surface precise errors.

Also keep stack traces.

```diff
-    try:
+    try:
         # Ensure parent directory exists
         config_path.parent.mkdir(parents=True, exist_ok=True)
         # Write settings to file
         with open(config_path, 'w') as f:
             json.dump(settings, f, indent=2)
         # Update cache
         _diarization_settings = settings
         logger.info(f"Saved diarization settings to {config_path}")
         return True
-    except Exception as e:
-        logger.error(f"Error saving diarization settings to {config_path}: {e}")
+    except (OSError, TypeError, ValueError):
+        logger.exception(f"Error saving diarization settings to {config_path}")
         return False
```

backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (1)
136-141: Use `logger.exception` for full stack traces.

Improves debuggability.

```diff
-        except Exception as e:
-            logger.error(f"Error during diarize-identify-match: {e}")
+        except Exception:
+            logger.exception("Error during diarize-identify-match")
             return {}
```

backends/advanced/src/advanced_omi_backend/transcription.py (2)
16-16: Avoid per-call config file reads; cache the diarization source (hot path).

`load_diarization_settings_from_file()` is invoked on every transcription, adding file I/O to a hot path. Cache with a short TTL or inject via the constructor. Apply this inline change to use a cached helper:

```diff
-        config = load_diarization_settings_from_file()
-        diarization_source = config.get("diarization_source", "pyannote")
+        diarization_source = self._get_diarization_source()
```

Add this helper to the class (outside the shown range):

```python
def _get_diarization_source(self) -> str:
    try:
        ttl_s = 5.0
        now = time.time()
        if not hasattr(self, "_diar_src_cache") or (now - getattr(self, "_diar_src_cache_ts", 0)) > ttl_s:
            cfg = load_diarization_settings_from_file()
            self._diar_src_cache = cfg.get("diarization_source", "pyannote")
            self._diar_src_cache_ts = now
        return self._diar_src_cache
    except Exception:
        return "pyannote"
```
121-133: Diarization toggle: add an explicit "disabled" log for mismatches; guard by capability.

When config requests Deepgram diarization but the active provider isn't Deepgram, log why diarization is off to aid ops/debugging. Optionally move capability knowledge into the providers. Minimal change:

```diff
         if should_diarize:
             logger.info(f"Requesting diarization from {self.provider.name} (diarization_source=deepgram)")
+        else:
+            logger.debug(
+                "Diarization not requested (source=%s, provider=%s)",
+                diarization_source, self.provider.name
+            )
```

backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (2)
792-796: Normalize diarization_source to lowercase; accept canonical values only.

Prevents surprising 400s for "Deepgram"/"PYANNOTE".

```diff
-        elif key == "diarization_source":
-            if not isinstance(value, str) or value not in ["pyannote", "deepgram"]:
+        elif key == "diarization_source":
+            if not isinstance(value, str) or value.lower() not in {"pyannote", "deepgram"}:
                 return JSONResponse(
                     status_code=400,
                     content={"error": f"Invalid value for {key}: must be 'pyannote' or 'deepgram'"}
                 )
+            # Normalize
+            settings[key] = value.lower()
```
808-814: Persist atomically; the success payload shape stays the same.

If the save uses a plain write, a crash can corrupt the file. Prefer an atomic write (temp file + replace). If the helper already does this, ignore.
No code change requested here, just a heads-up.
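As a sketch of the heads-up above: an atomic save writes to a temp file in the target directory, then `os.replace`s it into place so a crash never leaves a torn file. The helper name and error handling are illustrative, not the PR's actual code:

```python
import json
import os
import tempfile
from pathlib import Path


def save_settings_atomic(settings: dict, config_path: Path) -> bool:
    """Persist JSON settings via temp file + os.replace (atomic on POSIX within one filesystem)."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=config_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(settings, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp_name, config_path)
        return True
    except OSError:
        # Best-effort cleanup of the temp file; caller sees False
        try:
            os.unlink(tmp_name)
        except OSError:
            pass
        return False
```

Readers of the file then always see either the old or the new settings, never a partial write.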
backends/advanced/src/advanced_omi_backend/transcription_providers.py (6)
48-56: Base API extension is fine; avoid signature drift across hierarchies.

You now have `Base.transcribe(**kwargs)` and `Batch.transcribe(diarize: bool)`. Prefer one canonical signature to avoid type-checker confusion and future drift (see the lines 127-136 diff below).
127-136: Unify the abstract method signature with Base to prevent divergence.

Match Base's `**kwargs` to keep subclass contracts consistent.

```diff
-    @abc.abstractmethod
-    async def transcribe(self, audio_data: bytes, sample_rate: int, diarize: bool = False) -> dict:
-        """Transcribe audio data.
-
-        Args:
-            audio_data: Raw audio bytes
-            sample_rate: Audio sample rate
-            diarize: Whether to enable speaker diarization (provider-dependent)
-        """
-        pass
+    @abc.abstractmethod
+    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+        """Transcribe audio data (provider-specific kwargs like diarize may be supported)."""
+        pass
```
312-322: Streaming: consider WebSocket keepalive and frame-size tuning.

To improve resilience on long sessions and large interim payloads:

- Set `ping_interval` to keep the connection alive.
- Increase `max_size` if large messages are expected.

Example:

```diff
-        websocket = await websockets.connect(
-            ws_url,
-            extra_headers={"Authorization": f"Token {self.api_key}"}
-        )
+        websocket = await websockets.connect(
+            ws_url,
+            extra_headers={"Authorization": f"Token {self.api_key}"},
+            ping_interval=20,
+            max_size=4 * 1024 * 1024,
+        )
```

Also applies to: 329-329
471-474: Silence ARG002 (unused args) in the streaming transcribe stub.

Rename to underscores.

```diff
-    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+    async def transcribe(self, _audio_data: bytes, _sample_rate: int, **_kwargs) -> dict:
```
501-501: Silence ARG002 for kwargs in the Parakeet batch transcribe.

```diff
-    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+    async def transcribe(self, audio_data: bytes, sample_rate: int, **_kwargs) -> dict:
```
748-752: Silence ARG002 (unused args) in the Parakeet streaming transcribe stub.

```diff
-    async def transcribe(self, audio_data: bytes, sample_rate: int, **kwargs) -> dict:
+    async def transcribe(self, _audio_data: bytes, _sample_rate: int, **_kwargs) -> dict:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- backends/advanced/src/advanced_omi_backend/config.py (1 hunks)
- backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (3 hunks)
- backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (2 hunks)
- backends/advanced/src/advanced_omi_backend/transcription.py (2 hunks)
- backends/advanced/src/advanced_omi_backend/transcription_providers.py (9 hunks)
- extras/speaker-recognition/docker-compose.yml (1 hunks)
- extras/speaker-recognition/nginx.conf.template (1 hunks)
- services.py (2 hunks)
- setup-requirements.txt (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backends/advanced/src/advanced_omi_backend/config.py
67-67: Do not catch blind exception: Exception
(BLE001)
77-77: Do not catch blind exception: Exception
(BLE001)
78-78: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
104-104: Consider moving this statement to an else block
(TRY300)
105-105: Do not catch blind exception: Exception
(BLE001)
106-106: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backends/advanced/src/advanced_omi_backend/transcription_providers.py
471-471: Unused method argument: audio_data
(ARG002)
471-471: Unused method argument: sample_rate
(ARG002)
471-471: Unused method argument: kwargs
(ARG002)
501-501: Unused method argument: kwargs
(ARG002)
748-748: Unused method argument: audio_data
(ARG002)
748-748: Unused method argument: sample_rate
(ARG002)
748-748: Unused method argument: kwargs
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: speaker-recognition-tests
- GitHub Check: integration-tests
🔇 Additional comments (8)
extras/speaker-recognition/docker-compose.yml (2)
101-102: Verify the Nginx config path: docker-compose mounts nginx.conf, but the repo ships nginx.conf.template.

This likely fails at container start unless another process materializes nginx.conf. If the file isn't generated, either rename the file or mount the template path:

```diff
-      - ./nginx.conf:/etc/nginx/nginx.conf:ro
+      - ./nginx.conf.template:/etc/nginx/nginx.conf:ro
```
87-91: Web UI healthcheck may be wrong when HTTPS is disabled.

It always probes `https://localhost:${REACT_UI_PORT}`. If REACT_UI_HTTPS=false, this will fail. Consider switching the protocol based on REACT_UI_HTTPS, or probing the internal port the app actually listens on.

extras/speaker-recognition/nginx.conf.template (1)
48-50: Verify the Web UI upstream port.

The upstream is pinned to 5175. Ensure the web-ui actually listens on that port in both HTTP and HTTPS modes; otherwise requests will 502. If the port is driven by REACT_UI_PORT, consider templating the nginx config at container start (envsubst) to keep them in sync.

backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (1)
66-69: Good: centralized config import avoids controller coupling.

This change reduces circular-import risk.
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (1)
22-22: Config module import change looks good.

backends/advanced/src/advanced_omi_backend/transcription_providers.py (3)
150-157: Deepgram diarization wiring LGTM.

The diarize flag is correctly threaded into the query params and remains optional. Timeout scaling by duration is a nice touch.
Also applies to: 164-164
572-582: Clear warning on unsupported diarize is good UX.

No changes requested.
94-101: No `start_stream` calls found; verify external callers pass `diarize`.

No in-repo call sites reference `start_stream`; ensure any consumers (e.g., orchestration or service layers) include the new `diarize` argument when invoking this method.
```python
# Get current settings and merge with new values
current_settings = load_diarization_settings_from_file()
current_settings.update(settings)
```
🛠️ Refactor suggestion
Validate min_speakers ≤ max_speakers after merge.

Prevents impossible configs reaching runtime.

```diff
     current_settings = load_diarization_settings_from_file()
     current_settings.update(settings)
+    # Cross-field validation
+    min_s = current_settings.get("min_speakers")
+    max_s = current_settings.get("max_speakers")
+    if isinstance(min_s, int) and isinstance(max_s, int) and min_s > max_s:
+        return JSONResponse(
+            status_code=400,
+            content={"error": "min_speakers cannot exceed max_speakers"}
+        )
```
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
around lines 803 to 806, after merging current_settings.update(settings)
validate that the resulting min_speakers and max_speakers are consistent: coerce
or parse them to integers if necessary, check that both keys exist (or use
defaults) and that min_speakers <= max_speakers; if the check fails,
raise/return a clear client error (e.g., HTTP 400 / HTTPException) with a
message indicating the constraint violation and do not proceed to save or apply
the settings. Ensure validation runs immediately after the merge and before any
persistence or runtime usage.
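The cross-field constraint can also be expressed as a small pure function, which keeps it unit-testable outside the endpoint. The name `validate_speaker_bounds` and the error string are illustrative, not from the PR:

```python
def validate_speaker_bounds(settings: dict):
    """Return an error message if min_speakers > max_speakers, else None.

    Non-integer or missing values are treated as "no constraint", matching
    the permissive check in the suggested diff above.
    """
    min_s = settings.get("min_speakers")
    max_s = settings.get("max_speakers")
    if isinstance(min_s, int) and isinstance(max_s, int) and min_s > max_s:
        return "min_speakers cannot exceed max_speakers"
    return None
```

The endpoint would call this right after the merge and map a non-None result to an HTTP 400.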
```diff
         # Even if file save fails, we've updated the in-memory settings
         logger.warning("Settings updated in memory but file save failed")
         return {
             "message": "Settings updated (file save failed)",
-            "settings": _diarization_settings,
+            "settings": current_settings,
             "status": "partial"
         }
```
🛠️ Refactor suggestion
"Partial" branch message is misleading; no in-memory state exists anymore.

Since settings are file-backed (no process-global cache), claiming "updated in memory" is inaccurate. Return an error status.

```diff
-    return {
-        "message": "Settings updated (file save failed)",
-        "settings": current_settings,
-        "status": "partial"
-    }
+    return {
+        "message": "Failed to persist diarization settings to file",
+        "settings": current_settings,  # Echo attempted values for UX
+        "status": "error"
+    }
```
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
around lines 817 to 823, the response incorrectly claims settings were "updated
in memory" despite there being no process-global cache; change the returned
message and status to indicate a failure instead of a partial success, e.g. log
the file save error, return an error HTTP status payload (message like "Settings
update failed (file save failed)" and status "error"), and avoid claiming or
returning an in-memory settings object — instead return the previous persisted
settings or omit "settings" from the error response as appropriate.
```diff
             form_data.add_field("transcript_data", json.dumps(transcript_data))
             form_data.add_field("user_id", "1")  # TODO: Implement proper user mapping
-            form_data.add_field("speaker_confidence_threshold", str(config.get("similarity_threshold", 0.15)))
+            form_data.add_field("similarity_threshold", str(config.get("similarity_threshold", 0.15)))
             form_data.add_field("min_duration", str(config.get("min_duration", 0.5)))

-            # Use /v1/listen endpoint
-            endpoint = "/v1/listen"
+            # Use /v1/diarize-identify-match endpoint as fallback
+            endpoint = "/v1/diarize-identify-match"
```
Honor the user_id parameter; avoid hard-coding "1".

It currently ignores the method argument.

```diff
-            form_data.add_field("transcript_data", json.dumps(transcript_data))
-            form_data.add_field("user_id", "1")  # TODO: Implement proper user mapping
+            form_data.add_field("transcript_data", json.dumps(transcript_data))
+            form_data.add_field("user_id", str(user_id) if user_id is not None else "1")  # TODO: map real users
```
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py
around lines 89 to 96, the code hard-codes form_data.add_field("user_id", "1")
and ignores the function/method user_id argument; replace the hard-coded value
with the passed-in user_id (coerce to string), fallback to a sensible default
only if the argument is None/empty, and remove the TODO comment; ensure any
necessary validation (int or non-empty) is performed before adding the field so
the endpoint receives the correct user identity.
```diff
             # Send existing transcript for diarization and speaker matching
             form_data.add_field("transcript_data", json.dumps(transcript_data))
             form_data.add_field("user_id", "1")  # TODO: Implement proper user mapping
-            form_data.add_field("speaker_confidence_threshold", str(config.get("similarity_threshold", 0.15)))
+            form_data.add_field("similarity_threshold", str(config.get("similarity_threshold", 0.15)))

             # Add pyannote diarization parameters
             form_data.add_field("min_duration", str(config.get("min_duration", 0.5)))
             form_data.add_field("collar", str(config.get("collar", 2.0)))
             form_data.add_field("min_duration_off", str(config.get("min_duration_off", 1.5)))
             if config.get("min_speakers"):
                 form_data.add_field("min_speakers", str(config.get("min_speakers")))
             if config.get("max_speakers"):
                 form_data.add_field("max_speakers", str(config.get("max_speakers")))

-            # Use /v1/listen endpoint (now supports both providers)
-            endpoint = "/v1/listen"
+            # Use /v1/diarize-identify-match endpoint for backend integration
+            endpoint = "/v1/diarize-identify-match"
```
Same bug in the pyannote branch; the param pass-through otherwise looks good.

```diff
-            form_data.add_field("transcript_data", json.dumps(transcript_data))
-            form_data.add_field("user_id", "1")  # TODO: Implement proper user mapping
+            form_data.add_field("transcript_data", json.dumps(transcript_data))
+            form_data.add_field("user_id", str(user_id) if user_id is not None else "1")  # TODO: map real users
```
```yaml
    networks:
      default:
        aliases:
          - speaker-service
```
🛠️ Refactor suggestion
Add the same network alias to the CPU variant so the Nginx upstream keeps working in CPU mode.

Only the GPU variant has the alias speaker-service; CPU mode will fail DNS lookups from Nginx. Apply:

```diff
   speaker-service-cpu:
     <<: *base-speaker-service
     profiles: ["cpu"]
+    networks:
+      default:
+        aliases:
+          - speaker-service
```
🤖 Prompt for AI Agents
In extras/speaker-recognition/docker-compose.yml around lines 47 to 50, the
docker network aliases only add "speaker-service" for the GPU variant so Nginx
upstream works in GPU mode but CPU mode lacks that alias and will fail DNS;
update the CPU service network configuration to include the same network alias
"speaker-service" (i.e., add aliases: - speaker-service under the CPU service's
networks.default section) so both CPU and GPU variants publish the same network
alias.
```python
        if service_name == 'speaker-recognition' and command == 'up':
            # Read configuration to determine how to start
            env_file = service_path / '.env'
            if env_file.exists():
                env_values = dotenv_values(env_file)
                compute_mode = env_values.get('COMPUTE_MODE', 'cpu')
                https_enabled = env_values.get('REACT_UI_HTTPS', 'false')

                if https_enabled.lower() == 'true':
                    # HTTPS mode: start with profile for all services (includes nginx)
                    if compute_mode == 'gpu':
                        cmd.extend(['--profile', 'gpu'])
                    else:
                        cmd.extend(['--profile', 'cpu'])
                    cmd.extend(['up', '-d'])
                else:
                    # HTTP mode: start specific services with profile (no nginx)
                    if compute_mode == 'gpu':
                        cmd.extend(['--profile', 'gpu'])
                    else:
                        cmd.extend(['--profile', 'cpu'])
                    cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
            else:
                # Fallback: just start base service
                cmd.extend(['up', '-d'])
        else:
```
Limit HTTPS mode to only the intended services; avoid starting the unprofiled base service.

`up -d` with a profile also brings up all services without profiles (your base speaker-service). Start explicit services instead.

```diff
-                    if compute_mode == 'gpu':
-                        cmd.extend(['--profile', 'gpu'])
-                    else:
-                        cmd.extend(['--profile', 'cpu'])
-                    cmd.extend(['up', '-d'])
+                    variant = 'speaker-service-gpu' if compute_mode.lower() == 'gpu' else 'speaker-service-cpu'
+                    profile = 'gpu' if compute_mode.lower() == 'gpu' else 'cpu'
+                    cmd.extend(['--profile', profile, 'up', '-d', variant, 'web-ui', 'nginx'])
```

🤖 Prompt for AI Agents
In services.py around lines 75-100, the HTTPS branch currently calls `up -d`
with a profile which will also start the unprofiled base `speaker-service`;
change this to start explicit services only. When https_enabled is true, extend
the command with the profile flag as now, but replace the generic `up -d` with
`up -d <service-names>` — include the profile-specific speaker-service
(speaker-service-gpu or speaker-service-cpu), the web-ui, and the
reverse-proxy/nginx service used for HTTPS (or any other services the profile
should bring up) so the unprofiled base `speaker-service` is not started. Ensure
selection is conditional on compute_mode and that service names match your
compose file.
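The explicit targeting described above can be factored into a pure helper so both branches stay deterministic and testable. This is a sketch, not the repo's actual `services.py`; service names mirror the compose file, and the `nginx` service name is an assumption:

```python
def build_compose_cmd(compute_mode: str, https_enabled: bool) -> list:
    """Build an explicit `docker compose` argument list so the unprofiled
    base speaker-service is never started by accident."""
    profile = 'gpu' if compute_mode.strip().lower() == 'gpu' else 'cpu'
    variant = f'speaker-service-{profile}'
    cmd = ['docker', 'compose', '--profile', profile, 'up', '-d', variant, 'web-ui']
    if https_enabled:
        cmd.append('nginx')  # HTTPS mode also needs the reverse proxy
    return cmd
```

Listing the services by name means the profile flag only gates which variant is valid; the unprofiled base service can never be selected implicitly.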
```python
                    # HTTP mode: start specific services with profile (no nginx)
                    if compute_mode == 'gpu':
                        cmd.extend(['--profile', 'gpu'])
                    else:
                        cmd.extend(['--profile', 'cpu'])
                    cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
            else:
                # Fallback: just start base service
                cmd.extend(['up', '-d'])
```
🛠️ Refactor suggestion
HTTP mode: same explicit targeting for determinism.
Mirror the HTTPS change to avoid surprises and speed up startup.
```diff
-                    if compute_mode == 'gpu':
-                        cmd.extend(['--profile', 'gpu'])
-                    else:
-                        cmd.extend(['--profile', 'cpu'])
-                    cmd.extend(['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else 'speaker-service-cpu', 'web-ui'])
+                    variant = 'speaker-service-gpu' if compute_mode.lower() == 'gpu' else 'speaker-service-cpu'
+                    profile = 'gpu' if compute_mode.lower() == 'gpu' else 'cpu'
+                    cmd.extend(['--profile', profile, 'up', '-d', variant, 'web-ui'])
```
🤖 Prompt for AI Agents
In services.py around lines 91 to 99, the HTTP-mode fallback uses a generic "up
-d" which can start extra services unpredictably; update the fallback to
explicitly target the same service set used in the HTTP branch by extending cmd
with ['up', '-d', 'speaker-service-gpu' if compute_mode == 'gpu' else
'speaker-service-cpu', 'web-ui'] (and preserve any profile flags already set) so
startup is deterministic and mirrors the HTTPS change.
```text
rich>=13.0.0
dotenv
```
Fix the package name: use python-dotenv (the current import will fail).

services.py imports `from dotenv import dotenv_values`, which is provided by the python-dotenv package, not `dotenv`. This will break at runtime. Apply:

```diff
 rich>=13.0.0
-dotenv
+python-dotenv>=1.0.1
```
🤖 Prompt for AI Agents
In setup-requirements.txt around lines 2 to 3, the package listed as "dotenv" is
incorrect for the import used in services.py (from dotenv import dotenv_values);
replace "dotenv" with the correct PyPI package name "python-dotenv" (optionally
pin a minimum version, e.g. python-dotenv>=0.21.0) so the import resolves at
runtime and CI installs the correct package.
Summary by CodeRabbit
New Features
Refactor
Chores