
fix: auto-restart telegram polling loop on failure #373

Merged
SXP-Simon merged 2 commits into fix/tg from codex/investigate-telegram-adapter-message-loss
Mar 19, 2026

Conversation

@SXP-Simon
Owner

@SXP-Simon SXP-Simon commented Mar 19, 2026

Motivation

  • Observed that Telegram polling can exit on transient network/API/send failures and the adapter task would not recover, causing no new inbound messages to be processed.
  • The change aims to provide a minimal, targeted self-healing mechanism so the adapter recovers without operator intervention.

Description

  • Wrap the polling lifecycle in a resilient loop in TelegramPlatformAdapter.run() that restarts start_polling() after unexpected exits and waits telegram_polling_restart_delay seconds before retrying.
  • Add an error_callback (_on_polling_error) to surface polling request errors into logs for easier diagnosis of proxy/network/API failures.
  • Introduce an instance flag _terminating to avoid restart attempts during normal shutdown and set it in terminate() to stop the retry loop.
  • Make the restart delay configurable via telegram_polling_restart_delay (default 5.0), and keep existing behavior for command registration and media-group handling unchanged.
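
The restart behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `RestartingRunner` and `poll_once` are hypothetical names, and the sketch assumes `poll_once` is a coroutine function that only returns (or raises) when polling has really stopped.

```python
import asyncio
import logging

logger = logging.getLogger("telegram_adapter_sketch")


class RestartingRunner:
    """Hypothetical stand-in for the adapter's run()/terminate() pair."""

    def __init__(self, poll_once, restart_delay: float = 5.0) -> None:
        # Assumption: poll_once() blocks until polling stops, then returns or raises.
        self._poll_once = poll_once
        # Mirrors the telegram_polling_restart_delay setting (default 5.0).
        self._restart_delay = restart_delay
        self._terminating = False
        self.restarts = 0

    async def run(self) -> None:
        while not self._terminating:
            try:
                await self._poll_once()
                if not self._terminating:
                    logger.warning("Polling exited unexpectedly")
            except asyncio.CancelledError:
                # Task cancellation must propagate unchanged.
                raise
            except Exception:
                logger.exception("Polling crashed")
            if not self._terminating:
                # Wait before the next attempt so failures do not spin the loop.
                self.restarts += 1
                await asyncio.sleep(self._restart_delay)

    def terminate(self) -> None:
        # Set first during shutdown so run() stops retrying.
        self._terminating = True
```

The flag is checked both before sleeping and at the top of the loop, so a shutdown requested while polling is active does not trigger one last restart.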

Testing

  • Ran ruff format astrbot/core/platform/sources/telegram/tg_adapter.py which completed successfully.
  • Ran ruff check astrbot/core/platform/sources/telegram/tg_adapter.py which passed with no issues.
  • Compiled the modified module with python -m py_compile astrbot/core/platform/sources/telegram/tg_adapter.py which succeeded with no syntax errors.

Codex Task

Summary by Sourcery

Add self-healing behavior to the Telegram platform adapter polling loop so it automatically restarts after unexpected failures while allowing clean shutdowns.

Bug Fixes:

  • Prevent Telegram polling from stopping permanently after transient errors by restarting the polling loop with a delay when it exits unexpectedly.

Enhancements:

  • Log polling request errors via a dedicated error callback for easier diagnosis of Telegram network or API issues.
  • Make the Telegram polling restart delay configurable via the telegram_polling_restart_delay setting with a sensible default.

@sourcery-ai

sourcery-ai bot commented Mar 19, 2026

Reviewer's Guide

Wraps the Telegram adapter polling in a restartable loop with configurable delay, adds explicit error logging for polling failures, and introduces a termination flag to avoid restarts during normal shutdown.

Sequence diagram for the Telegram polling restart loop with error handling

sequenceDiagram
    participant Adapter as TelegramPlatformAdapter
    participant Updater as TelegramApplicationUpdater

    loop polling_restart_loop while not _terminating
        Adapter->>Updater: start_polling(error_callback=_on_polling_error)
        note right of Updater: Returns queue (awaitable polling loop)
        Adapter-->>Updater: await queue

        alt polling_request_error
            Updater-->>Adapter: _on_polling_error(error)
            Adapter->>Adapter: log error
        end

        alt polling_loop_exits_unexpectedly and not _terminating
            Adapter->>Adapter: log warning about unexpected exit
            Adapter->>Adapter: asyncio.sleep(_polling_restart_delay)
        end

        alt polling_crashes_with_exception and not _terminating
            Adapter->>Adapter: log crash and retry delay
            Adapter->>Adapter: asyncio.sleep(_polling_restart_delay)
        end
    end

    rect rgb(230,230,230)
        note over Adapter: terminate() sets _terminating = True
    end

Updated class diagram for TelegramPlatformAdapter polling lifecycle

classDiagram
    class TelegramPlatformAdapter {
        - AsyncIOScheduler scheduler
        - bool _terminating
        - float _polling_restart_delay
        + __init__(config)
        + run() async
        - _on_polling_error(error: Exception) void
        + register_commands() async
        + get_client() ExtBot
        + terminate() async
    }

File-Level Changes

Change Details Files
Add restartable polling loop with configurable backoff delay to recover from unexpected polling exits.
  • Introduce _polling_restart_delay instance attribute sourced from config key telegram_polling_restart_delay with default 5.0 seconds.
  • Replace single-shot start_polling/await queue with a while-not-terminating loop that restarts polling after unexpected exits.
  • On normal loop exit without termination, log a warning indicating unexpected polling exit and upcoming retry delay.
  • On generic exceptions in the polling loop, log the exception type and message and schedule a retry after the delay.
  • Preserve asyncio.CancelledError semantics by re-raising it without wrapping.
astrbot/core/platform/sources/telegram/tg_adapter.py
Add explicit polling error callback and improve shutdown semantics to prevent unwanted restarts during termination.
  • Add _on_polling_error method used as error_callback for updater.start_polling to log polling request failures with exception details.
  • Pass error_callback=self._on_polling_error into application.updater.start_polling.
  • Introduce _terminating instance flag initialized to False in __init__ to control the polling restart loop.
  • Set _terminating = True at the start of terminate() so the run() loop stops retrying and exits cleanly after shutdown.
  • Ensure scheduler shutdown is still performed during terminate() when the scheduler is running.
astrbot/core/platform/sources/telegram/tg_adapter.py


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • When logging unexpected polling exits and crashes, consider including exc_info=True (or passing the exception as logger.exception(...)) so that stack traces are captured for easier diagnosis of persistent failures.
  • Casting telegram_polling_restart_delay directly to float may raise ValueError if misconfigured; consider validating or safely defaulting (e.g., catching the error and falling back to the default delay) to avoid the adapter failing at startup due to a bad config value.
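
As a rough illustration of the first point, the sketch below contrasts a message-only log call with one that attaches the traceback. The helper names (`make_buffer_logger`, `log_without_trace`, `log_with_trace`) are hypothetical and exist only for this example; `exc_info` and `Logger.exception` are standard `logging` features.

```python
import io
import logging


def make_buffer_logger(name: str):
    """Build a logger that writes into an in-memory buffer (for demonstration only)."""
    buffer = io.StringIO()
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(buffer))
    logger.setLevel(logging.WARNING)
    logger.propagate = False
    return logger, buffer


def log_without_trace(logger: logging.Logger, exc: Exception) -> None:
    # Records only the exception type and message; the stack is lost.
    logger.warning("Polling crashed: %s: %s", type(exc).__name__, exc)


def log_with_trace(logger: logging.Logger, exc: Exception) -> None:
    # exc_info accepts an exception instance and appends its traceback.
    logger.warning("Polling crashed: %s", exc, exc_info=exc)
```

Inside an `except` block, `logger.exception("...")` has the same effect as passing `exc_info=True`, but logs at ERROR level.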
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/platform/sources/telegram/tg_adapter.py" line_range="95-99" />
<code_context>

         self.scheduler = AsyncIOScheduler()
+        self._terminating = False
+        self._polling_restart_delay = float(
+            self.config.get("telegram_polling_restart_delay", 5.0)
+        )

</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider validating the polling restart delay from config to avoid zero/negative or non-numeric values.

Casting directly to `float` assumes the config value is valid and positive. Please guard against missing/invalid values (`ValueError`/`TypeError`) and enforce a sensible minimum (e.g. `max(0.1, float(...))`) so misconfiguration can’t cause a tight loop or startup crash.

```suggestion
        self.scheduler = AsyncIOScheduler()
        self._terminating = False

        raw_delay = self.config.get("telegram_polling_restart_delay", 5.0)
        try:
            delay = float(raw_delay)
        except (TypeError, ValueError):
            logger.warning(
                "Invalid 'telegram_polling_restart_delay' value %r in config, "
                "falling back to default 5.0s",
                raw_delay,
            )
            delay = 5.0

        if delay < 0.1:
            logger.warning(
                "Configured 'telegram_polling_restart_delay' (%s) is too small; "
                "enforcing minimum of 0.1s to avoid tight restart loops",
                delay,
            )
            delay = 0.1

        self._polling_restart_delay = delay
```
</issue_to_address>


Comment thread astrbot/core/platform/sources/telegram/tg_adapter.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27b6499248


error_callback=self._on_polling_error
)
logger.info("Telegram Platform Adapter is running.")
await queue

[P1] Treat start_polling result as a lifecycle signal

Updater.start_polling in the pinned python-telegram-bot series is an async method that returns an asyncio.Queue once polling has started, so await queue here does not wait for polling to stop. With the new while loop, a healthy startup immediately falls through to the retry path, logs an unexpected-exit warning, then re-enters start_polling and repeatedly hits "already running" errors every retry interval. This creates a persistent error loop and prevents the restart logic from reflecting real polling failures.
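
The failure mode can be reproduced without the real library. In the sketch below, `FakeUpdater` is a hypothetical stand-in that only mimics the start-and-return shape described in this comment; the error text is illustrative and not taken from python-telegram-bot's source.

```python
import asyncio


class FakeUpdater:
    """Hypothetical stand-in: start_polling() returns once polling has *started*."""

    def __init__(self) -> None:
        self.running = False
        self._task = None

    async def start_polling(self) -> asyncio.Queue:
        if self.running:
            raise RuntimeError("updater is already running")
        self.running = True
        # Polling continues in a background task after this coroutine returns.
        self._task = asyncio.create_task(self._poll_forever())
        return asyncio.Queue()

    async def _poll_forever(self) -> None:
        while self.running:
            await asyncio.sleep(0.01)

    async def stop(self) -> None:
        self.running = False
        if self._task is not None:
            await self._task


async def demo() -> dict:
    updater = FakeUpdater()
    pending = updater.start_polling()  # a coroutine object, not yet a queue
    await pending                      # returns right after startup
    still_running = updater.running    # polling is healthy, yet control is back
    try:
        await updater.start_polling()  # what a naive restart loop does next
        second_start = "ok"
    except RuntimeError as exc:
        second_start = str(exc)
    await updater.stop()
    return {"still_running": still_running, "second_start": second_start}
```

Running `asyncio.run(demo())` shows the updater still running when the await completes, and the second start attempt being rejected.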


@SXP-Simon
Owner Author

@codex address the above comments

@chatgpt-codex-connector

Summary

  • Addressed config hardening for Telegram polling restart delay by safely parsing telegram_polling_restart_delay, warning on invalid values, and enforcing a minimum of 0.1s to prevent tight restart loops or startup failures on misconfiguration.
  • Fixed polling lifecycle handling by awaiting updater.start_polling(...) directly (instead of treating its return value as a lifecycle awaitable), then monitoring updater.running until stop/termination before deciding whether to retry.
  • Improved failure diagnostics by switching crash logging to logger.exception(...) (stack trace included) and attaching traceback info in _on_polling_error logs.
  • Committed the changes with a conventional commit message: fix: harden telegram polling restart lifecycle (commit 3900988), and created a follow-up PR via the make_pr tool titled fix: address telegram polling review feedback.
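
The first two points might look roughly like the sketch below. It is an approximation under stated assumptions, not the committed code: `parse_restart_delay`, `supervise`, `should_stop` and `FakeUpdater` are hypothetical names, and the fake updater simply flips its `running` flag off shortly after each start to simulate a polling failure.

```python
import asyncio
import logging

logger = logging.getLogger("telegram_adapter_sketch")


def parse_restart_delay(raw, default: float = 5.0, minimum: float = 0.1) -> float:
    """Parse a config value safely, falling back to the default on bad input."""
    try:
        delay = float(raw)
    except (TypeError, ValueError):
        logger.warning("Invalid restart delay %r; using default %.1fs", raw, default)
        return default
    # Enforce a floor so a zero or negative value cannot cause a tight loop.
    return max(minimum, delay)


class FakeUpdater:
    """Hypothetical updater whose polling dies shortly after each start."""

    def __init__(self) -> None:
        self.running = False
        self.starts = 0

    async def start_polling(self) -> None:
        self.starts += 1
        self.running = True
        # Simulate an unexpected polling stop 20 ms after startup.
        asyncio.get_running_loop().call_later(0.02, self._die)

    def _die(self) -> None:
        self.running = False


async def supervise(updater, should_stop, restart_delay: float,
                    check_interval: float = 0.01) -> None:
    while not should_stop():
        try:
            # Await startup itself, then watch the running flag.
            await updater.start_polling()
            while updater.running and not should_stop():
                await asyncio.sleep(check_interval)
        except asyncio.CancelledError:
            raise
        except Exception:
            logger.exception("Polling crashed")
        if not should_stop():
            logger.warning("Polling stopped; restarting in %.1fs", restart_delay)
            await asyncio.sleep(restart_delay)
```

Watching a flag at a fixed interval is the simplest way to detect a stop; the trade-off is that a failure is noticed up to one `check_interval` late.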

Testing

  • ruff format .
  • ruff check .
  • python -m py_compile astrbot/core/platform/sources/telegram/tg_adapter.py


@SXP-Simon SXP-Simon merged commit 79b02a4 into fix/tg Mar 19, 2026
3 of 4 checks passed
pull bot pushed a commit that referenced this pull request Mar 20, 2026
* fix: auto-restart telegram polling loop on failure (#373)

* fix: auto-restart telegram polling loop on failure

* fix: harden telegram polling restart lifecycle

* fix(telegram): improve polling robustness per review suggestions and handle invalid-token errors

* fix: complete config metadata and i18n