Skip to content

fix: timers wake immediately on actor shutdown in threads module#142

Merged
ElFantasma merged 13 commits intomainfrom
feature/threads-timer-actor-shutdown
Feb 3, 2026
Merged

fix: timers wake immediately on actor shutdown in threads module#142
ElFantasma merged 13 commits intomainfrom
feature/threads-timer-actor-shutdown

Conversation

@ElFantasma
Copy link
Collaborator

@ElFantasma ElFantasma commented Jan 26, 2026

EDIT (28-01-2026):

Ran Devin review and fixed all found issues:
https://app.devin.ai/review/lambdaclass/spawned/pull/142

Summary

  • Add on_cancel() callback to CancellationToken for efficient waiting
  • Fix is_cancelled() bug that was clearing the flag
  • Change CancellationToken methods from &mut self to &self
  • Update send_after/send_interval to use channel-based wake pattern
  • Add test_send_after_actor_shutdown test

Closes #123

@ElFantasma
Copy link
Collaborator Author

@claude @chatgpt @Kimi please review

@ElFantasma
Copy link
Collaborator Author

@Kimi please review

@ElFantasma
Copy link
Collaborator Author

/claude /kimi /codex

@claude
Copy link

claude bot commented Jan 30, 2026

PR Review - Findings

CRITICAL: Race Condition in CancellationToken

Severity: High
Location: rt/src/threads/mod.rs:91-103

The on_cancel() method has a race condition that can cause callbacks to be lost or fired multiple times:

pub fn on_cancel(&self, callback: CancelCallback) {
    let mut callbacks = self.callbacks.lock().unwrap_or_else(|e| e.into_inner());
    if self.is_cancelled() {  // ← Race window here\!
        drop(callbacks);
        callback();
    } else {
        callbacks.push(callback);
    }
}

The Problem:

  1. Thread A: Acquires lock, sees is_cancelled() = false, about to push callback
  2. Thread B: Calls cancel(), sets flag to true, acquires lock, drains callbacks (empty list)
  3. Thread A: Pushes callback to the list
  4. Result: Callback is never fired because it was added AFTER cancel() drained the list

The Fix:
Check is_cancelled() BEFORE releasing the lock in cancel():

pub fn cancel(&self) {
    let mut callbacks = self.callbacks.lock().unwrap_or_else(|e| e.into_inner());
    if self.is_cancelled.swap(true, Ordering::SeqCst) {
        return; // Already cancelled
    }
    let cbs: Vec<_> = callbacks.drain(..).collect();
    drop(callbacks); // Release lock before firing
    for cb in cbs {
        cb();
    }
}

pub fn on_cancel(&self, callback: CancelCallback) {
    let mut callbacks = self.callbacks.lock().unwrap_or_else(|e| e.into_inner());
    if self.is_cancelled() {
        drop(callbacks);
        callback();
    } else {
        callbacks.push(callback);
    }
}

This ensures the flag is set while holding the lock, preventing the race.


BUG: Incorrect Loop Logic in send_interval

Severity: Medium
Location: concurrency/src/threads/time.rs:92-98

The loop exits when cancellation is detected AFTER a timeout, but it should send the message BEFORE checking cancellation to avoid skipping the final message:

while let Err(RecvTimeoutError::Timeout) = wake_rx.recv_timeout(period) {
    if timer_token.is_cancelled() || actor_token.is_cancelled() {
        break;  // ← Checks BEFORE sending
    }
    let _ = handle.send(message.clone());
}

This means if cancellation happens just as the timeout expires, no message is sent. Consider checking after the send, or document this behavior clearly.


Minor Issues

  1. Potential memory leak concern (rt/src/threads/mod.rs:75-83): If a callback panics during execution, subsequent callbacks won't fire. Consider using catch_unwind or document this behavior.

  2. Missing validation (concurrency/src/threads/time.rs:18, 65): No validation for zero or very small Duration values which could cause unexpected behavior.


Positive Changes

  • Fixed the is_cancelled() bug that was clearing the flag ✓
  • Improved error logging from trace to error level ✓
  • Good test coverage for the shutdown scenario ✓
  • Correct use of &self instead of &mut self for thread-safe methods ✓

Automated review by Claude (Anthropic)

@ElFantasma
Copy link
Collaborator Author

/kimi

@ElFantasma
Copy link
Collaborator Author

/codex

@github-actions
Copy link

🤖 Kimi Code Review

Security & Bug Review

1. Critical Security IssueCancellationToken race condition (rt/src/threads/mod.rs:71-75)

The on_cancel method has a data race between checking is_cancelled() and registering the callback. Between the check and the lock acquisition, another thread could call cancel(), causing the callback to never fire. This breaks the guarantee that callbacks fire exactly once.

Fix: Move the is_cancelled() check inside the locked section to ensure atomicity.

2. Medium BugCancellationToken callback ordering issue (rt/src/threads/mod.rs:71-75)

The current implementation can drop callbacks if cancel() is called between the is_cancelled() check and the lock acquisition. While the race window is small, this violates the "exactly once" guarantee.

Fix: Same as above - check cancellation while holding the lock.

3. Low Bugsend_interval logic flaw (concurrency/src/threads/time.rs:91-97)

The cancellation check happens after the sleep expires, which could allow a message to be sent even after cancellation if the cancellation occurs between the timeout and the check. While the new wake-up mechanism reduces this window, the check should happen before sending.

Fix: Swap the order of the cancellation check and message send:

if !timer_token.is_cancelled() && !actor_token.is_cancelled() {
    let _ = handle.send(message.clone());
} else {
    break;
}

4. Low BugCancellationToken::is_cancelled() behavior change (rt/src/threads/mod.rs:65-67)

The method changed from fetch_and(false) (which clears the flag) to load(), which is a breaking change in behavior. While this appears intentional for the new callback system, it could break existing code that relied on the flag-clearing behavior.

Note: This seems intentional for the new design, but should be documented as a breaking change.

5. Minor – Unnecessary mut removal (multiple files)

The removal of mut from cancellation tokens is correct as they're now used immutably with the new callback system. No issues here.

Summary

  • Critical: Fix the race condition in CancellationToken::on_cancel
  • Medium: Ensure callback registration is atomic
  • Low: Fix potential message send after cancellation in send_interval
  • Low: Document the breaking change in is_cancelled() behavior

Automated review by Kimi (Moonshot AI)

@github-actions
Copy link

🤖 Codex Code Review

Findings

  • Bug (Medium): init_tracing() panics if called more than once because set_global_default is unwrapped. Any consumer that calls rt::tasks::run() and rt::threads::run() (or calls run twice in tests) will crash. Consider using try_init/set_global_default with graceful fallback. rt/src/tracing/mod.rs:10-18, called from rt/src/tasks/mod.rs:25-29 and rt/src/threads/mod.rs:20-23.
  • Performance/Availability (Low): ctrl_c() stores every subscriber Sender forever and never removes dead ones. Repeated calls can grow the vector unboundedly and each Ctrl+C will iterate/attempt send on stale channels. Consider pruning on send failure or using Weak-like cleanup. rt/src/threads/mod.rs:122-145.

No other security vulnerabilities or significant bugs found in the PR diff.


Automated review by OpenAI Codex

@ElFantasma
Copy link
Collaborator Author

Analysis of AI Review Comments

1. Claude & Kimi: Race condition in CancellationToken - ❌ Invalid

Both reviewers claim there's a race condition in on_cancel(). However, the code is correct:

pub fn on_cancel(&self, callback: CancelCallback) {
    let mut callbacks = self.callbacks.lock()...;  // Lock acquired FIRST
    if self.is_cancelled() {                        // Check while holding lock
        drop(callbacks);
        callback();
    } else {
        callbacks.push(callback);
    }
}

The code comment explains the design:

"Hold the lock while checking is_cancelled to avoid a race with cancel(). cancel() sets the flag BEFORE acquiring the lock, so if we see is_cancelled=false while holding the lock, cancel() hasn't drained callbacks yet and will drain ours after we release the lock."

Claude's scenario ("Thread B acquires lock, drains callbacks") is impossible while Thread A holds the lock. The callback is guaranteed to fire exactly once.


2. Claude & Kimi: send_interval logic flaw - ❌ Invalid

Current code checks cancellation BEFORE sending:

if timer_token.is_cancelled() || actor_token.is_cancelled() {
    break;  // Don't send if cancelled
}
let _ = handle.send(message.clone());

This is correct behavior. If cancelled, we should NOT send. The reviewers suggest swapping the order, which would incorrectly send messages after cancellation.


3. Codex: init_tracing() panics if called twice - ⚠️ Pre-existing

Not introduced by this PR. Outside scope.


4. Codex: ctrl_c() memory leak - ✅ Fixed in PR #140

The retain() fix addresses this. Not applicable to PR #142.


5. Kimi: is_cancelled() breaking change - ⚠️ Intentional bug fix

The old behavior (fetch_and(false)) was a bug that cleared the flag on read. The new behavior (load()) is correct. Should be documented as a breaking change.


Summary

Issue Reviewer Verdict
Race condition in on_cancel Claude, Kimi ❌ Invalid - lock prevents race
send_interval logic Claude, Kimi ❌ Invalid - current order is correct
init_tracing panic Codex ⚠️ Pre-existing, not in PR scope
ctrl_c memory leak Codex ✅ Fixed in PR #140
is_cancelled() change Kimi ⚠️ Intentional fix, should document

{
let mut iter = stream.into_iter();
let mut cancelation_token = handle.cancellation_token();
let cancelation_token = handle.cancellation_token();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a type: cancelation_token has 2 l's

Base automatically changed from feature/ctrl-c-signal-handling to main February 3, 2026 16:29
@ElFantasma ElFantasma merged commit 3a8e98c into main Feb 3, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix send_after/send_interval to respect Actor shutdown in threads

2 participants