Fix #421: Prevent timestamp jumps from tick counter overflow#447
Fix #421: Prevent timestamp jumps from tick counter overflow#447rishabhvaish wants to merge 5 commits intomatth-x:mainfrom
Conversation
…rflow The multiplication (ticks_now - mocpp_ticks_count) * 1000UL can overflow a 32-bit unsigned long when the tick delta exceeds ~4.3M ticks. On ESP-IDF with the default 1000 Hz tick rate, this happens if the function is not called for approximately 72 minutes, producing a truncated result that causes ~12-hour (49,250s) time jumps in Clock::now(). Fix: cast to unsigned long long before multiplying, then truncate back to unsigned long after dividing by configTICK_RATE_HZ. Refs matth-x#421
Add a safety cap of 1 hour on the time delta computed in Clock::now(). This guards against remaining edge cases where arithmetic overflow or concurrent access produces a delta that would cause the OCPP timestamp to jump forward by hours or days. If the real elapsed time exceeds the cap, the clock catches up incrementally on subsequent calls rather than applying a single large jump. Refs matth-x#421
|
Thanks for helping us out here! ❤️ |
|
Please note that our jump is always 3 months, not much, not less... so not sure we're in the same boat. Also, to make sure we don't break something else for critical code, let's use a TDD approach here, try to reproduce it with a test first, I tried but didn't succeed with vibe-coding 🙁 |
Per reviewer feedback, instead of capping the delta to 1 hour (which still causes a potentially large time jump), skip the update entirely when the delta exceeds a plausible threshold. The clock will resync naturally on the next setTime() call from the CSMS. This ensures Clock::now() never produces time jumps, maintaining consistency for transaction timestamps and other time-sensitive operations.
Add TickOverflow.cpp with two test cases covering the fixes in matth-x#421: 1. Arithmetic overflow test: proves the 32-bit multiplication in mocpp_tick_ms_espidf() overflows for tick deltas > ~72 min, while the 64-bit fix produces correct results. Tests small deltas, 90-min deltas, max uint32 delta, and the exact overflow boundary. 2. Clock::now() guard test: verifies that implausible time jumps (> 1 hour) from overflow or concurrency are rejected, the clock recovers on subsequent normal advances, boundary values are handled correctly, and setTime() resync works after rejection. Also add ArduinoJson include path to the test target (was missing relative to the library target). Refs matth-x#421 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thanks @devunwired and @razvanphp! Good call on the TDD approach — I'll write a test that reproduces the 3-month jump scenario first, then verify the fix addresses it. Will update the PR with the test. |
…#421) Addresses @razvanphp's reviewer feedback on PR matth-x#447: reproduce the exact 3-month jump scenario before relying on the fix. New TEST_CASE with 6 sections: - Direct 3-month jump injection via mock timer (rejected by guard) - Clock recovery after rejected 3-month jump - CSMS resync via setTime after rejected jump - 32-bit arithmetic demonstration showing how unsigned wraparound produces month-scale erroneous deltas - End-to-end scenario: normal operation -> overflow -> verify month stays correct (Jan, not April) - Multiple overflow-scale jumps interleaved with normal advances All sections verify that Clock::now() rejects the implausible delta and that the OCPP timestamp never jumps to a future month. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Addressed @devunwired's feedback in aac9147 — the fix now skips the update entirely when delta exceeds the threshold (instead of capping). Added 6 test sections reproducing the 3-month jump scenario per @razvanphp's TDD request — all 38 assertions pass. The tests cover: direct 3-month jump rejection, recovery after rejection, CSMS resync, 32-bit wraparound arithmetic, and repeated overflow events. |
|
@matth-x what can we do to move this forward? |
Fixes #421
Problem
Chargers running MicroOcpp on ESP-IDF (and potentially Arduino) experience periodic ~12-hour (49,250s) timestamp jumps. This has been observed by multiple users (@devunwired, @razvanphp) and causes incorrect OCPP timestamps in transaction events, meter values, and heartbeats.
Root Cause
The multiplication overflow in
mocpp_tick_ms_espidf():On Xtensa (ESP32),
unsigned longis 32-bit. When the tick delta multiplied by 1000 exceedsULONG_MAX(4,294,967,295), the intermediate result is silently truncated. This produces a wrong millisecond delta that propagates throughClock::now()as a ~49,250-second time jump.For example, with the maximum possible tick difference of
0xFFFFFFFF:0xFFFFFFFF * 1000= 4,294,967,295,000 → truncated to 32-bit = 3,294,967,296configTICK_RATE_HZyields ~49,250 fewer seconds than expectedFix
1.
Platform.cpp— Use 64-bit intermediate arithmeticThis ensures the multiplication
tick_delta * 1000is performed in 64-bit precision before dividing and truncating back tounsigned long.2.
Time.cpp— Safety cap onClock::now()deltaAdded a 1-hour cap on the time delta in
Clock::now()as a defensive measure against any remaining edge cases (including the thread-safety concerns noted in the issue discussion). If the computed delta exceeds 1 hour, it is capped — the clock catches up incrementally on subsequent calls rather than applying a single large jump that would corrupt OCPP timestamps.Testing