[fix][broker]Resolve the issue where markDeletePosition points to a non-existent location when creating a new subscription#25283
Conversation
…ubscription will point to a non-existent position.
| if (ledgerId > lastConfirmedEntry.getLedgerId()) { | ||
| checkState(ledgers.get(ledgerId).getEntries() == 0); | ||
| ledgerId = lastConfirmedEntry.getLedgerId(); | ||
| return PositionFactory.create(lastConfirmedEntry.getLedgerId(), lastConfirmedEntry.getEntryId()); |
There was a problem hiding this comment.
I don't think it's a correct fix since it changed the semantics from getFirstPosition method. Now if there is now entries from the first ledger, returned the first ledger id and -1 as the entry id is expected.
Now this PR changed to the cached lastConfirmedEntry which already been deleted. From the caller perspective, it can't know if this entry is a valid entry.
But the line:4230 seems incorrect. It should use 0 as the entry ID if the first ledger has entries?
There was a problem hiding this comment.
But if not modified here, the ledger ID of 'lastConfirmed Entry' and 'entry' -1 will still be used.
There was a problem hiding this comment.
+1. I think this changes the shared semantics of getFirstPosition(), and the fix should stay in the cursor initialization / recovery path instead.
Motivation
when a ManagedLedger performs a ledger rollover (switching to a new ledger), the newly created ledger starts in an empty state. During this transition, lastConfirmedEntry is temporarily set to (newLedgerId, -1), which represents a position in a ledger that contains no entries — i.e., a non-existent position in BookKeeper.
Although recoveredCursor has a guard to handle the case where entryId == -1 and the ledger does not exist in the ledger map, the newly created (but empty) ledger is present in the ledgers map (since it was just registered), so ledgerExists() returns true. As a result, the markDeletePosition of the newly created subscription is incorrectly set to (newLedgerId, -1) — a position that does not correspond to any real message in BookKeeper.
This incorrect markDeletePosition can cause issues such as:
Inaccurate backlog calculation for the subscription.
Potential errors when components try to read or validate the mark-delete position against BookKeeper.
Inconsistent behavior when the empty ledger is later trimmed or deleted.
Modifications
Modified the asyncOpenCursor method in ManagedLedgerImpl to ensure that when InitialPosition.Latest is used, the position passed to cursor.initialize() is validated and does not point to a non-existent entry (i.e., entryId == -1 in an empty ledger). If the resolved lastConfirmedEntry has entryId == -1, the position is adjusted to point to the last valid entry of the previous non-empty ledger instead.
Updated recoveredCursor in ManagedCursorImpl (or the position resolution logic) to correctly handle the case where the markDeletePosition falls on an empty ledger that exists in the ledger map but contains no entries, ensuring the position is adjusted to a valid, existing entry.
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: