fix: Retry transient write failures after receiving goaway#200
fix: Retry transient write failures after receiving goaway#200ionutmanolache-okta wants to merge 1 commit intoparse-community:masterfrom
Conversation
|
I will reformat the title to use the proper commit message syntax. |
|
🚀 Thanks for opening this pull request! |
📝 WalkthroughWalkthroughCentralizes retry decision-making by adding Client.isRequestRetryable(error) and replaces inline retry checks with calls to it; renames and adjusts retry flow to Client.prototype.retryWrite and updates write/error handling across manageChannels and standard session write paths. Adds unit tests for isRequestRetryable and updates GOAWAY-related test expectations. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Session
participant APN as APN Connection
participant RetryMgr as retryWrite
Client->>Session: write(notification, path, method)
Session->>APN: send write
APN-->>Session: error
Session->>Client: propagate error
Client->>Client: isRequestRetryable(error)
alt retryable
Client->>RetryMgr: retryWrite(subDirectory, type, method, retryCount)
RetryMgr->>APN: establish new connection / resend
APN-->>RetryMgr: success / error
RetryMgr-->>Client: result
else non-retryable
Client-->>Session: propagate error (destroy session if needed)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/client.js`:
- Around line 155-156: Add direct unit tests for isRequestRetryable covering the
new isTransientWriteFailure branch: create tests that pass an error object with
error.message starting with 'apn write failed' and assert the function returns
true; add similar positive tests for 'apn write timeout' and 'apn write aborted'
if those are intended to be retryable; also add negative tests where error
messages do not match (e.g., other apn messages or null/undefined error.error)
and assert false. Locate and exercise the isRequestRetryable function in
lib/client.js (and reference the isTransientWriteFailure check) so the new
behavior is explicitly asserted and prevents regressions. Ensure test names
describe the condition and outcome.
1022f46 to
3d728af
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (7)
test/multiclient.js (2)
1601-1603: Same magic-number concern (ManageChannelsMultiClient)Same derivation applies here. Consider adding the same inline comment as suggested for Line 502.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/multiclient.js` around lines 1601 - 1603, The assertion using the magic number 8 for establishedConnections is unclear; add a concise inline comment next to the expect(establishedConnections).to.equal(8) that explains why 8 is expected (referencing the derivation from ManageChannelsMultiClient and how repeated performRequestExpectingGoAway() calls lead to that count), so future readers can understand the origin of the value without tracing test logic; ensure the comment mentions ManageChannelsMultiClient and the relationship to the two performRequestExpectingGoAway() calls and connection accounting.
500-502: Same magic-number concern astest/client.jsLine 717With
clientCount = 2, the round-robin assigns eachperformRequestExpectingGoAway()call to a separate client, so each client makesconnectionRetryLimit (3) + 1 = 4connection attempts →4 × 2 = 8. A comment would make this invariant clear.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/multiclient.js` around lines 500 - 502, Replace the hard-coded "8" assertion with a self-explanatory expression or comment that encodes the invariant: compute expectedConnections = (connectionRetryLimit + 1) * clientCount (since each of clientCount clients will try connectionRetryLimit retries plus one initial attempt when performRequestExpectingGoAway() is called twice) and assert expect(establishedConnections).to.equal(expectedConnections); or, if you prefer to keep the literal, add a one-line comment above the assertion referencing performRequestExpectingGoAway, clientCount and connectionRetryLimit to document why 8 = (connectionRetryLimit + 1) * clientCount.test/isRequestRetryable.unit.js (1)
15-23: Test inputs don't match actual production error message formatsThe messages produced by
lib/client.jsare'apn write timeout'(Line 625) and'apn write aborted'(Line 636) — without the': timed out'/': aborted'suffixes used in these tests. The assertions still pass because neither prefix starts with'apn write failed', but aligning the inputs with the real messages would make these tests directly document the boundary between retryable and non-retryable write errors.✏️ Suggested update
- const error = { error: { message: 'apn write timeout: timed out' } }; + const error = { error: { message: 'apn write timeout' } };- const error = { error: { message: 'apn write aborted: aborted' } }; + const error = { error: { message: 'apn write aborted' } };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/isRequestRetryable.unit.js` around lines 15 - 23, Update the two test cases in test/isRequestRetryable.unit.js so their mock error.message values match the actual production messages emitted by lib/client.js: use 'apn write timeout' and 'apn write aborted' (without the trailing ': timed out' / ': aborted') when constructing the error objects; this keeps the assertions against Client.isRequestRetryable accurate and documents the exact non-retryable message prefixes.test/client.js (2)
2419-2421: Same magic-number concern as theClientGOAWAY test (Line 717)Same derivation:
connectionRetryLimit (3) + 1 initial attempt = 4 × 2 calls = 8. Worth adding the same explanatory comment here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/client.js` around lines 2419 - 2421, The assertion using the magic number 8 for establishedConnections is unclear; add the same explanatory comment as in the Client GOAWAY test to show the calculation (connectionRetryLimit (3) + 1 initial attempt = 4 per call, multiplied by 2 calls = 8). Locate the block with performRequestExpectingGoAway() and the expect(establishedConnections).to.equal(8) assertion and insert a concise comment referencing connectionRetryLimit and the derivation so future readers understand why 8 is expected.
715-717: Document the derivation of8expected connectionsThe assertion is correct, but the specific count silently depends on the default
connectionRetryLimit = 3: each call makes3 + 1 = 4connection attempts before exhausting the retry budget, and there are 2 calls. Adding a brief comment would make the invariant self-documenting if the default ever changes.✏️ Suggested comment
+ // connectionRetryLimit (3) + 1 initial attempt = 4 connections/call × 2 calls = 8 expect(establishedConnections).to.equal(8);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/client.js` around lines 715 - 717, Add a brief inline comment next to the assertion referencing performRequestExpectingGoAway and establishedConnections that documents how the expected value 8 is derived: with the default connectionRetryLimit = 3 each call will perform connectionRetryLimit + 1 = 4 attempts, and there are two calls, so 2 * 4 = 8; update the comment if you later change the default or make the retry limit configurable.lib/client.js (2)
155-156:isTransientWriteFailurecomment is narrower than its actual scopeThe comment describes only the GOAWAY race, but
error.error?.message?.startsWith('apn write failed')matches any error emitted byrequest.on('error', …)— that handler prefixes all stream errors (RST_STREAM, socket hang-up, TLS error, etc.) withapn write failed, not just GOAWAY-driven ones. The implementation is sensibly broad — retry on any write failure is reasonable — but the comment should say so.✏️ Suggested comment update
- // This can happen after the server initiates a goaway, and the client did not finish closing and destroying the session yet, so the client tries to send a request on a closing session. + // Covers any transient stream-level write failure (e.g. GOAWAY race, RST_STREAM, socket hang-up) + // where retrying on a fresh connection is appropriate. const isTransientWriteFailure = !!error.error?.message?.startsWith('apn write failed');🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/client.js` around lines 155 - 156, The existing comment above the isTransientWriteFailure check is too narrow; update the comment to explain that error.error?.message?.startsWith('apn write failed') detects any write-related stream error emitted via request.on('error', …) (RST_STREAM, socket hang-up, TLS errors, GOAWAY races, etc.), and that this broader match is intentional so the client will consider these as transient write failures to retry; reference isTransientWriteFailure and the error.message prefix in the comment so future readers understand the scope and rationale.
220-243: Pre-existing duplication between the two retry catch-blocksThe manageChannels path (lines 220–243) and the standard-session path (lines 270–293) are structurally identical — the only difference is
this.manageChannelsSessionvsthis.session. Now thatretryWritecentralizes the retry logic, the outer catch-block could be extracted into a shared helper (e.g.,_handleRetryableError(error, session, ...)) to avoid maintaining two copies.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/client.js` around lines 220 - 243, Extract the duplicated catch-block logic into a new helper (e.g., _handleRetryableError) that accepts the caught error, the session to close (session or manageChannelsSession), and subDirectoryInformation; inside the helper perform the retryWrite handling, on 500 call closeAndDestroySession(session), delete error.retryAfter, and throw the merged object ({ ...subDirectoryInformation, ...error }) or the resentRequest merge when retryWrite succeeds; then replace both duplicated blocks in the manageChannels and standard-session paths to call Client.isRequestRetryable(error) and delegate to _handleRetryableError(error, this.manageChannelsSession, subDirectoryInformation) or _handleRetryableError(error, this.session, subDirectoryInformation) as appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@lib/client.js`:
- Around line 155-156: The existing comment above the isTransientWriteFailure
check is too narrow; update the comment to explain that
error.error?.message?.startsWith('apn write failed') detects any write-related
stream error emitted via request.on('error', …) (RST_STREAM, socket hang-up, TLS
errors, GOAWAY races, etc.), and that this broader match is intentional so the
client will consider these as transient write failures to retry; reference
isTransientWriteFailure and the error.message prefix in the comment so future
readers understand the scope and rationale.
- Around line 220-243: Extract the duplicated catch-block logic into a new
helper (e.g., _handleRetryableError) that accepts the caught error, the session
to close (session or manageChannelsSession), and subDirectoryInformation; inside
the helper perform the retryWrite handling, on 500 call
closeAndDestroySession(session), delete error.retryAfter, and throw the merged
object ({ ...subDirectoryInformation, ...error }) or the resentRequest merge
when retryWrite succeeds; then replace both duplicated blocks in the
manageChannels and standard-session paths to call
Client.isRequestRetryable(error) and delegate to _handleRetryableError(error,
this.manageChannelsSession, subDirectoryInformation) or
_handleRetryableError(error, this.session, subDirectoryInformation) as
appropriate.
In `@test/client.js`:
- Around line 2419-2421: The assertion using the magic number 8 for
establishedConnections is unclear; add the same explanatory comment as in the
Client GOAWAY test to show the calculation (connectionRetryLimit (3) + 1 initial
attempt = 4 per call, multiplied by 2 calls = 8). Locate the block with
performRequestExpectingGoAway() and the
expect(establishedConnections).to.equal(8) assertion and insert a concise
comment referencing connectionRetryLimit and the derivation so future readers
understand why 8 is expected.
- Around line 715-717: Add a brief inline comment next to the assertion
referencing performRequestExpectingGoAway and establishedConnections that
documents how the expected value 8 is derived: with the default
connectionRetryLimit = 3 each call will perform connectionRetryLimit + 1 = 4
attempts, and there are two calls, so 2 * 4 = 8; update the comment if you later
change the default or make the retry limit configurable.
In `@test/isRequestRetryable.unit.js`:
- Around line 15-23: Update the two test cases in
test/isRequestRetryable.unit.js so their mock error.message values match the
actual production messages emitted by lib/client.js: use 'apn write timeout' and
'apn write aborted' (without the trailing ': timed out' / ': aborted') when
constructing the error objects; this keeps the assertions against
Client.isRequestRetryable accurate and documents the exact non-retryable message
prefixes.
In `@test/multiclient.js`:
- Around line 1601-1603: The assertion using the magic number 8 for
establishedConnections is unclear; add a concise inline comment next to the
expect(establishedConnections).to.equal(8) that explains why 8 is expected
(referencing the derivation from ManageChannelsMultiClient and how repeated
performRequestExpectingGoAway() calls lead to that count), so future readers can
understand the origin of the value without tracing test logic; ensure the
comment mentions ManageChannelsMultiClient and the relationship to the two
performRequestExpectingGoAway() calls and connection accounting.
- Around line 500-502: Replace the hard-coded "8" assertion with a
self-explanatory expression or comment that encodes the invariant: compute
expectedConnections = (connectionRetryLimit + 1) * clientCount (since each of
clientCount clients will try connectionRetryLimit retries plus one initial
attempt when performRequestExpectingGoAway() is called twice) and assert
expect(establishedConnections).to.equal(expectedConnections); or, if you prefer
to keep the literal, add a one-line comment above the assertion referencing
performRequestExpectingGoAway, clientCount and connectionRetryLimit to document
why 8 = (connectionRetryLimit + 1) * clientCount.
Retry transient write failures such:
apn write failed: New streams cannot be created after receiving a GOAWAYPlease check #201 for more details.
Summary by CodeRabbit
Bug Fixes
Tests
Chores