Skip to content

Comments

fix: Retry transient write failures after receiving goaway#200

Open
ionutmanolache-okta wants to merge 1 commit intoparse-community:masterfrom
ionutmanolache-okta:retry_transient_write_failure
Open

fix: Retry transient write failures after receiving goaway#200
ionutmanolache-okta wants to merge 1 commit intoparse-community:masterfrom
ionutmanolache-okta:retry_transient_write_failure

Conversation

@ionutmanolache-okta
Copy link

@ionutmanolache-okta ionutmanolache-okta commented Feb 20, 2026

Retry transient write failures such:
apn write failed: New streams cannot be created after receiving a GOAWAY

Please check #201 for more details.

Summary by CodeRabbit

  • Bug Fixes

    • More consistent, robust retry behavior for write requests — handles transient HTTP errors, expired tokens, and specific APN write failures; retries now occur across new connections after GOAWAY events.
  • Tests

    • Added unit tests covering retry decision logic; updated integration tests to reflect retry-on-new-connection behavior.
  • Chores

    • Updated JavaScript linting configuration (esversion 11).

@parse-github-assistant
Copy link

I will reformat the title to use the proper commit message syntax.

@parse-github-assistant parse-github-assistant bot changed the title fix: retry transient write failures after receiving goaway fix: Retry transient write failures after receiving goaway Feb 20, 2026
@parse-github-assistant
Copy link

parse-github-assistant bot commented Feb 20, 2026

🚀 Thanks for opening this pull request!

@coderabbitai
Copy link

coderabbitai bot commented Feb 20, 2026

📝 Walkthrough

Walkthrough

Centralizes retry decision-making by adding Client.isRequestRetryable(error) and replaces inline retry checks with calls to it; renames and adjusts retry flow to Client.prototype.retryWrite and updates write/error handling across manageChannels and standard session write paths. Adds unit tests for isRequestRetryable and updates GOAWAY-related test expectations.

Changes

Cohort / File(s) Summary
Client retry refactor
lib/client.js
Added public static Client.isRequestRetryable(error) to determine retryability (status codes 408,429,500,502,503,504; 403 with 'ExpiredProviderToken'; messages starting with 'apn write failed'). Replaced inline retry checks with calls to it, renamed retryRequest -> retryWrite and updated its signature and call sites for write retry flows.
Tests — new unit & expectations
test/isRequestRetryable.unit.js, test/client.js, test/multiclient.js
Added unit tests covering isRequestRetryable behaviors; updated GOAWAY test descriptions and expected connection counts to reflect retry-on-new-connection behavior.
Test config
test/.jshintrc
Set esversion to 11 for test linting.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Session
    participant APN as APN Connection
    participant RetryMgr as retryWrite

    Client->>Session: write(notification, path, method)
    Session->>APN: send write
    APN-->>Session: error
    Session->>Client: propagate error
    Client->>Client: isRequestRetryable(error)
    alt retryable
        Client->>RetryMgr: retryWrite(subDirectory, type, method, retryCount)
        RetryMgr->>APN: establish new connection / resend
        APN-->>RetryMgr: success / error
        RetryMgr-->>Client: result
    else non-retryable
        Client-->>Session: propagate error (destroy session if needed)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: Retry transient write failures after receiving goaway' accurately and concisely summarizes the main objective of the pull request, which is to add retry behavior for write failures after HTTP/2 GOAWAY frames.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@parseplatformorg
Copy link

parseplatformorg commented Feb 20, 2026

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/client.js`:
- Around line 155-156: Add direct unit tests for isRequestRetryable covering the
new isTransientWriteFailure branch: create tests that pass an error object with
error.message starting with 'apn write failed' and assert the function returns
true; add similar positive tests for 'apn write timeout' and 'apn write aborted'
if those are intended to be retryable; also add negative tests where error
messages do not match (e.g., other apn messages or null/undefined error.error)
and assert false. Locate and exercise the isRequestRetryable function in
lib/client.js (and reference the isTransientWriteFailure check) so the new
behavior is explicitly asserted and prevents regressions. Ensure test names
describe the condition and outcome.

@ionutmanolache-okta ionutmanolache-okta force-pushed the retry_transient_write_failure branch from 1022f46 to 3d728af Compare February 21, 2026 16:46
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (7)
test/multiclient.js (2)

1601-1603: Same magic-number concern (ManageChannelsMultiClient)

Same derivation applies here. Consider adding the same inline comment as suggested for Line 502.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/multiclient.js` around lines 1601 - 1603, The assertion using the magic
number 8 for establishedConnections is unclear; add a concise inline comment
next to the expect(establishedConnections).to.equal(8) that explains why 8 is
expected (referencing the derivation from ManageChannelsMultiClient and how
repeated performRequestExpectingGoAway() calls lead to that count), so future
readers can understand the origin of the value without tracing test logic;
ensure the comment mentions ManageChannelsMultiClient and the relationship to
the two performRequestExpectingGoAway() calls and connection accounting.

500-502: Same magic-number concern as test/client.js Line 717

With clientCount = 2, the round-robin assigns each performRequestExpectingGoAway() call to a separate client, so each client makes connectionRetryLimit (3) + 1 = 4 connection attempts → 4 × 2 = 8. A comment would make this invariant clear.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/multiclient.js` around lines 500 - 502, Replace the hard-coded "8"
assertion with a self-explanatory expression or comment that encodes the
invariant: compute expectedConnections = (connectionRetryLimit + 1) *
clientCount (since each of clientCount clients will try connectionRetryLimit
retries plus one initial attempt when performRequestExpectingGoAway() is called
twice) and assert expect(establishedConnections).to.equal(expectedConnections);
or, if you prefer to keep the literal, add a one-line comment above the
assertion referencing performRequestExpectingGoAway, clientCount and
connectionRetryLimit to document why 8 = (connectionRetryLimit + 1) *
clientCount.
test/isRequestRetryable.unit.js (1)

15-23: Test inputs don't match actual production error message formats

The messages produced by lib/client.js are 'apn write timeout' (Line 625) and 'apn write aborted' (Line 636) — without the ': timed out' / ': aborted' suffixes used in these tests. The assertions still pass because neither prefix starts with 'apn write failed', but aligning the inputs with the real messages would make these tests directly document the boundary between retryable and non-retryable write errors.

✏️ Suggested update
-    const error = { error: { message: 'apn write timeout: timed out' } };
+    const error = { error: { message: 'apn write timeout' } };
-    const error = { error: { message: 'apn write aborted: aborted' } };
+    const error = { error: { message: 'apn write aborted' } };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/isRequestRetryable.unit.js` around lines 15 - 23, Update the two test
cases in test/isRequestRetryable.unit.js so their mock error.message values
match the actual production messages emitted by lib/client.js: use 'apn write
timeout' and 'apn write aborted' (without the trailing ': timed out' / ':
aborted') when constructing the error objects; this keeps the assertions against
Client.isRequestRetryable accurate and documents the exact non-retryable message
prefixes.
test/client.js (2)

2419-2421: Same magic-number concern as the Client GOAWAY test (Line 717)

Same derivation: connectionRetryLimit (3) + 1 initial attempt = 4 × 2 calls = 8. Worth adding the same explanatory comment here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/client.js` around lines 2419 - 2421, The assertion using the magic
number 8 for establishedConnections is unclear; add the same explanatory comment
as in the Client GOAWAY test to show the calculation (connectionRetryLimit (3) +
1 initial attempt = 4 per call, multiplied by 2 calls = 8). Locate the block
with performRequestExpectingGoAway() and the
expect(establishedConnections).to.equal(8) assertion and insert a concise
comment referencing connectionRetryLimit and the derivation so future readers
understand why 8 is expected.

715-717: Document the derivation of 8 expected connections

The assertion is correct, but the specific count silently depends on the default connectionRetryLimit = 3: each call makes 3 + 1 = 4 connection attempts before exhausting the retry budget, and there are 2 calls. Adding a brief comment would make the invariant self-documenting if the default ever changes.

✏️ Suggested comment
+    // connectionRetryLimit (3) + 1 initial attempt = 4 connections/call × 2 calls = 8
     expect(establishedConnections).to.equal(8);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/client.js` around lines 715 - 717, Add a brief inline comment next to
the assertion referencing performRequestExpectingGoAway and
establishedConnections that documents how the expected value 8 is derived: with
the default connectionRetryLimit = 3 each call will perform connectionRetryLimit
+ 1 = 4 attempts, and there are two calls, so 2 * 4 = 8; update the comment if
you later change the default or make the retry limit configurable.
lib/client.js (2)

155-156: isTransientWriteFailure comment is narrower than its actual scope

The comment describes only the GOAWAY race, but error.error?.message?.startsWith('apn write failed') matches any error emitted by request.on('error', …) — that handler prefixes all stream errors (RST_STREAM, socket hang-up, TLS error, etc.) with apn write failed, not just GOAWAY-driven ones. The implementation is sensibly broad — retry on any write failure is reasonable — but the comment should say so.

✏️ Suggested comment update
-    // This can happen after the server initiates a goaway, and the client did not finish closing and destroying the session yet, so the client tries to send a request on a closing session.
+    // Covers any transient stream-level write failure (e.g. GOAWAY race, RST_STREAM, socket hang-up)
+    // where retrying on a fresh connection is appropriate.
     const isTransientWriteFailure = !!error.error?.message?.startsWith('apn write failed');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/client.js` around lines 155 - 156, The existing comment above the
isTransientWriteFailure check is too narrow; update the comment to explain that
error.error?.message?.startsWith('apn write failed') detects any write-related
stream error emitted via request.on('error', …) (RST_STREAM, socket hang-up, TLS
errors, GOAWAY races, etc.), and that this broader match is intentional so the
client will consider these as transient write failures to retry; reference
isTransientWriteFailure and the error.message prefix in the comment so future
readers understand the scope and rationale.

220-243: Pre-existing duplication between the two retry catch-blocks

The manageChannels path (lines 220–243) and the standard-session path (lines 270–293) are structurally identical — the only difference is this.manageChannelsSession vs this.session. Now that retryWrite centralizes the retry logic, the outer catch-block could be extracted into a shared helper (e.g., _handleRetryableError(error, session, ...)) to avoid maintaining two copies.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/client.js` around lines 220 - 243, Extract the duplicated catch-block
logic into a new helper (e.g., _handleRetryableError) that accepts the caught
error, the session to close (session or manageChannelsSession), and
subDirectoryInformation; inside the helper perform the retryWrite handling, on
500 call closeAndDestroySession(session), delete error.retryAfter, and throw the
merged object ({ ...subDirectoryInformation, ...error }) or the resentRequest
merge when retryWrite succeeds; then replace both duplicated blocks in the
manageChannels and standard-session paths to call
Client.isRequestRetryable(error) and delegate to _handleRetryableError(error,
this.manageChannelsSession, subDirectoryInformation) or
_handleRetryableError(error, this.session, subDirectoryInformation) as
appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@lib/client.js`:
- Around line 155-156: The existing comment above the isTransientWriteFailure
check is too narrow; update the comment to explain that
error.error?.message?.startsWith('apn write failed') detects any write-related
stream error emitted via request.on('error', …) (RST_STREAM, socket hang-up, TLS
errors, GOAWAY races, etc.), and that this broader match is intentional so the
client will consider these as transient write failures to retry; reference
isTransientWriteFailure and the error.message prefix in the comment so future
readers understand the scope and rationale.
- Around line 220-243: Extract the duplicated catch-block logic into a new
helper (e.g., _handleRetryableError) that accepts the caught error, the session
to close (session or manageChannelsSession), and subDirectoryInformation; inside
the helper perform the retryWrite handling, on 500 call
closeAndDestroySession(session), delete error.retryAfter, and throw the merged
object ({ ...subDirectoryInformation, ...error }) or the resentRequest merge
when retryWrite succeeds; then replace both duplicated blocks in the
manageChannels and standard-session paths to call
Client.isRequestRetryable(error) and delegate to _handleRetryableError(error,
this.manageChannelsSession, subDirectoryInformation) or
_handleRetryableError(error, this.session, subDirectoryInformation) as
appropriate.

In `@test/client.js`:
- Around line 2419-2421: The assertion using the magic number 8 for
establishedConnections is unclear; add the same explanatory comment as in the
Client GOAWAY test to show the calculation (connectionRetryLimit (3) + 1 initial
attempt = 4 per call, multiplied by 2 calls = 8). Locate the block with
performRequestExpectingGoAway() and the
expect(establishedConnections).to.equal(8) assertion and insert a concise
comment referencing connectionRetryLimit and the derivation so future readers
understand why 8 is expected.
- Around line 715-717: Add a brief inline comment next to the assertion
referencing performRequestExpectingGoAway and establishedConnections that
documents how the expected value 8 is derived: with the default
connectionRetryLimit = 3 each call will perform connectionRetryLimit + 1 = 4
attempts, and there are two calls, so 2 * 4 = 8; update the comment if you later
change the default or make the retry limit configurable.

In `@test/isRequestRetryable.unit.js`:
- Around line 15-23: Update the two test cases in
test/isRequestRetryable.unit.js so their mock error.message values match the
actual production messages emitted by lib/client.js: use 'apn write timeout' and
'apn write aborted' (without the trailing ': timed out' / ': aborted') when
constructing the error objects; this keeps the assertions against
Client.isRequestRetryable accurate and documents the exact non-retryable message
prefixes.

In `@test/multiclient.js`:
- Around line 1601-1603: The assertion using the magic number 8 for
establishedConnections is unclear; add a concise inline comment next to the
expect(establishedConnections).to.equal(8) that explains why 8 is expected
(referencing the derivation from ManageChannelsMultiClient and how repeated
performRequestExpectingGoAway() calls lead to that count), so future readers can
understand the origin of the value without tracing test logic; ensure the
comment mentions ManageChannelsMultiClient and the relationship to the two
performRequestExpectingGoAway() calls and connection accounting.
- Around line 500-502: Replace the hard-coded "8" assertion with a
self-explanatory expression or comment that encodes the invariant: compute
expectedConnections = (connectionRetryLimit + 1) * clientCount (since each of
clientCount clients will try connectionRetryLimit retries plus one initial
attempt when performRequestExpectingGoAway() is called twice) and assert
expect(establishedConnections).to.equal(expectedConnections); or, if you prefer
to keep the literal, add a one-line comment above the assertion referencing
performRequestExpectingGoAway, clientCount and connectionRetryLimit to document
why 8 = (connectionRetryLimit + 1) * clientCount.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants