Add test for HTTP2 CONNECT termination #3655

Merged
seanmonstar merged 4 commits into hyperium:master from howardjohn:hyper/test-h2-connect
May 17, 2024

@howardjohn (Contributor) commented May 1, 2024

For #3652.

In #3647, I thought I had tracked down the root cause of #3652. However, with the help of @seanmonstar in #3652 (comment), I realized that the change terminated the connection too forcefully, even when things other than SendRequest were still alive. I also realized that the original test case I wrote was simply wrong and failed to properly reproduce the issue.

This adds a test that properly reproduces the issue. On my machine, it fails about 5% of the time:

1876 runs so far, 100 failures (94.94% pass rate). 95.197349ms avg, 1.097347435s max, 5.398457ms min

After further investigation, I believe this bug actually originates in h2 itself; see hyperium/h2#772. With that PR, this test is 100% reliable:

64010 runs so far, 0 failures (100.00% pass rate). 44.484057ms avg, 121.454709ms max, 1.872657ms min
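A note on reading the two summary lines: the tool that printed them is not named in this thread, so the following is only an assumption. The reported 94.94% is reproduced if the first number counts passing runs and failures are counted separately, that is, passes / (passes + failures). The other obvious reading, (1876 - 100) / 1876, would give about 94.67% instead. A minimal sketch of that arithmetic, with `pass_rate` as a hypothetical helper name:

```rust
/// Pass rate in percent, ASSUMING the first number in the log line counts
/// passing runs and failures are tallied separately (not confirmed by the source).
fn pass_rate(passes: u64, failures: u64) -> f64 {
    100.0 * passes as f64 / (passes + failures) as f64
}
```

Under that assumption, `pass_rate(1876, 100)` formats to `94.94` and `pass_rate(64010, 0)` to `100.00`, matching both log lines. Either way, the failure rate is roughly 5%.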

howardjohn added a commit to howardjohn/h2 that referenced this pull request May 1, 2024
See hyperium/hyper#3652.

What I have found is that if the final reference to a stream is dropped
after `maybe_close_connection_if_no_streams` runs but before
`inner.poll()` completes, the connection can dangle forever
without any forward progress. No streams/references are alive, but the
connection is not complete and never wakes up again. This seems like a
classic TOCTOU (time-of-check to time-of-use) race condition.

In this fix, I check again at the end of poll and if this state is
detected, wake up the task again.
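
The shape of this race and of the re-check can be sketched with the standard library alone. This is not h2's code: `Conn`, `mid_poll`, `recheck`, `CountingWaker`, and `wakes_after_racy_poll` are all hypothetical names, and the "last reference dropped mid-poll" step is simulated by a closure rather than by a real second thread.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a connection task (not h2's real type).
struct Conn {
    /// Number of live stream references.
    refs: Arc<AtomicUsize>,
    /// Hypothetical hook run in the middle of `poll` to simulate the last
    /// reference being dropped inside the race window.
    mid_poll: Option<Box<dyn FnOnce() + Send>>,
    /// Whether to re-check the reference count at the end of `poll`.
    recheck: bool,
}

impl Future for Conn {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // Time of check: finish if nothing references the connection.
        if self.refs.load(Ordering::SeqCst) == 0 {
            return Poll::Ready(());
        }
        // Race window: the rest of the poll work happens here.
        if let Some(drop_last_ref) = self.mid_poll.take() {
            drop_last_ref();
        }
        // Re-check: if the count hit zero during the window, ask to be polled again.
        if self.recheck && self.refs.load(Ordering::SeqCst) == 0 {
            cx.waker().wake_by_ref();
        }
        Poll::Pending
    }
}

/// Waker that only counts how many times a wake-up was requested.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls once with the race simulated and returns the number of wake-ups requested.
fn wakes_after_racy_poll(recheck: bool) -> usize {
    let refs = Arc::new(AtomicUsize::new(1));
    let last_ref = refs.clone();
    let mut conn = Conn {
        refs,
        mid_poll: Some(Box::new(move || {
            last_ref.fetch_sub(1, Ordering::SeqCst);
        })),
        recheck,
    };
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    assert!(Pin::new(&mut conn).poll(&mut cx).is_pending());
    counter.0.load(Ordering::SeqCst)
}
```

In this sketch, `wakes_after_racy_poll(false)` requests no wake-up, so an executor would never poll the task again even though no references remain. With `recheck` set to `true`, one wake-up is requested, and the next poll would see zero references at the first check and complete.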

On my machine, the test in hyperium/hyper#3655 fails about 5% of the time:
```
1876 runs so far, 100 failures (94.94% pass rate). 95.197349ms avg, 1.097347435s max, 5.398457ms min
```

With this fix, this test is 100% reliable:
```
64010 runs so far, 0 failures (100.00% pass rate). 44.484057ms avg, 121.454709ms max, 1.872657ms min
```

Note: we have also reproduced this using `h2` directly, outside of `hyper`, which is what gives me
confidence that this issue lies in `h2` and not `hyper`.
@howardjohn howardjohn marked this pull request as ready for review May 1, 2024 22:35
seanmonstar pushed a commit to hyperium/h2 that referenced this pull request May 2, 2024
@seanmonstar (Member) commented

Thank you! I was waiting to release h2 with the fix so that we don't add a flaky test to hyper. h2 v0.4.5 is out now :)

@seanmonstar seanmonstar merged commit a8f9e06 into hyperium:master May 17, 2024
@howardjohn (Contributor, Author) commented

Ah makes sense. Thanks!
