http: fix connections that are never closed#3647
Closed
howardjohn wants to merge 2 commits intohyperium:masterfrom
Closed
http: fix connections that are never closed#3647howardjohn wants to merge 2 commits intohyperium:masterfrom
howardjohn wants to merge 2 commits intohyperium:masterfrom
Conversation
9634e83 to
f43483f
Compare
This was referenced Apr 29, 2024
Contributor
Author
|
Closing in favor of #3655 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR addresses an issue we found using hyper with HTTP2 CONNECT. Both the client and server are
hyper1.3.1, with tokio and rustls. In this scenario, we are only sending a single request on the connection.What we found is that on occasion, despite the
Connectionobject being fully.awaited, theSendRequestdropped, the underlying connection is still open; no GoAway is sent from the client. This hangs forever.The only way we are able to unblock the connection is if we close it from the server side, or if we use tokio's task dump feature -- this indirectly wakes up every task, allowing things to drive to completion.
Below is my analysis of how this is happening:
Then hyper makes a connection, it spawns a H2ClientFuture, and returns an ClientTask. The user awaits the ClientTask (for us, we spawn it).
The ClientTask future holds a
conn_eof: ConnEof. This is a oneshot linked up to H2ClientFuturecancel_tx.An H2ClientFuture future can finish by either
con.pollbeing ready (i.e. it terminated) orthis.drop_rxbeing ready (i.e. it was dropped). Whendrop_rxis dropped, we dropcancel_tx. This, in turn, should trigger the ClientTask to drop, and trigger a shutdown -- this is the part that never happens.drop_rxis half of an mpsc. It was added in the initial HTTP2 support (6 years ago) to detect this exact scenario as far as I can tell. The other end of the mpsc lives inconn_drop_refin theClientTask.drop_rxgetsReady((None, Receiver { closed: false }))when the finalconn_drop_refis closed.conn_drop_refcan be cloned in a few places, but for us this never happens (seems like this only happens for not-CONNECT).Instead, we see
ClientTaskcomplete, which dropsconn_drop_ref, which sends the message ondrop_rx, which triggers us todrop(this.cancel_tx).This is circular:
cancel_txis supposed to notifyconn_eof, whichClientTaskis watching.. but we got here becauseClientTaskdropped!I got a bit lost there, but I think ultimately this means our
ConnTaskhangs around forever (we returned Pending but have no way to wake it), which holds onto theh2::Connectionwhich blocks shutdown. The dump-tasks fixing it is from callingpollon ConnTask again, and this time hitting the 'conn terminated' section.The fix I have applied here is to immediately return ready if drop_rx gets a message. This seems to fix the behavior we see, and I cannot think of cases where this wouldn't be correct, though I only researched the CONNECT flow much
As part of the PR I added a simple regression test. This fails 100% of the time on my machine prior to this PR.
This issue is not caught by existing H2 CONNECT tests which all close the connection from the server side; the test here closes from the client (which, I expect, is more common in the real world?)