Skip to content

Slow down polling on congested traffic#98

Merged
compulim merged 8 commits intomicrosoft:masterfrom
compulim:feat-backoff
Oct 23, 2018
Merged

Slow down polling on congested traffic#98
compulim merged 8 commits intomicrosoft:masterfrom
compulim:feat-backoff

Conversation

@compulim
Copy link
Copy Markdown
Collaborator

@compulim compulim commented Oct 14, 2018

  • Do not poll again if the existing poll did not complete yet
  • Stop on expired token
    • User need to renew the token and restart a new conversation
  • Stop on 404
    • If the bot is deleted, stop the connection and don't retry
  • Bump to 0.10.0-0

@billba
Copy link
Copy Markdown
Member

billba commented Oct 15, 2018

I'm frankly worried about making any changes to this code until there are tests verifying current behavior.

I also don't understand the new expired token behavior. When you say "restart a new conversation" do you mean create a new conversationId? The current behavior allows you to resume an existing conversation, why change that?

@billba
Copy link
Copy Markdown
Member

billba commented Oct 15, 2018

(Also, if you're going to change behavior in a breaking way, you'll need to change the version number accordingly.)

@compulim
Copy link
Copy Markdown
Collaborator Author

compulim commented Oct 15, 2018

AFAIK, the resume conversation with renewed token is not working for Web Socket today. Will take that part of code out first.

Agree we need tests. And I am working on the test harness in another branch on my fork. Hope someone can jump in sooner than me to add tests.

(Update: done on reverting token error handling)

@billba
Copy link
Copy Markdown
Member

billba commented Oct 15, 2018

Why not fix resume conversation with renewed token? Why the change of plans? Has Dan Driscoll reviewed this plan?

(I see that you removed that part of the PR, but I'd still love to understand the plan)

@compulim
Copy link
Copy Markdown
Collaborator Author

The original plan was to provide appropriate retry/backoff/fatal on 400/404/429, and this is P0. We didn't fix 429 in this one because it was not P0, and it was not well-thought (need input from service team).

Renewed token was discovered while we are fixing 400/404. So it's not actually in the plan. IMO, it's a P1 but S0. I am piggybacking it here, but seems we need more thoughts around it. So I am taking it out for now.

Dan and Vince reviewed this plan on 400/404/429. I think they didn't know about "renewed token" bug yet.

Let's visit the "renew token" in another PR. We also need to revisit the lifetime of connections (and DLJS). Because, IIRC, call to end() doesn't stop everything.

@billba
Copy link
Copy Markdown
Member

billba commented Oct 15, 2018

Sounds like a solid plan, thanks.

@compulim compulim requested a review from ckkashyap October 16, 2018 20:31
@ckkashyap
Copy link
Copy Markdown
Collaborator

I will do a round of manual testing against local DirectLine and DirectLine in prod to ensure all the changes work as expected.


In reply to: 429888497 [](ancestors = 429888497)

@compulim compulim changed the title Error handling on connection issues Slow down polling on congested traffic Oct 17, 2018
@ckkashyap
Copy link
Copy Markdown
Collaborator

I have completed manual validation of all the changes -

  1. Ensured that each polling request waits for the the previous one to finish
  2. If polling returns 404, then polling stops
  3. A 404 returned during refresh token stops further retries unlike previous code.

Encountered only one problem - a 403 returned during polling results in rapid retries. This is a regression.


In reply to: 430432369 [](ancestors = 430432369,429888497)

@compulim compulim merged commit 61b9c5e into microsoft:master Oct 23, 2018
@compulim compulim deleted the feat-backoff branch October 23, 2018 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants