Project

General

Profile

Actions

Bug #16480

closed

[keep-balance] should not timeout/fail when keepstore index takes more than 5 minutes

Added by Tom Clegg over 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Start date:
06/16/2020
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

Currently sdk/go/arvados.Client uses a 5-minute request timeout when the caller (like keep-balance) doesn't provide a custom *http.Client. This isn't enough time to complete a keepstore index request on a cloud install with lots of data. When the timeout expires, keep-balance just gives up on the current iteration, goes to sleep, and waits for the next one, which is unlikely to fare any better.

It may be worth having an idle timer to detect hung connections, but aside from that keep-balance should wait as long as it takes for keepstore to return an index response.


Subtasks 1 (0 open1 closed)

Task #16486: Review 16480-keep-balance-index-timeoutResolvedWard Vandewege06/16/2020

Actions
Actions #2

Updated by Tom Clegg over 4 years ago

  • Assigned To set to Tom Clegg
  • Target version set to 2020-06-17 Sprint
Actions #4

Updated by Tom Clegg over 4 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Tom Clegg over 4 years ago

16480-keep-balance-index-timeout @ 2bc1a7a89597ab02aaeef84b82fdc51f8e375b79 -- developer-run-tests: #1919

This changes the timeout to 24h. I'm inclined to punt on the "idle timer to detect hung connections" part.

Actions #6

Updated by Ward Vandewege over 4 years ago

Tom Clegg wrote:

16480-keep-balance-index-timeout @ 2bc1a7a89597ab02aaeef84b82fdc51f8e375b79 -- developer-run-tests: #1919

This changes the timeout to 24h. I'm inclined to punt on the "idle timer to detect hung connections" part.

OK to punt on the "idle timer", that's a nice to have. But maybe shorten the timeout a bit, 24 hours seems a bit excessive?

Review comment:

Removing the 5 minute timeout from the definition of DefaultSecureClient and InsecureHTTPClient means that the setup() function in lib/controller/handler.go now creates a secureClient and insecureClient without timeout. Is that intentional? Since DefaultSecureClient and InsecureHTTPClient are exported, shouldn't there be a default timeout on them?

Otherwise, LGTM, thanks.

Actions #7

Updated by Tom Clegg over 4 years ago

  • Target version changed from 2020-06-17 Sprint to 2020-07-01 Sprint
Actions #8

Updated by Tom Clegg over 4 years ago

Ward Vandewege wrote:

OK to punt on the "idle timer", that's a nice to have. But maybe shorten the timeout a bit, 24 hours seems a bit excessive?

I suppose the only way to avoid choosing a bad timeout is to make it configurable so the admin can choose a bad timeout themselves, so I made a configurable timeout for the entire keep-balance operation, defaulting to 6h. Is that a good compromise between annoyingly-long-for-small-sites and annoyingly-short-for-big-sites?

Removing the 5 minute timeout from the definition of DefaultSecureClient and InsecureHTTPClient means that the setup() function in lib/controller/handler.go now creates a secureClient and insecureClient without timeout. Is that intentional? Since DefaultSecureClient and InsecureHTTPClient are exported, shouldn't there be a default timeout on them?

Controller already uses a context deadline on incoming requests to enforce API.RequestTimeout, and propagates that context to outgoing proxy requests. So we were effectively limiting outgoing reqs min(5m, RequestTimeout) -- using only RequestTimeout seems better.

Generally, I'm thinking it's OK to change the exported default clients' timeouts to zero -- that's how http.DefaultClient does it, probably a hint that we should let the caller provide a timeout/context when needed.

16480-keep-balance-index-timeout @ fd080b34a321cbd6593d69f427b9eaeab890712f -- developer-run-tests: #1925

Actions #9

Updated by Ward Vandewege over 4 years ago

Tom Clegg wrote:

Ward Vandewege wrote:

OK to punt on the "idle timer", that's a nice to have. But maybe shorten the timeout a bit, 24 hours seems a bit excessive?

I suppose the only way to avoid choosing a bad timeout is to make it configurable so the admin can choose a bad timeout themselves, so I made a configurable timeout for the entire keep-balance operation, defaulting to 6h. Is that a good compromise between annoyingly-long-for-small-sites and annoyingly-short-for-big-sites?

Removing the 5 minute timeout from the definition of DefaultSecureClient and InsecureHTTPClient means that the setup() function in lib/controller/handler.go now creates a secureClient and insecureClient without timeout. Is that intentional? Since DefaultSecureClient and InsecureHTTPClient are exported, shouldn't there be a default timeout on them?

Controller already uses a context deadline on incoming requests to enforce API.RequestTimeout, and propagates that context to outgoing proxy requests. So we were effectively limiting outgoing reqs min(5m, RequestTimeout) -- using only RequestTimeout seems better.

Generally, I'm thinking it's OK to change the exported default clients' timeouts to zero -- that's how http.DefaultClient does it, probably a hint that we should let the caller provide a timeout/context when needed.

16480-keep-balance-index-timeout @ fd080b34a321cbd6593d69f427b9eaeab890712f -- developer-run-tests: #1925

Cool, this LGTM.

Actions #10

Updated by Anonymous over 4 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
Actions #11

Updated by Peter Amstutz about 4 years ago

  • Release set to 25
Actions

Also available in: Atom PDF