Bug #16480
closed[keep-balance] should not timeout/fail when keepstore index takes more than 5 minutes
100%
Description
Currently sdk/go/arvados.Client
uses a 5-minute request timeout when the caller (like keep-balance) doesn't provide a custom *http.Client
. This isn't enough time to complete a keepstore index request on a cloud install with lots of data. When the timeout expires, keep-balance just gives up on the current iteration, goes to sleep, and waits for the next one, which is unlikely to fare any better.
It may be worth having an idle timer to detect hung connections, but aside from that keep-balance should wait as long as it takes for keepstore to return an index response.
Updated by Tom Clegg over 4 years ago
- Assigned To set to Tom Clegg
- Target version set to 2020-06-17 Sprint
Updated by Tom Clegg over 4 years ago
16480-keep-balance-index-timeout @ 2bc1a7a89597ab02aaeef84b82fdc51f8e375b79 -- developer-run-tests: #1919
This changes the timeout to 24h. I'm inclined to punt on the "idle timer to detect hung connections" part.
Updated by Ward Vandewege over 4 years ago
Tom Clegg wrote:
16480-keep-balance-index-timeout @ 2bc1a7a89597ab02aaeef84b82fdc51f8e375b79 -- developer-run-tests: #1919
This changes the timeout to 24h. I'm inclined to punt on the "idle timer to detect hung connections" part.
OK to punt on the "idle timer", that's a nice to have. But maybe shorten the timeout a bit, 24 hours seems a bit excessive?
Review comment:
Removing the 5 minute timeout from the definition of DefaultSecureClient and InsecureHTTPClient means that the setup() function in lib/controller/handler.go now creates a secureClient and insecureClient without timeout. Is that intentional? Since DefaultSecureClient and InsecureHTTPClient are exported, shouldn't there be a default timeout on them?
Otherwise, LGTM, thanks.
Updated by Tom Clegg over 4 years ago
- Target version changed from 2020-06-17 Sprint to 2020-07-01 Sprint
Updated by Tom Clegg over 4 years ago
Ward Vandewege wrote:
OK to punt on the "idle timer", that's a nice to have. But maybe shorten the timeout a bit, 24 hours seems a bit excessive?
I suppose the only way to avoid choosing a bad timeout is to make it configurable so the admin can choose a bad timeout themselves, so I made a configurable timeout for the entire keep-balance operation, defaulting to 6h. Is that a good compromise between annoyingly-long-for-small-sites and annoyingly-short-for-big-sites?
Removing the 5 minute timeout from the definition of DefaultSecureClient and InsecureHTTPClient means that the setup() function in lib/controller/handler.go now creates a secureClient and insecureClient without timeout. Is that intentional? Since DefaultSecureClient and InsecureHTTPClient are exported, shouldn't there be a default timeout on them?
Controller already uses a context deadline on incoming requests to enforce API.RequestTimeout, and propagates that context to outgoing proxy requests. So we were effectively limiting outgoing reqs min(5m, RequestTimeout) -- using only RequestTimeout seems better.
Generally, I'm thinking it's OK to change the exported default clients' timeouts to zero -- that's how http.DefaultClient does it, probably a hint that we should let the caller provide a timeout/context when needed.
16480-keep-balance-index-timeout @ fd080b34a321cbd6593d69f427b9eaeab890712f -- developer-run-tests: #1925
Updated by Ward Vandewege over 4 years ago
Tom Clegg wrote:
Ward Vandewege wrote:
OK to punt on the "idle timer", that's a nice to have. But maybe shorten the timeout a bit, 24 hours seems a bit excessive?
I suppose the only way to avoid choosing a bad timeout is to make it configurable so the admin can choose a bad timeout themselves, so I made a configurable timeout for the entire keep-balance operation, defaulting to 6h. Is that a good compromise between annoyingly-long-for-small-sites and annoyingly-short-for-big-sites?
Removing the 5 minute timeout from the definition of DefaultSecureClient and InsecureHTTPClient means that the setup() function in lib/controller/handler.go now creates a secureClient and insecureClient without timeout. Is that intentional? Since DefaultSecureClient and InsecureHTTPClient are exported, shouldn't there be a default timeout on them?
Controller already uses a context deadline on incoming requests to enforce API.RequestTimeout, and propagates that context to outgoing proxy requests. So we were effectively limiting outgoing reqs min(5m, RequestTimeout) -- using only RequestTimeout seems better.
Generally, I'm thinking it's OK to change the exported default clients' timeouts to zero -- that's how http.DefaultClient does it, probably a hint that we should let the caller provide a timeout/context when needed.
16480-keep-balance-index-timeout @ fd080b34a321cbd6593d69f427b9eaeab890712f -- developer-run-tests: #1925
Cool, this LGTM.
Updated by Anonymous over 4 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|405b13d50e203958968427a2642bc18026a0c227.