Project

General

Profile

Actions

Bug #7546

closed

[SDK] Go keepclient should retry PUTs on network errors

Added by Brett Smith about 9 years ago. Updated about 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
-
Target version:
Start date:
10/14/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0

Description

Keepproxy (which uses keepclient) is failing requests with the error "read tcp 10.26.0.7:25107: use of closed network connection". We have had similar errors in the past which were due to connection reuse in the Go http package. This has the side effect that if it tries to reuse a connection that's been closed, it will return an error instead of trying to reestablish a connection. The keepclient package should handle closed connection errors by retrying the request.


Subtasks 1 (0 open1 closed)

Task #7559: Review branch: 7546-put-retryResolvedRadhika Chippada10/14/2015

Actions

Related issues 1 (0 open1 closed)

Copied from Arvados - Bug #7491: [SDK] Go keepclient should retry on network errorsResolvedPeter Amstutz10/08/2015

Actions
Actions #1

Updated by Radhika Chippada about 9 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Radhika Chippada about 9 years ago

- Branch 7546-put-retry at commit 08a15a1e6b8e6f2e44f18328c1f6dd25343cffc2 is ready for review.

- I added the server to retry list if "status.statusCode 408 || status.statusCode 429 || status.statusCode >= 500" (using the list in GetOrHead). Do we need to include statusCode == 0 in this list? It appears that using a closed connection results in status.statusCode of 0 at line #205 of uploadToKeepServer method and this may need to be retried?

- One observation as I was working on putReplicas method: The requestId calculation at the beginning of this method uses "locator", which is not yet set. Shouldn't it be using "hash" instead of "locator" here?

 requestId := fmt.Sprintf("%x", md5.Sum([]byte(locator+time.Now().String())))[0:8] 
Actions #3

Updated by Peter Amstutz about 9 years ago

Radhika Chippada wrote:

- Branch 7546-put-retry at commit 08a15a1e6b8e6f2e44f18328c1f6dd25343cffc2 is ready for review.

- I added the server to retry list if "status.statusCode 408 || status.statusCode 429 || status.statusCode >= 500" (using the list in GetOrHead). Do we need to include statusCode 0 in this list? It appears that using a closed connection results in status.statusCode of 0 at line #205 of uploadToKeepServer method and this may need to be retried?

Yes, it should retry on statusCode 0 because that indicates a network error, which we definitely want to retry.

In addition, it probably should not retry on status code 503, because that's the error returned by keepstore when the keep server is full and cannot accept any more blocks.

- One observation as I was working on putReplicas method: The requestId calculation at the beginning of this method uses "locator", which is not yet set. Shouldn't it be using "hash" instead of "locator" here?

[...]

Yes, that looks like a bug.

Actions #4

Updated by Radhika Chippada about 9 years ago

Peter,
Addressed those comments.

One other observation / thought. The tests are now taking longer to run, which is understandable because now we are retrying. However, I am wondering if it is agreeable to set retry count to 0 for tests where retry is not needed / not going to make any difference one way or the other. Example of one such test seems to be: TestPutWithTooManyFail

Thanks.

Actions #5

Updated by Radhika Chippada about 9 years ago

Updated three tests to use kc.Retries = 0 (where retries are not needed). Now the test run time has come down from current master 49s to 10s.

Actions #6

Updated by Peter Amstutz about 9 years ago

Could you tweak the comment "// Timeout, too many requests, or other server side failure" to mention that error 503 means "keep server full" so we don't want to retry that case. Then please go ahead and merge.

Actions #7

Updated by Radhika Chippada about 9 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:a81ba64fcf67efcdb1323402612ca3fe5abf7b92.

Actions

Also available in: Atom PDF