Bug #14804
closed
[keepstore] Return 5xx (not 4xx) if block is not found due to transient backend device failure
100%
Description
Currently, when keepstore is trying to read a block, if one Azure-backed volume encounters a 503 error and all other volumes return 404, keepstore returns 404 to its client. This is a non-retryable error so the client will give up.
The correct behavior is to return a 502 or 503 status in this situation.
Azure error message:
storage: service returned error: StatusCode=503, ErrorCode=ServerBusy, ErrorMessage=The server is busy.
Updated by Tom Morris almost 6 years ago
- Target version changed from Arvados Future Sprints to 2019-02-27 Sprint
Updated by Lucas Di Pentima almost 6 years ago
- Assigned To set to Lucas Di Pentima
Updated by Lucas Di Pentima almost 6 years ago
- Status changed from New to In Progress
Updated by Lucas Di Pentima almost 6 years ago
Updates at 601764a10 - branch 14804-keepstore-transient-backend-errors
Test run: https://ci.curoverse.com/job/developer-run-tests/1082/
When requesting a block, if keepstore got errors from all of its volumes, it returned 404 to the client no matter which errors the volumes had returned.
Now, when receiving a VolumeBusyError (transient error) from a volume backend, keepstore will return a 503 status so that the client can retry instead of mistakenly believing that the block is not there.
Updated by Lucas Di Pentima almost 6 years ago
Re-running developer-run-tests-remainder at: https://ci.curoverse.com/job/developer-run-tests-remainder/1117/
Updated by Eric Biagiotti almost 6 years ago
Small nitpick: I would update the comment for TestGetHandler to include your test scenario. Otherwise, LGTM.
Updated by Lucas Di Pentima almost 6 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|7e5f0e9ca6756099f761cc3f392476f362cd1645.
Updated by Tom Clegg almost 6 years ago
- Related to Bug #15118: [keepstore] Return 5xx (not 4xx) if block is not found due to transient backend device failure added