Bug #15521
closed
[keepstore] error reporting improvements
Added by Peter Amstutz over 5 years ago.
Updated almost 5 years ago.
Estimated time:
(Total: 0.00 h)
Release relationship:
Auto
Description
From ops bug #15520
Keepstore logging improvements:
- Keepstore PutBlock() calls log.Printf; this line of code has been untouched since 2014 (!). The message is logged in JSON format but lacks useful context such as the request id.
- The error that is sent to the client is not logged at all.
- The log doesn't say anything about where the block is being fetched from -- which volume, bucket, remote cluster, anything
- The error that reaches the user needs to make it clear that the problem was in fetching a remote block; this requires some combination of improved server and client error messages.
- Description updated (diff)
- Target version changed from 2019-08-14 Sprint to Arvados Future Sprints
- Subject changed from federation error reporting improvements to [keepstore] error reporting improvements
- Related to Bug #15606: [keep-web] logging doesn't include error messages added
- Assigned To set to Tom Clegg
- Target version changed from Arvados Future Sprints to 2019-11-06 Sprint
- Status changed from New to In Progress
- Related to Bug #15713: [Controller] Internal error not logged added
- Keepstore PutBlock() calls log.Printf; this line of code has been untouched since 2014 (!). The message is logged in JSON format but lacks useful context such as the request id.
Updated "MD5 checksum %s did not match request" logs (and many other logs in keepstore) to use ctxlog so they include request id, loglevel, etc.
Fixed an unreported error: PutBlock tries volmgr.NextWritable(), and if that fails, it tries all writable volumes in sequence. The error from that first failure wasn't being logged.
- The error that is sent to the client is not logged at all.
Already fixed in #15713.
- The log doesn't say anything about where the block is being fetched from -- which volume, bucket, remote cluster, anything
If we get bad data from a remote service, we get two log entries:
msg="%s: MD5 checksum %s did not match request"
respStatusCode=502 respBody="checksum mismatch in remote response" (respBody added in #15713)
- The error that reaches the user needs to make it clear that the problem was in fetching a remote block; this requires some combination of improved server and client error messages.
The server is sending "checksum mismatch in remote response".
15521-keepstore-logging @ 62d28600cbfc31f8e72c61e4519ff198cb66a02a -- https://ci.curoverse.com/view/Developer/job/developer-run-tests/1616/
- Status changed from In Progress to Resolved