Story #13198
closed
[Keep-web] Add metrics endpoint
Added by Tom Morris almost 7 years ago.
Updated over 6 years ago.
Estimated time:
(Total: 0.00 h)
Release relationship:
Auto
Description
Use the same approach as the keepstore metrics added in #13025 (Prometheus, etc.).
Easiest metrics to provide:
- reqDuration (partitioned by method and status) using promhttp.InstrumentHandlerDuration
- timeToStatus (ditto) using log.AddHook, as in #13025
This should be refactored into a go package (sdk/go/httpserver?) instead of copying code from keepstore to keep-web.
Keep-web specific metrics to provide:
- time to fetch block from keep
- cache hits, misses
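As a rough sketch of the two keep-web-specific metrics above, the following shows one way a cache could count hits and misses and accumulate the time spent fetching on a miss. The `blockCache` type, its fields and the `fetch` callback are hypothetical names invented for this example; they are not keep-web's actual cache or Keep client API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// blockCache is a hypothetical in-memory cache that tracks its own metrics.
type blockCache struct {
	mu     sync.Mutex
	blocks map[string][]byte
	fetch  func(locator string) ([]byte, error) // stand-in for a Keep fetch

	Hits      int64         // lookups served from the cache
	Misses    int64         // lookups that required a fetch
	FetchTime time.Duration // total time spent inside fetch
}

func newBlockCache(fetch func(string) ([]byte, error)) *blockCache {
	return &blockCache{blocks: map[string][]byte{}, fetch: fetch}
}

// Get returns the cached block, fetching and timing it on a miss.
// For simplicity the lock is held during the fetch, which serializes
// concurrent misses; a real cache would likely avoid that.
func (c *blockCache) Get(locator string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if b, ok := c.blocks[locator]; ok {
		c.Hits++
		return b, nil
	}
	c.Misses++
	start := time.Now()
	b, err := c.fetch(locator)
	c.FetchTime += time.Since(start)
	if err != nil {
		return nil, err
	}
	c.blocks[locator] = b
	return b, nil
}

func main() {
	c := newBlockCache(func(locator string) ([]byte, error) {
		return []byte("data for " + locator), nil
	})
	c.Get("abc") // miss: fetched and stored
	c.Get("abc") // hit: served from the cache
	fmt.Println("hits:", c.Hits, "misses:", c.Misses)
}
```

The counters and the accumulated fetch time could then be exported through whatever metrics registry the service already uses.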
- Subject changed from [Keep-web] Add track and report metrics through monitoring interface to [Keep-web]Track and report metrics through monitoring interface
- Description updated
- Subject changed from [Keep-web]Track and report metrics through monitoring interface to [Keep-web] Add metrics endpoint
- Target version changed from To Be Groomed to Arvados Future Sprints
- Assigned To set to Tom Clegg
- Target version changed from Arvados Future Sprints to 2018-07-18 Sprint
On the topic of metrics and health checks, we should add a page to the "Admin" section of the documentation that describes which components have endpoints and how to use the health check aggregator. That would address the situation where no one remembers where we are with implementing health checks / metrics: at least it would be written down.
Documenting health checks / metrics story: #13791
- Target version changed from 2018-07-18 Sprint to 2018-08-01 Sprint
- Story points changed from 3.0 to 0.5
- Status changed from New to In Progress
Tried manually and all seems to work great.
One question though: Are the "time to fetch block" and "cache hits/misses" going to be implemented later / discarded? If yes, then it LGTM.
Right, this branch (just merged) only offers request timing.
Keeping issue open for the keep-web-specific metrics.
This LGTM, but I think it would be nice to avoid computing increments twice for every counter. Couldn't /status.json be some sort of prometheus client?
- Target version changed from 2018-08-01 Sprint to 2018-08-15 Sprint
- Target version changed from 2018-08-15 Sprint to 2018-09-05 Sprint
Just one comment:
- I noticed that the documentation style changed with these new additions. All other items are listed in tables rather than bullet point lists, and I think the tabular layout is clearer to read.
Other than that, lgtm.
- Status changed from In Progress to Resolved