Feature #13025
closed[keepstore] runtime metrics endpoint
Added by Tom Clegg almost 7 years ago. Updated over 6 years ago.
100%
Description
Start by exporting request time statistics using https://godoc.org/github.com/prometheus/client_golang/prometheus or similar.
Updated by Tom Morris almost 7 years ago
- Related to Story #10484: [keepstore] Add S3 bucket statistics and current #clients to status.json added
Updated by Tom Morris almost 7 years ago
Here are some of the background notes from previous discussions:
- https://dev.arvados.org/issues/10484#note-4 from November, 2016:
Not to dive into implementation details, but introducing a general purpose metrics package may provide benefits going forward.
I've used the Dropwizard metrics in the past and like them. A Go port of that package can be found here: https://github.com/armon/go-metrics
There's also: https://github.com/rcrowley/go-metrics which is described in this blog post: http://blog.dekstroza.io/monitoring-software-written-in-golang/
- June 2017 email before status.json meeting:
Some background reading:
- metrics vs logs - https://grafana.com/blog/2016/01/05/logs-and-metrics-and-graphs-oh-my/
- Dropwizard metrics - http://metrics.dropwizard.io/3.2.2/
- Go port of above - https://github.com/rcrowley/go-metrics
- Coda Hale's (author of Dropwizard) Go metrics package - https://github.com/codahale/metrics
- Prometheus - https://prometheus.io/docs/introduction/comparison/I think we have two different things that we're interested here:
1. Health checks
2. MetricsWe should be sure to distinguish between them and understand which we're talking about at any given time.
It seems like some of the metrics from #10484 are candidates to get consolidated here, but probably best to focus on additional metrics first.
Updated by Tom Clegg almost 7 years ago
[{
"name": "request_duration_seconds",
"help": "Summary of request duration.",
"type": "SUMMARY",
"metric": [
{
"label": [
{
"name": "code",
"value": "200"
},
{
"name": "method",
"value": "get"
}
],
"summary": {
"sample_count": "1",
"sample_sum": 0.000342534,
"quantile": [
{
"quantile": 0.5,
"value": 0.000342534
},
{
"quantile": 0.9,
"value": 0.000342534
},
{
"quantile": 0.99,
"value": 0.000342534
}
]
}
},
{
"label": [
{
"name": "code",
"value": "200"
},
{
"name": "method",
"value": "put"
}
],
"summary": {
"sample_count": "9",
"sample_sum": 0.258729368,
"quantile": [
{
"quantile": 0.5,
"value": 0.023052607
},
{
"quantile": 0.9,
"value": 0.072060673
},
{
"quantile": 0.99,
"value": 0.072060673
}
]
}
}
]
}]
Updated by Tom Clegg almost 7 years ago
13025-keepstore-metrics @ 85398442d095524c5b2f315c294ea81f9d17853b
This adds "request_duration_seconds" and "time_to_status_seconds" summary metrics.
I haven't added s3 and Azure backend stats yet, but I don't think that's a reason to delay merging these basic stats.
Updated by Lucas Di Pentima almost 7 years ago
- Some tests are failing: https://ci.curoverse.com/job/developer-run-tests/573/
- File services/keepstore/config.go
- Line 99: I think the ‘else’ clause is not needed.
- Line 197: Can ‘req’ param be replaced by
_
onexportJSON()
’s signature if it’s not going to be used?
- Does this feature need some sort of documentation?
Updated by Tom Clegg almost 7 years ago
Lucas Di Pentima wrote:
- Some tests are failing: https://ci.curoverse.com/job/developer-run-tests/573/
Aha... keep-web was counting on WroteStatus() to return 0 when there has been no explicit call to WriteHeader(). Reverted that change and fixed the status=0 logging bug at log-emitting time instead.
- File services/keepstore/config.go
- Line 99: I think the ‘else’ clause is not needed.
But without the else, log.Formatter wouldn't get set...? (I rearranged this to put the "else" case in the function body, as recommended by golint.)
- Line 197: Can ‘req’ param be replaced by
_
onexportJSON()
’s signature if it’s not going to be used?
Can be, but even go vet and golint seem ok with it... :)
- Does this feature need some sort of documentation?
Sure, although we might try it for a bit before we advertise it.
13025-keepstore-metrics @ 3826a6339ba1c901c054053920ed20547b3ba54d
Updated by Tom Clegg almost 7 years ago
- Status changed from In Progress to Resolved