Feature #15340
[arvados-dispatch-cloud] Error-counting metrics
Status: Closed
% Done: 100%
Story points: 1.0
Release relationship: Auto
Description
Add the following to the Prometheus metrics:
counter vector arvados_dispatchcloud_driver_operations
- number of cloud operations, split by operation type (op=Create/Destroy/List/SetTags) and result (error=0/1)
- can be implemented as a driver proxy similar to rateLimitedInstanceSet in source:lib/dispatchcloud/driver.go
- most likely usage in graphs/alerts is
arvados_dispatchcloud_driver_operations{error="1"}
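A minimal sketch of the driver-proxy approach, assuming a simplified four-method driver interface. `InstanceSet`, `opCounter`, `instrumentedInstanceSet`, `fakeDriver` and their signatures are hypothetical and are not the Arvados driver API. To keep the example runnable with only the standard library, a mutex-guarded map stands in for a Prometheus counter vector; a real implementation would presumably register a client_golang `CounterVec` with `op` and `error` labels instead.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// InstanceSet is a hypothetical, trimmed-down stand-in for the cloud driver
// interface; the real Arvados interface has different method signatures.
type InstanceSet interface {
	Create(imageID string) (string, error)
	Destroy(id string) error
	List() ([]string, error)
	SetTags(id string, tags map[string]string) error
}

// opKey holds the two label values: operation type and error flag ("0"/"1").
type opKey struct{ Op, Error string }

// opCounter is a stdlib stand-in for a Prometheus counter vector: one
// monotonically increasing count per label combination, safe for concurrent use.
type opCounter struct {
	mtx    sync.Mutex
	counts map[opKey]int
}

func newOpCounter() *opCounter { return &opCounter{counts: map[opKey]int{}} }

// inc records one call to op, labelled error="1" if err is non-nil, else "0".
func (c *opCounter) inc(op string, err error) {
	flag := "0"
	if err != nil {
		flag = "1"
	}
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.counts[opKey{op, flag}]++
}

func (c *opCounter) get(op, flag string) int {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	return c.counts[opKey{op, flag}]
}

// instrumentedInstanceSet wraps another InstanceSet, forwards each call to it,
// and counts the outcome by operation type and error flag.
type instrumentedInstanceSet struct {
	InstanceSet
	ops *opCounter
}

func (is *instrumentedInstanceSet) Create(imageID string) (string, error) {
	id, err := is.InstanceSet.Create(imageID)
	is.ops.inc("Create", err)
	return id, err
}

func (is *instrumentedInstanceSet) Destroy(id string) error {
	err := is.InstanceSet.Destroy(id)
	is.ops.inc("Destroy", err)
	return err
}

func (is *instrumentedInstanceSet) List() ([]string, error) {
	ids, err := is.InstanceSet.List()
	is.ops.inc("List", err)
	return ids, err
}

func (is *instrumentedInstanceSet) SetTags(id string, tags map[string]string) error {
	err := is.InstanceSet.SetTags(id, tags)
	is.ops.inc("SetTags", err)
	return err
}

// fakeDriver is a toy driver for the demo; Destroy can be made to fail.
type fakeDriver struct{ failDestroy bool }

func (f *fakeDriver) Create(imageID string) (string, error) { return "inst-" + imageID, nil }

func (f *fakeDriver) Destroy(id string) error {
	if f.failDestroy {
		return errors.New("simulated cloud error")
	}
	return nil
}

func (f *fakeDriver) List() ([]string, error)                        { return nil, nil }
func (f *fakeDriver) SetTags(id string, tags map[string]string) error { return nil }

func main() {
	ops := newOpCounter()
	is := &instrumentedInstanceSet{InstanceSet: &fakeDriver{failDestroy: true}, ops: ops}
	id, _ := is.Create("img1")
	_ = is.Destroy(id)
	fmt.Println(ops.get("Create", "0"), ops.get("Destroy", "1")) // 1 1
}
```

Because the metric is a counter, a graph or alert would normally look at its rate rather than the raw total, e.g. `rate(arvados_dispatchcloud_driver_operations{error="1"}[5m])`.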
arvados_dispatchcloud_instances_disappeared
- number of times an instance disappeared from the cloud provider's instance list (see sync() in source:lib/dispatchcloud/worker/pool.go), split by the worker's state at the time
- most likely usage in graphs/alerts is
arvados_dispatchcloud_instances_disappeared{state!="shutdown"}
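A minimal sketch of the disappearance count, assuming a pool that tracks only each instance's worker state. `pool`, its `sync` method, `unexpectedDisappearances`, and the example state strings are hypothetical simplifications, not the code in pool.go. A plain map keyed by state stands in for the counter vector, and the helper mirrors the `state!="shutdown"` filter from the alert query.

```go
package main

import "fmt"

// pool is a hypothetical, heavily simplified worker pool that tracks only
// each known instance's worker state.
type pool struct {
	workers     map[string]string // instance ID -> worker state
	disappeared map[string]int    // worker state -> count (stand-in for a counter vector)
}

func newPool() *pool {
	return &pool{workers: map[string]string{}, disappeared: map[string]int{}}
}

// sync compares the cloud's current instance list with the known workers.
// Each worker whose instance is missing from the list is counted under its
// last known state and then forgotten. (Adopting newly listed instances,
// which a real sync would also do, is omitted here.)
func (p *pool) sync(cloudIDs []string) {
	listed := make(map[string]bool, len(cloudIDs))
	for _, id := range cloudIDs {
		listed[id] = true
	}
	for id, state := range p.workers {
		if !listed[id] {
			p.disappeared[state]++
			delete(p.workers, id) // deleting during range is safe in Go
		}
	}
}

// unexpectedDisappearances sums the counts for every state except
// "shutdown", mirroring the {state!="shutdown"} alert filter.
func (p *pool) unexpectedDisappearances() int {
	n := 0
	for state, c := range p.disappeared {
		if state != "shutdown" {
			n += c
		}
	}
	return n
}

func main() {
	p := newPool()
	p.workers["i-1"] = "running"
	p.workers["i-2"] = "shutdown"
	p.workers["i-3"] = "booting"
	p.sync([]string{"i-3"}) // i-1 and i-2 have vanished from the cloud listing
	fmt.Println(p.disappeared["running"], p.disappeared["shutdown"], p.unexpectedDisappearances()) // 1 1 1
}
```

Counting by the worker's last known state is what lets the alert ignore instances that vanished while already shutting down, which is expected behavior.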