Feature #23284

closed

Keepclient reports backend keep metrics

Added by Tom Clegg 5 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: FUSE
Target version:
Story points: -
Release relationship: Auto

Description

Add net:keep0, keepcalls, and keepcache output to mimic arv-mount:

crunchstat: keepcalls 0 put 3753 get -- interval 10.0000 seconds 0 put 0 get
crunchstat: net:keep0 0 tx 484065792 rx -- interval 10.0000 seconds 0 tx 0 rx
crunchstat: keepcache 3700 hit 8 miss -- interval 10.0000 seconds 0 hit 0 miss

Implementation:

Add prometheus metrics to sdk/go/keepclient such that a caller like lib/mount.keepFS can pass in a *prometheus.Registry to start collecting back-end and cache metrics, e.g.,

        reg := prometheus.NewRegistry()
        kc, _ := keepclient.MakeKeepClient(ac)
        kc.CollectMetrics(reg)

Connecting multiple KeepClients to the same registry also needs to work, e.g.,

        reg := prometheus.NewRegistry()
        kc, _ := keepclient.MakeKeepClient(ac)
        kc.CollectMetrics(reg)
        kc2 := kc.Clone()
        kc3, _ := keepclient.MakeKeepClient(ac)
        kc3.CollectMetrics(reg)
        // reg now reports combined metrics for kc, kc2, and kc3

In lib/mount.keepFS, use those resulting metrics to generate the new crunchstat entries.


Subtasks 1 (0 open, 1 closed)

Task #23299: Review 23284-keepclient-metrics · Resolved · Tom Clegg · 12/16/2025

Related issues 4 (2 open, 2 closed)

Related to Arvados - Feature #23308: Go FUSE driver crunchstat should report client-side traffic (need to add to keepclient metrics) · Resolved
Related to Arvados Epics - Feature #23333: Go FUSE Driver Phase 1: crunch-run uses arvados-client mount · In Progress
Follows Arvados - Feature #23245: Go FUSE driver reports crunchstats · Resolved · Lisa Knox
Precedes Arvados - Feature #23332: Go FUSE driver reports crunchstats net:keep0, keepcalls, keepcache · In Progress · Tom Clegg
Actions #1

Updated by Tom Clegg 5 months ago

Actions #2

Updated by Tom Clegg 5 months ago

  • Description updated (diff)
Actions #3

Updated by Tom Clegg 5 months ago

  • Description updated (diff)
Actions #4

Updated by Tom Clegg 5 months ago

  • Assigned To set to Tom Clegg
  • Status changed from New to In Progress
Actions #5

Updated by Tom Clegg 5 months ago

  • Target version set to Development 2025-11-12
Actions #6

Updated by Brett Smith 4 months ago

  • Category set to FUSE
Actions #7

Updated by Brett Smith 4 months ago

  • Target version changed from Development 2025-11-12 to Development 2025-11-26
Actions #8

Updated by Brett Smith 4 months ago

  • Subtask #23299 added
Actions #9

Updated by Tom Clegg 4 months ago

23284-keepclient-metrics @ c9ba673c419be2f9b41baef9f993449cad55868a -- developer-run-tests: #4942

fix sdk/go/arvados tests, re-run remainder @ 7e07cfc04c0d76fa33dd161d34f91b5f66249492 -- run-tests-remainder: #5662

Done:
  • track cache hits/misses, blocks in/out, and network traffic in/out
  • enable client metrics in keepproxy
Todo:
  • enable client metrics in keep-web
  • track cache bytes in/out, application bytes in/out (arv-mount doesn't report it, but lib/mount probably should, because network traffic ÷ application traffic is an indicator of a cache thrashing pattern where we re-fetch entire blocks to serve short reads)
Actions #10

Updated by Tom Clegg 4 months ago

  • Related to Feature #23308: Go FUSE driver crunchstat should report client-side traffic (need to add to keepclient metrics) added
Actions #11

Updated by Tom Clegg 4 months ago · Edited

23284-keepclient-metrics @ b9ac9378f404a10a07816eb5a8d5c6ae2e6f15d6 -- developer-run-tests: #4944

(workbench2 test failed)

  • All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
    • ✅ Add (*keepclient.KeepClient).RegisterMetrics(*prometheus.Registry) method
    • ✨ Add keepclient backend metrics to keepproxy and keep-web
    • ❌ Piggybacking already-registered metrics is not a thing, so the kc3 example in the description is not possible. Keep-web would have been easier to instrument that way, but instead I refactored it to use Clone so all metrics can be shared.
    • We could also add something like (*keepclient.KeepClient).CombineMetrics(*keepclient.KeepClient) but it would not be trivial, because concurrency. As long as the Clone approach works I think we should leave it at that.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • I mentioned in #note-9 we should also report client-side traffic (in bytes, not just blocks) to help identify cache thrashing. Added #23308.
  • Code is tested and passing, both automated and manual; what manual testing was done is described.
    • ✅ automated tests added to keepclient pkg
    • ✅ keep-web metrics test updated to check keepclient stats are reported too
  • The tested code incorporates recent main branch changes.
  • New or changed UI/UX has gotten feedback from stakeholders.
    • n/a
  • Documentation has been updated.
    • n/a
  • Behaves appropriately at the intended scale (describe intended scale).
    • n/a (even for small reads, overhead of tracking metrics should not be noticeable)
  • Considered backwards and forwards compatibility issues between client and server.
    • n/a
  • Follows our coding standards and GUI style guidelines.

(this branch doesn't actually add the metrics to the Go FUSE driver, that part is blocked on #23245)
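
For readers unfamiliar with why the Clone approach sidesteps the concurrency problem mentioned in the checklist, the sketch below shows the general idea with stand-in types: a clone copies the client struct but keeps the same pointer to a set of atomic counters, so every clone updates one shared set of metrics. The `client` and `counters` types here are hypothetical and use only the standard library; they are not the keepclient implementation, which registers prometheus metrics.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counters is a hypothetical stand-in for the shared metrics.
type counters struct {
	cacheHit  atomic.Int64
	cacheMiss atomic.Int64
}

// client is a hypothetical stand-in for a Keep client.
type client struct {
	name string
	m    *counters
}

func newClient(name string) *client {
	return &client{name: name, m: &counters{}}
}

// clone copies the client but shares the counters pointer, so the
// original and the clone report into the same metrics.
func (c *client) clone(name string) *client {
	cp := *c
	cp.name = name
	return &cp
}

// recordRead counts one read as either a cache hit or a cache miss.
func (c *client) recordRead(hit bool) {
	if hit {
		c.m.cacheHit.Add(1)
	} else {
		c.m.cacheMiss.Add(1)
	}
}

func main() {
	kc := newClient("kc")
	kc2 := kc.clone("kc2")
	var wg sync.WaitGroup
	for _, c := range []*client{kc, kc2} {
		wg.Add(1)
		go func(c *client) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.recordRead(i%10 != 0) // every tenth read is a miss
			}
		}(c)
	}
	wg.Wait()
	fmt.Println(kc.m.cacheHit.Load(), kc.m.cacheMiss.Load())
}
```

Merging two independently created counter sets after the fact would need extra coordination between writers, which is presumably the non-trivial part a CombineMetrics-style method would have to solve.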

Actions #12

Updated by Lisa Knox 4 months ago

LGTM, with the caveat that I only mostly understand it all. It will make more sense once it is merged with #23245.

Actions #13

Updated by Tom Clegg 4 months ago

  • Assigned To deleted (Tom Clegg)

23284-keepclient-metrics merged. Leaving the issue open for the fuse/crunchstat part.

Actions #14

Updated by Brett Smith 4 months ago

  • Assigned To set to Tom Clegg
  • Status changed from In Progress to Resolved
  • Subject changed from Go FUSE driver reports backend keep metrics in crunchstat output to Keepclient reports backend keep metrics
Actions #15

Updated by Brett Smith 4 months ago

  • Precedes Feature #23332: Go FUSE driver reports crunchstats net:keep0, keepcalls, keepcache added
Actions #16

Updated by Brett Smith 4 months ago

Tom Clegg wrote in #note-13:

23284-keepclient-metrics merged. Leaving the issue open for the fuse/crunchstat part.

For easier bookkeeping, split this off into #23332.

Actions #17

Updated by Brett Smith 4 months ago

  • Related to Feature #23333: Go FUSE Driver Phase 1: crunch-run uses arvados-client mount added
Actions #18

Updated by Brett Smith about 2 months ago

  • Release set to 84