Feature #11809
[keep-web] Cache collections and permissions
Status: closed (100% done)
Description
Background
It's common for a client to make many requests for the same collection using the same token (e.g., static assets for a web page, or many small excerpts from a large indexed BAM file).
The Go SDK automatically caches file data, but before that cache can even be used, keep-web retrieves the collection from the API server to verify permission and to determine which portions of which blocks to return. When the file data is already cached, this API call adds avoidable latency; when the response data is small, it dominates the overall response time.
Proposed solution
Use LRU caches for:
- collection content (UUID → PDH)
- manifests (PDH → manifest)
- permission lookups ((token, UUID or PDH) → bool)
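A minimal sketch of the three caches, assuming Go 1.18+ generics. To stay within the standard library it uses a plain LRU built on `container/list` as a stand-in for `TwoQueueCache`; it is not the 2Q algorithm, and it is not safe for concurrent use without an added mutex. The names `lruCache`, `entry`, and `permKey` are hypothetical, and the IDs and tokens are placeholder strings.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a minimal fixed-capacity LRU cache (a stand-in, not 2Q).
type lruCache[K comparable, V any] struct {
	capacity int
	order    *list.List          // front = most recently used
	items    map[K]*list.Element // key -> list element holding *entry
}

type entry[K comparable, V any] struct {
	key   K
	value V
}

func newLRU[K comparable, V any](capacity int) *lruCache[K, V] {
	return &lruCache[K, V]{capacity: capacity, order: list.New(), items: make(map[K]*list.Element)}
}

// Get returns the cached value and marks it as most recently used.
func (c *lruCache[K, V]) Get(key K) (V, bool) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry[K, V]).value, true
	}
	var zero V
	return zero, false
}

// Add inserts or updates a value, evicting the least recently used entry
// when the cache grows past its capacity.
func (c *lruCache[K, V]) Add(key K, value V) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry[K, V]).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry[K, V]{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry[K, V]).key)
	}
}

// permKey is the composite key for permission lookups: (token, UUID or PDH).
type permKey struct {
	token string
	id    string
}

func main() {
	uuidToPDH := newLRU[string, string](2) // collection UUID -> PDH
	uuidToPDH.Add("uuid-a", "pdh-a")
	uuidToPDH.Add("uuid-b", "pdh-b")
	uuidToPDH.Get("uuid-a")          // uuid-a becomes most recently used
	uuidToPDH.Add("uuid-c", "pdh-c") // over capacity: evicts uuid-b
	_, ok := uuidToPDH.Get("uuid-b")
	fmt.Println(ok) // false

	perms := newLRU[permKey, bool](100) // (token, ID) -> readable?
	perms.Add(permKey{"token1", "pdh-a"}, true)
	allowed, hit := perms.Get(permKey{"token1", "pdh-a"})
	fmt.Println(allowed, hit) // true true
}
```

The permission cache needs the token in its key because the same collection can be readable with one token and not another; a comparable struct works directly as a Go map key.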
If the request has a Cache-Control header, bypass the caches and make the API call as before, with one exception: if the request specifies a PDH and its manifest is already cached, only the permission check is needed, so the API call should use the "select" parameter to avoid retrieving and returning the manifest unnecessarily.
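The lookup rules above could be expressed as a small decision function that reports whether an API call is needed and, if so, whether it can be restricted with "select". This is a sketch: `cacheState`, `apiCall`, `planLookup`, and the `permissionOnly` field list are hypothetical names, and applying the "select" shortcut on an ordinary permission-cache miss (not only when Cache-Control is present) is an assumption beyond what the proposal states.

```go
package main

import "fmt"

// cacheState summarizes what the caches already know about one request.
type cacheState struct {
	CacheControl   bool // request carried a Cache-Control header
	IsPDH          bool // requested ID is a PDH (otherwise a UUID)
	UUIDCached     bool // UUID -> PDH cache hit (ignored for PDH requests)
	ManifestCached bool // PDH -> manifest cache hit for the resolved PDH
	PermCached     bool // (token, ID) -> bool cache hit
}

// apiCall describes the API request to make, if any.
type apiCall struct {
	Needed bool
	Select []string // nil: fetch the full record, including the manifest
}

// permissionOnly is a hypothetical minimal field list for a
// permission-only lookup.
var permissionOnly = []string{"portable_data_hash"}

func planLookup(s cacheState) apiCall {
	switch {
	case s.CacheControl && s.IsPDH && s.ManifestCached:
		// Cache bypassed, but per the proposal only the permission
		// check is needed, so skip returning the manifest.
		return apiCall{Needed: true, Select: permissionOnly}
	case s.CacheControl:
		// Cache bypassed: full API call, as before.
		return apiCall{Needed: true}
	case (s.IsPDH || s.UUIDCached) && s.ManifestCached && s.PermCached:
		// Everything needed is cached: no API call.
		return apiCall{}
	case s.IsPDH && s.ManifestCached:
		// Assumption: a permission-cache miss gets the same shortcut.
		return apiCall{Needed: true, Select: permissionOnly}
	default:
		return apiCall{Needed: true}
	}
}

func main() {
	for _, s := range []cacheState{
		{IsPDH: true, ManifestCached: true, PermCached: true},
		{CacheControl: true, IsPDH: true, ManifestCached: true},
		{CacheControl: true, UUIDCached: true, ManifestCached: true, PermCached: true},
	} {
		fmt.Printf("%+v -> %+v\n", s, planLookup(s))
	}
}
```

A UUID request with Cache-Control always gets a full call here, because the collection's current PDH may have changed since it was cached.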
Use TwoQueueCache from https://github.com/hashicorp/golang-lru or something similar.
The cache sizes should be configurable via /etc/arvados/keep-web/keep-web.yml, with these defaults:
UUIDCacheEntries: 100
PermissionCacheEntries: 100
ManifestCacheEntries: 100
ManifestCacheBytes: 100000000
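One way to honor both manifest limits is an LRU that evicts from the least recently used end until the entry count and the total byte count are both within bounds. A sketch under stated assumptions: the `cacheConfig` struct and `manifestCache` type are hypothetical (field names copied from the proposed YAML keys, YAML loading omitted), a zero value is treated as "use the default", and a manifest's size is counted as the length of its text only.

```go
package main

import (
	"container/list"
	"fmt"
)

// cacheConfig mirrors the proposed keep-web.yml keys.
type cacheConfig struct {
	UUIDCacheEntries       int
	PermissionCacheEntries int
	ManifestCacheEntries   int
	ManifestCacheBytes     int64
}

// withDefaults replaces unset (zero) fields with the proposed defaults.
func (c cacheConfig) withDefaults() cacheConfig {
	if c.UUIDCacheEntries == 0 {
		c.UUIDCacheEntries = 100
	}
	if c.PermissionCacheEntries == 0 {
		c.PermissionCacheEntries = 100
	}
	if c.ManifestCacheEntries == 0 {
		c.ManifestCacheEntries = 100
	}
	if c.ManifestCacheBytes == 0 {
		c.ManifestCacheBytes = 100000000
	}
	return c
}

// manifestCache is an LRU keyed by PDH, bounded by entries and by bytes.
type manifestCache struct {
	maxEntries int
	maxBytes   int64
	bytes      int64      // total length of cached manifest text
	order      *list.List // front = most recently used
	items      map[string]*list.Element
}

type manifestEntry struct {
	pdh      string
	manifest string
}

func newManifestCache(cfg cacheConfig) *manifestCache {
	return &manifestCache{
		maxEntries: cfg.ManifestCacheEntries,
		maxBytes:   cfg.ManifestCacheBytes,
		order:      list.New(),
		items:      make(map[string]*list.Element),
	}
}

func (c *manifestCache) Get(pdh string) (string, bool) {
	el, ok := c.items[pdh]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*manifestEntry).manifest, true
}

func (c *manifestCache) Add(pdh, manifest string) {
	if el, ok := c.items[pdh]; ok {
		e := el.Value.(*manifestEntry)
		c.bytes += int64(len(manifest) - len(e.manifest))
		e.manifest = manifest
		c.order.MoveToFront(el)
	} else {
		c.items[pdh] = c.order.PushFront(&manifestEntry{pdh, manifest})
		c.bytes += int64(len(manifest))
	}
	// Evict least recently used entries until both limits hold.
	for c.order.Len() > 0 && (c.order.Len() > c.maxEntries || c.bytes > c.maxBytes) {
		oldest := c.order.Back()
		e := oldest.Value.(*manifestEntry)
		c.order.Remove(oldest)
		delete(c.items, e.pdh)
		c.bytes -= int64(len(e.manifest))
	}
}

func (c *manifestCache) Len() int     { return c.order.Len() }
func (c *manifestCache) Bytes() int64 { return c.bytes }

func main() {
	cfg := cacheConfig{ManifestCacheBytes: 10}.withDefaults()
	fmt.Println(cfg.UUIDCacheEntries, cfg.ManifestCacheBytes) // 100 10
	mc := newManifestCache(cfg)
	mc.Add("pdh1", "aaaa") // 4 bytes
	mc.Add("pdh2", "bbbb") // 8 bytes total
	mc.Add("pdh3", "cccc") // 12 > 10, so pdh1 is evicted
	_, ok := mc.Get("pdh1")
	fmt.Println(ok, mc.Len(), mc.Bytes()) // false 2 8
}
```

With this design, a single manifest larger than `ManifestCacheBytes` is evicted immediately after insertion, so it is effectively never cached.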
Metrics
Respond to "GET /status.json" with the current cache statistics, but only when the Host request header does not contain a collection ID. Example response:
{
  "UUIDCacheHits": 1234,
  "UUIDCacheMisses": 2345,
  "PermissionCacheHits": 123,
  "PermissionCacheMisses": 234,
  "ManifestCacheHits": 1234,
  "ManifestCacheMisses": 2345,
  "ManifestCacheEntries": 100,
  "ManifestCacheBytes": 12345678
}