Feature #16745
closed[keep-web] Improve performance of S3 APIs using server-side cache
100%
Description
Currently, when handling a series of S3 API calls, keep-web builds a new filesystem object for each call. In some common workflows (reading several small files from a large collection / deep in a project tree), this will be very inefficient.
Initial implementation may be a map[apitoken]sitefs
, using a single sitefs to serve many read-only requests until reaching a configured TTL or performing a write operation with the same token. This should be relatively simple, and although write operations would still be inefficient, sequences of read operations would be much faster.
Updated by Tom Clegg over 4 years ago
- Related to Story #16360: Keep-web supports S3 compatible interface added
Updated by Peter Amstutz about 4 years ago
- Related to Story #16516: Run Keepstore on local compute nodes added
Updated by Peter Amstutz about 4 years ago
- Related to deleted (Story #16360: Keep-web supports S3 compatible interface)
Updated by Peter Amstutz almost 4 years ago
Would the caching scheme apply to all of S3, WebDAV, (future) FUSE since they mostly use the same Arvados Go SDK filesystem layer?
Updated by Tom Clegg almost 4 years ago
Peter Amstutz wrote:
Would the caching scheme apply to all of S3, WebDAV, (future) FUSE since they mostly use the same Arvados Go SDK filesystem layer?
There's a good chance we can use the same caching strategy for WebDAV with little extra work, yes.
FUSE doesn't have this problem: it already mounts one filesystem and uses it for the life of the fuse mount.
Updated by Tom Clegg almost 4 years ago
- Target version changed from Arvados Future Sprints to 2021-02-17 sprint
Updated by Peter Amstutz almost 4 years ago
- Target version changed from 2021-02-17 sprint to 2021-03-03 sprint
Updated by Tom Clegg almost 4 years ago
- Status changed from In Progress to New
- Assigned To deleted (
Tom Clegg) - Target version changed from 2021-03-03 sprint to Arvados Future Sprints
- Make/find collection on ce8i5 that performs poorly when doing a series of s3 "get" requests on small files (should be easy to reproduce with a large-manifest collection inside a couple of levels of project -- s3://project_uuid/subproject_name/collection_name/file1.txt)
- Deploy new version and set collection cache size greater than 2x the test manifest size
- Confirm performance is better (there should be almost zero overhead per request)
- Load files from several large-manifest collections and confirm keep-web memory use doesn't grow past the configured cache size
- Configure a very small cache size like 1 MiB and confirm it can still serve requests (doesn't populate and then evict before using, for example)
Updated by Tom Clegg almost 4 years ago
- Status changed from New to In Progress
- Assigned To set to Tom Clegg
- Target version changed from Arvados Future Sprints to 2021-03-03 sprint
Updated by Tom Clegg almost 4 years ago
16745-keep-web-cache @ 8c2bbecb1c09fbc3818dc1a2d73b3fda2ba68e02 -- developer-run-tests: #2338
Binary /home/tom/keep-web.8c2bbecb1 on keep.ce8i5.arvadosapi.com is available to test, in case anyone gets to it before I do.
The existing metric arvados_keepweb_collectioncache_cached_manifest_bytes now also includes the (approximate) memory used by the new session cache. This might not be the best way to express it (and ultimately the session cache should replace the collection cache entirely) but for the time being should at least make it easier to confirm the cache is behaving as expected.
Updated by Tom Clegg almost 4 years ago
quick manual testing on ce8i5
seq | 2.2.0~dev20210228221303-1 | 16745-keep-web-cache 8c2bbecb1 | |
1 | s3://ce8i5-j7d0g-93r09l3eydh39fp/ | 27s | 49s |
2 | s3://ce8i5-j7d0g-93r09l3eydh39fp/ | 34s | 1s |
3 | s3://ce8i5-j7d0g-93r09l3eydh39fp/Project 1/ | 43s | 1s |
4 | s3://ce8i5-j7d0g-93r09l3eydh39fp/Project 1/ | 42s | 1s |
5 | s3://ce8i5-j7d0g-93r09l3eydh39fp/Project 1/Project 1 - 1/ | 46s | 1s |
6 | s3://ce8i5-j7d0g-93r09l3eydh39fp/Project 1/Project 1 - 1/ | 48s | 1s |
7 | s3://ce8i5-j7d0g-93r09l3eydh39fp/ | 52s |
Updated by Tom Clegg almost 4 years ago
The surprisingly slow responses appear to be caused by s3cmd doing a "get bucket location" request ("GET /bucket/?location"), which keep-web misinterprets as "list bucket contents with no delimiter", which involves recursively listing the entire project hierarchy.
It's possible the original bug report arose from a client that uses the same library/strategy, and more of the slowness experienced by the user was attributable to "get bucket location" calls than to the get/list calls.
Updated by Tom Clegg almost 4 years ago
With GetBucketLocation returning the cluster ID as the location, same sequence of tests:
seq | |
1 | 1.4s |
2 | 0.6s |
3 | 0.7s |
4 | 0.6s |
5 | 0.7s |
6 | 0.5s |
7 | 0.6s |
Updated by Tom Clegg almost 4 years ago
16745-keep-web-cache @ 2fe25e2b32042098106acead136fd3064bab30e3 -- developer-run-tests: #2356
- new session cache
- handle GetBucketLocation API (location returned is the cluster ID)
- return 400 for unsupported APIs like GetObjectAcl instead of mishandling them as GetObject
Updated by Ward Vandewege almost 4 years ago
Tom Clegg wrote:
16745-keep-web-cache @ 2fe25e2b32042098106acead136fd3064bab30e3 -- developer-run-tests: #2356
Looks like a transient wb intergration test failure, otherwise ok.
- new session cache
- handle GetBucketLocation API (location returned is the cluster ID)
- return 400 for unsupported APIs like GetObjectAcl instead of mishandling them as GetObject
Review comment, just one thing:
- the `cached_manifest_bytes` metric is awkwardly named now that it also includes session size. Maybe we should rename it to `cached_collection_bytes` (to line up with the `MaxCollectionBytes` configuration parameter)? We only just added this metric in 2.1.0, right? It's probably fine to rename it to be more accurately named.
Thanks, LGTM otherwise.
Updated by Tom Clegg almost 4 years ago
Renamed the metric to cached_collection_bytes. Also changed the namespace from keepweb_collectioncache to keepweb_sessions. We're planning to get rid of the "collection cache" stuff and move everything to session cache, so I figure we might as well avoid changing the name again later or leaving it with a mysteriously different name.
16745-keep-web-cache @ 30bfb516f3f5bc88c3d0c07e380496f31b65945f -- developer-run-tests: #2358
Updated by Ward Vandewege almost 4 years ago
Tom Clegg wrote:
Renamed the metric to cached_collection_bytes. Also changed the namespace from keepweb_collectioncache to keepweb_sessions. We're planning to get rid of the "collection cache" stuff and move everything to session cache, so I figure we might as well avoid changing the name again later or leaving it with a mysteriously different name.
16745-keep-web-cache @ 30bfb516f3f5bc88c3d0c07e380496f31b65945f -- developer-run-tests: #2358
Thanks, LGTM.
Updated by Tom Clegg almost 4 years ago
- Target version changed from 2021-03-03 sprint to 2021-03-17 sprint
Updated by Tom Clegg almost 4 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|a76b52ff503ae14df608904349670151a5b15e47.