Bug #19368
closed
[keep-web] [S3] slow requests caused by logUploadOrDownload
Description
- if request method is anything other than PUT, POST, or GET, logUploadOrDownload() doesn't log anything -- but it does all the work to determine the collection ID and properties that would be logged if the method were different.
- if Collections.WebDAVLogEvents is disabled, it spends time determining the collection ID and properties to log them to stderr (although not to the logs table).
- determineCollection() walks the filesystem tree from root to requested target, looking for a special .arvados#collection file at each level, which may incur several unnecessary API calls (e.g., /by_id/$projectid/.arvados#collection will try to look up a subproject or collection with that name)
- (most importantly?) generating the magic .arvados#collection file involves writing a new manifest for the entire directory tree, which (a) is not actually needed here, (b) can be very large and therefore slow to generate, and (c) due to the root-to-leaf approach is always generated for the entire collection, even though reading the magic file from the same directory as the requested target would often generate a much smaller manifest and still return the correct collection UUID (although it wouldn't reveal the path relative to collection root, which the logging feature uses).
Possible improvements:
- Introduce a .arvados#uuid special file that just returns the UUID of the relevant collection or project represented by that directory
- Attach the user/collection IDs to the "response" log rather than creating a third ("file upload" / "file download") log entry per request with those fields added. We now have a feature in the httpserver package to make this easy.
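A minimal sketch of the first and last points, assuming the fix is to decide whether anything will be logged before doing the lookup, and to return the IDs as extra fields for the existing response log entry. The names logFields, lookupCollection, and the collection_uuid field are hypothetical and are not taken from keep-web:

```go
package main

import (
	"fmt"
	"net/http"
)

// lookups counts calls to the expensive step so the effect of the
// early return is visible.
var lookups int

// lookupCollection is a hypothetical stand-in for the expensive work
// (walking the tree, API calls). It is not keep-web's real function.
func lookupCollection(path string) string {
	lookups++
	return "uuid-for-" + path
}

// logFields returns extra fields to attach to the response log entry,
// or nil when the request method would not be logged at all. The
// method check happens first, so unlogged methods cost nothing.
func logFields(method, path string) map[string]string {
	switch method {
	case http.MethodGet, http.MethodPut, http.MethodPost:
		// Only these methods produce an upload/download record.
	default:
		return nil
	}
	return map[string]string{"collection_uuid": lookupCollection(path)}
}

func main() {
	fmt.Println(logFields(http.MethodHead, "/by_id/x/file"))
	fmt.Println(logFields(http.MethodGet, "/by_id/x/file"))
	fmt.Println("lookups:", lookups)
}
```

Whether the lookup should also be skipped when Collections.WebDAVLogEvents is disabled is a separate decision that this sketch does not model.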
Updated by Tom Clegg over 2 years ago
- Related to Story #17464: Logging and restricting downloads in keep-web and keepproxy added
Updated by Tom Clegg over 2 years ago
- Related to Bug #19192: WebDAVCache not performing as expected for S3 requests added
Updated by Tom Clegg over 2 years ago
Here's a dev build with the bugfix from #19192#note-5, the watchdog / stack dump from #19192#note-4, and a fix for the "generate manifest for entire collection on each request" issue mentioned here.
(This bypasses commits on the main branch since #19192, in the interest of minimizing any version-skew issues while testing the fix.)
Updated by Tom Clegg over 2 years ago
19368-webdav-logging-speedup @ 31a9473bdce412db33a4afa53329701e2cd88e4d -- developer-run-tests: #3265
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-08-17 sprint to 2022-08-31 sprint
Updated by Tom Clegg over 2 years ago
Here's a dev build with #19368#note-5 and content-sniffing disabled on S3 GET/HEAD requests, i.e., if the requested file's extension is not in /etc/mime.types, the file data will not be read, and the returned content-type will be application/octet-stream.
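For illustration, the extension-only behaviour described above could look roughly like this. It is a sketch using Go's standard mime package, which consults a built-in table plus the system mime.types files on Unix; the function name contentTypeByExtension is hypothetical and this is not the actual keep-web code:

```go
package main

import (
	"fmt"
	"mime"
	"path"
)

// contentTypeByExtension picks a content type from the file name only.
// It never opens or reads the file, so an unknown extension costs no
// data access and falls back to application/octet-stream.
func contentTypeByExtension(name string) string {
	if t := mime.TypeByExtension(path.Ext(name)); t != "" {
		return t
	}
	return "application/octet-stream"
}

func main() {
	fmt.Println(contentTypeByExtension("index.html"))
	fmt.Println(contentTypeByExtension("reads.unknownext123"))
	fmt.Println(contentTypeByExtension("no-extension"))
}
```

The trade-off is that files with unlisted extensions lose their sniffed type, in exchange for HEAD and GET requests that no longer read file data just to fill in a header.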
Updated by Tom Clegg over 2 years ago
Another dev build, #19368#note-10 but with the Sys() solution (since merged to main) instead of the .arvados#collection_id. (When the bucket ID was a project ID, the .arvados#collection_id approach still caused an extra groups#contents API call for the bucket project itself and each intervening project, which can easily add multiple seconds to each HEAD request.)
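A rough sketch of the general idea behind a Sys()-based approach: the file info already returned for the target carries a reference to its containing collection, so the caller can read the UUID with a type assertion instead of opening a special file at each path level. The types nodeInfo and collectionRecord, and the UUID value, are hypothetical placeholders, and what the real Sys() returns in the Arvados SDK is not shown here:

```go
package main

import (
	"fmt"
	"io/fs"
	"time"
)

// collectionRecord is a hypothetical stand-in for whatever record the
// filesystem layer attaches to a node.
type collectionRecord struct{ UUID string }

// nodeInfo is a minimal fs.FileInfo whose Sys() exposes that record.
type nodeInfo struct {
	name string
	size int64
	sys  interface{}
}

func (n nodeInfo) Name() string       { return n.name }
func (n nodeInfo) Size() int64        { return n.size }
func (n nodeInfo) Mode() fs.FileMode  { return 0644 }
func (n nodeInfo) ModTime() time.Time { return time.Time{} }
func (n nodeInfo) IsDir() bool        { return false }
func (n nodeInfo) Sys() interface{}   { return n.sys }

// collectionUUID extracts the UUID from a FileInfo without any extra
// lookups. It reports false when the node carries no collection record.
func collectionUUID(fi fs.FileInfo) (string, bool) {
	if c, ok := fi.Sys().(*collectionRecord); ok && c != nil {
		return c.UUID, true
	}
	return "", false
}

func main() {
	fi := nodeInfo{name: "file.txt", sys: &collectionRecord{UUID: "zzzzz-4zz18-0123456789abcde"}}
	fmt.Println(collectionUUID(fi))
	fmt.Println(collectionUUID(nodeInfo{name: "plain.txt"}))
}
```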
Updated by Tom Clegg over 2 years ago
Another dev build, #19368#note-11 plus a fix to prevent cache size accounting from blocking concurrent filesystem operations.
Updated by Tom Clegg over 2 years ago
- Target version changed from 2022-08-31 sprint to 2022-09-14 sprint
Updated by Tom Clegg over 2 years ago
- Status changed from In Progress to Resolved
- Release set to 53