Bug #19368
[keep-web] [S3] slow requests caused by logUploadOrDownload
Status: Closed
Start date: 08/12/2022
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release:
Release relationship: Auto
Description
While investigating slow HEAD requests I noticed a few inefficiencies in logUploadOrDownload that could be causing substantial delays:
- If the request method is anything other than PUT, POST, or GET, logUploadOrDownload() doesn't log anything, but it still does all the work of determining the collection ID and properties that would have been logged for one of those methods.
- If Collections.WebDAVLogEvents is disabled, it still spends time determining the collection ID and properties, only to log them to stderr (not to the logs table).
- determineCollection() walks the filesystem tree from the root to the requested target, looking for a special .arvados#collection file at each level. This may incur several unnecessary API calls (e.g., reading /by_id/$projectid/.arvados#collection will try to look up a subproject or collection with that name).
- (Most importantly?) Generating the magic .arvados#collection file involves writing a new manifest for the entire directory tree. That manifest (a) is not actually needed here, (b) can be very large and therefore slow to generate, and (c) because of the root-to-leaf approach, is always generated for the entire collection. Reading the magic file from the same directory as the requested target would often generate a much smaller manifest and still return the correct collection UUID, although it wouldn't reveal the path relative to the collection root, which the logging feature uses.
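A minimal Go sketch of two of the points above, under stated assumptions: cheap checks run before any lookup, and the lookup starts at the target's own directory instead of at the root. shouldDetermineCollection and nearestCollection are hypothetical names, not keep-web functions; lookup stands in for reading a per-directory special file, and skipping the lookup entirely when WebDAVLogEvents is disabled is an assumption, not the current behavior described above.

```go
package main

import (
	"fmt"
	"net/http"
	"path"
)

// shouldDetermineCollection reports whether the (potentially expensive)
// collection lookup is worth doing. Hypothetical helper: the method list
// (GET, PUT, POST) comes from the issue; skipping the lookup when
// WebDAVLogEvents is disabled is an assumption.
func shouldDetermineCollection(method string, webdavLogEvents bool) bool {
	switch method {
	case http.MethodGet, http.MethodPut, http.MethodPost:
		return webdavLogEvents
	default:
		// HEAD, OPTIONS, PROPFIND, etc. are never logged.
		return false
	}
}

// nearestCollection starts at the directory containing target and walks
// toward the root, returning the first collection UUID found. lookup is a
// hypothetical stand-in for reading a per-directory special file; it
// returns "" when the directory is not inside a collection.
func nearestCollection(target string, lookup func(dir string) string) string {
	dir := path.Dir(path.Clean("/" + target))
	for {
		if uuid := lookup(dir); uuid != "" {
			return uuid
		}
		if dir == "/" {
			return ""
		}
		dir = path.Dir(dir)
	}
}

func main() {
	// Toy directory-to-UUID table; the paths and UUID are made up.
	dirs := map[string]string{
		"/by_id/coll":     "zzzzz-4zz18-aaaaaaaaaaaaaaa",
		"/by_id/coll/sub": "zzzzz-4zz18-aaaaaaaaaaaaaaa",
	}
	calls := 0
	lookup := func(dir string) string {
		calls++
		return dirs[dir]
	}

	if shouldDetermineCollection(http.MethodHead, true) {
		fmt.Println("unexpected lookup for HEAD")
	}
	uuid := nearestCollection("by_id/coll/sub/file.txt", lookup)
	fmt.Println(uuid, calls) // prints "zzzzz-4zz18-aaaaaaaaaaaaaaa 1"
}
```

Because the search stops at the first directory that reports a UUID, an ancestor such as /by_id/$projectid is never queried when the target's own directory already answers. Recovering the path relative to the collection root would need an extra step not shown here.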
Proposed changes:
- Introduce an .arvados#uuid special file that just returns the UUID of the collection or project represented by that directory.
- Attach the user/collection IDs to the "response" log rather than creating a third ("file upload" / "file download") log entry per request with those fields added. We now have a feature in the httpserver package to make this easy.