Bug #19368: [keep-web] [S3] slow requests caused by logUploadOrDownload - Arvados

Bug #19368

closed

[keep-web] [S3] slow requests caused by logUploadOrDownload

Added by Tom Clegg over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assigned To: Tom Clegg
Category:
Target version: 2022-09-14 sprint
Story points:
Release: Arvados 2.4.3
Release relationship: Auto

Description

While investigating slow HEAD requests I noticed a few inefficiencies in logUploadOrDownload that could be causing substantial delays:

If the request method is anything other than PUT, POST, or GET, logUploadOrDownload() doesn't log anything, but it still does all the work to determine the collection ID and properties that would be logged if the method were different.
If Collections.WebDAVLogEvents is disabled, it still spends time determining the collection ID and properties in order to log them to stderr (although not to the logs table).
determineCollection() walks the filesystem tree from the root to the requested target, looking for a special .arvados#collection file at each level, which may incur several unnecessary API calls (e.g., /by_id/$projectid/.arvados#collection will try to look up a subproject or collection with that name).
(Most importantly?) Generating the magic .arvados#collection file involves writing a new manifest for the entire directory tree, which (a) is not actually needed here, (b) can be very large and therefore slow to generate, and (c) because of the root-to-leaf approach, is always generated for the entire collection. Reading the magic file from the same directory as the requested target would often generate a much smaller manifest and still return the correct collection UUID, although it wouldn't reveal the path relative to the collection root, which the logging feature uses.
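
To illustrate the first point above, here is a minimal sketch of checking the request method before doing any lookup work. It is not the actual keep-web code: expensiveCollectionLookup, shouldLog, logUploadOrDownloadSketch, and the placeholder UUID are all hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"net/http"
)

// lookupCount records how often the expensive lookup runs (illustration only).
var lookupCount int

// expensiveCollectionLookup is a hypothetical stand-in for the work done by
// determineCollection(): walking the tree and possibly making API calls.
func expensiveCollectionLookup(path string) string {
	lookupCount++
	return "zzzzz-4zz18-xxxxxxxxxxxxxxx" // placeholder UUID, not a real one
}

// shouldLog reports whether the request method is one that gets logged.
func shouldLog(method string) bool {
	switch method {
	case http.MethodPut, http.MethodPost, http.MethodGet:
		return true
	}
	return false
}

// logUploadOrDownloadSketch returns the log line it would emit, or "" if
// nothing is logged. The cheap method check runs before the expensive lookup.
func logUploadOrDownloadSketch(method, path string) string {
	if !shouldLog(method) {
		return ""
	}
	uuid := expensiveCollectionLookup(path)
	return fmt.Sprintf("%s %s collection=%s", method, path, uuid)
}

func main() {
	fmt.Printf("HEAD logged: %q, lookups so far: %d\n",
		logUploadOrDownloadSketch(http.MethodHead, "/by_id/x/file.txt"), lookupCount)
	fmt.Printf("GET logged: %q, lookups so far: %d\n",
		logUploadOrDownloadSketch(http.MethodGet, "/by_id/x/file.txt"), lookupCount)
}
```

With this ordering a HEAD request returns without touching the lookup at all, which is the behavior the first point asks for.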

Proposed fix:

Introduce a .arvados#uuid special file that just returns the UUID of the collection or project represented by that directory.

Related possible improvement:

Attach the user/collection IDs to the "response" log rather than creating a third ("file upload" / "file download") log entry per request with those fields added. We now have a feature in the httpserver package to make this easy.
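
A rough sketch of what "attach fields to the response log" could look like, using only the Go standard library. This does not use the Arvados httpserver package, whose API is not shown in this ticket; withResponseLog, setResponseLogField, and the field names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

type ctxKey struct{}

// responseFields collects extra key/value pairs for one request's log entry.
type responseFields struct {
	mu sync.Mutex
	m  map[string]string
}

// setResponseLogField attaches a field to the current request's response log
// entry. It is a no-op if the request did not pass through withResponseLog.
func setResponseLogField(ctx context.Context, k, v string) {
	if f, ok := ctx.Value(ctxKey{}).(*responseFields); ok {
		f.mu.Lock()
		defer f.mu.Unlock()
		f.m[k] = v
	}
}

// withResponseLog wraps next and emits exactly one log line per request,
// including any fields the handler attached while serving it.
func withResponseLog(next http.Handler, emit func(string)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		f := &responseFields{m: map[string]string{}}
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), ctxKey{}, f)))
		f.mu.Lock()
		defer f.mu.Unlock()
		keys := make([]string, 0, len(f.m))
		for k := range f.m {
			keys = append(keys, k)
		}
		sort.Strings(keys) // stable field order in the output
		parts := []string{"msg=response", "method=" + r.Method}
		for _, k := range keys {
			parts = append(parts, k+"="+f.m[k])
		}
		emit(strings.Join(parts, " "))
	})
}

// demo runs one request through the wrapped handler and returns the log lines.
func demo() []string {
	var lines []string
	h := withResponseLog(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Placeholder UUIDs, for illustration only.
		setResponseLogField(r.Context(), "collection_uuid", "zzzzz-4zz18-xxxxxxxxxxxxxxx")
		setResponseLogField(r.Context(), "user_uuid", "zzzzz-tpzed-xxxxxxxxxxxxxxx")
		w.WriteHeader(http.StatusOK)
	}), func(s string) { lines = append(lines, s) })
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest("GET", "/foo", nil))
	return lines
}

func main() {
	for _, l := range demo() {
		fmt.Println(l)
	}
}
```

The point of the design is that the handler never writes its own log entry; it only annotates the one the wrapper will write anyway, so each request yields one fewer log line.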

Files

keep-web (11 MB) keep-web	265afdad112b129c36235935470d4a410161a9ef-dev	Tom Clegg, 08/09/2022 04:10 PM
keep-web (11 MB) keep-web	0d8b4f1ad827a575bf74b058426eb898257592e5-dev	Tom Clegg, 08/22/2022 02:48 PM
keep-web (11 MB) keep-web	b896ceb55db0593631718cb13ee95b2414afe8f9-dev	Tom Clegg, 08/24/2022 02:03 PM
keep-web (11 MB) keep-web	87a93969ba0b4eaa1d8c63af5c039e7fed908a31-dev	Tom Clegg, 08/29/2022 10:01 PM
keep-web (11 MB) keep-web	13ebfc417bbdb8c5d325112a72248041b4ae49fd-dev	Tom Clegg, 08/30/2022 02:44 PM

Subtasks 1 (0 open — 1 closed)

Related issues 2 (0 open — 2 closed)

Updated by Tom Clegg over 3 years ago

Description updated (diff)

Updated by Tom Clegg over 3 years ago

Related to Idea #17464: Logging and restricting downloads in keep-web and keepproxy added

Updated by Tom Clegg over 3 years ago

Description updated (diff)

Updated by Tom Clegg over 3 years ago

Related to Bug #19192: WebDAVCache not performing as expected for S3 requests added

Updated by Tom Clegg over 3 years ago

File keep-web added

Here's a dev build with the bugfix from #19192#note-5, the watchdog / stack dump from #19192#note-4, and a fix for the "generate manifest for entire collection on each request" issue mentioned here.

(This bypasses commits on the main branch since #19192, in the interest of minimizing any version-skew issues while testing the fix.)

Updated by Tom Clegg over 3 years ago

19368-webdav-logging-speedup @ 31a9473bdce412db33a4afa53329701e2cd88e4d -- developer-run-tests: #3265

Updated by Peter Amstutz over 3 years ago

Target version changed from 2022-08-17 sprint to 2022-08-31 sprint

Updated by Lucas Di Pentima over 3 years ago

This LGTM, thanks!

#10

Updated by Tom Clegg over 3 years ago

File keep-web added

Here's a dev build with #19368#note-5 and content-sniffing disabled on S3 GET/HEAD requests, i.e., if the requested file's extension is not in /etc/mime.types, the file data will not be read, and the returned content-type will be application/octet-stream.
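
For reference, the extension-only behavior described above can be sketched with the standard mime package. This is an illustration, not the keep-web change itself; contentTypeByExtension is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"mime"
	"path/filepath"
)

// contentTypeByExtension picks a content type from the file name alone and
// never reads file data. Unknown or missing extensions fall back to
// application/octet-stream.
func contentTypeByExtension(name string) string {
	if t := mime.TypeByExtension(filepath.Ext(name)); t != "" {
		return t
	}
	return "application/octet-stream"
}

func main() {
	for _, name := range []string{"index.html", "reads.zzunknownext19368", "README"} {
		fmt.Printf("%s: %s\n", name, contentTypeByExtension(name))
	}
}
```

In Go's net/http, http.ServeContent falls back to reading the start of the content only when the Content-Type response header is not already set, so setting the header from a function like this before serving is one way to avoid the read.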

#11

Updated by Tom Clegg over 3 years ago

File keep-web added

Another dev build, #19368#note-10 but with the Sys() solution (since merged to main) instead of the .arvados#collection_id. (When the bucket ID was a project ID, the .arvados#collection_id approach still caused an extra groups#contents API call for the bucket project itself and each intervening project, which can easily add multiple seconds to each HEAD request.)
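
This note doesn't spell out what Sys() returns in keep-web, so the following is only a sketch of the general idea: the file info already obtained for the target carries a reference to its collection, and the caller reads it from there instead of opening a special file or walking parent directories. The collectionRecord and nodeInfo types and the collectionUUID helper are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"time"
)

// collectionRecord is a hypothetical stand-in for the collection a file
// belongs to.
type collectionRecord struct {
	UUID string
}

// nodeInfo is a hypothetical fs.FileInfo whose Sys() exposes the collection.
type nodeInfo struct {
	name string
	size int64
	coll *collectionRecord
}

func (n nodeInfo) Name() string       { return n.name }
func (n nodeInfo) Size() int64        { return n.size }
func (n nodeInfo) Mode() fs.FileMode  { return 0444 }
func (n nodeInfo) ModTime() time.Time { return time.Time{} }
func (n nodeInfo) IsDir() bool        { return false }
func (n nodeInfo) Sys() any           { return n.coll }

// Compile-time check that nodeInfo satisfies fs.FileInfo.
var _ fs.FileInfo = nodeInfo{}

// collectionUUID extracts the collection UUID from a FileInfo, if the
// underlying node carries one. No extra lookups or API calls are involved.
func collectionUUID(fi fs.FileInfo) (string, bool) {
	c, ok := fi.Sys().(*collectionRecord)
	if !ok || c == nil {
		return "", false
	}
	return c.UUID, true
}

func main() {
	// Placeholder UUID, for illustration only.
	fi := nodeInfo{name: "file.txt", size: 42, coll: &collectionRecord{UUID: "zzzzz-4zz18-xxxxxxxxxxxxxxx"}}
	uuid, ok := collectionUUID(fi)
	fmt.Println(uuid, ok)
}
```

Because the information rides along with the stat result for the requested target, nothing needs to be looked up for the bucket project or any project in between.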
