Bug #18719

closed

s3 sync problems with arv-mount

Added by Peter Amstutz almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: FUSE
Target version:
Start date: 02/07/2022
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

Running "s3 sync" on arv-mount doesn't behave well.

aws s3 sync --endpoint=https://collections.ce8i5.arvadosapi.com s3://ce8i5-4zz18-jaj3bafbrt0cb7s/ .
[Errno 2] No such file or directory
Traceback (most recent call last):
  File "awscli/clidriver.py", line 459, in main
  File "awscli/customizations/commands.py", line 197, in __call__
  File "awscli/customizations/commands.py", line 191, in __call__
  File "awscli/customizations/s3/subcommands.py", line 725, in _run_main
  File "awscli/customizations/s3/subcommands.py", line 991, in run
  File "awscli/customizations/s3/fileformat.py", line 55, in format
  File "awscli/customizations/s3/fileformat.py", line 84, in local_format
  File "posixpath.py", line 379, in abspath
FileNotFoundError: [Errno 2] No such file or directory
>> os.path.abspath(".")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/posixpath.py", line 371, in abspath
    cwd = os.getcwd()
OSError: [Errno 2] No such file or directory
sh-4.2$ cat .arvados#collection
cat: .arvados#collection: Input/output error
2022-02-04 14:17:39 arvados.arvados_fuse[1631186] ERROR: Unhandled exception during FUSE operation
Traceback (most recent call last):
  File "/home/peter/work/arvados/services/fuse/arvados_fuse/__init__.py", line 327, in catch_exceptions_wrapper
    return orig_func(self, *args, **kwargs)
  File "/home/peter/work/arvados/services/fuse/arvados_fuse/__init__.py", line 583, in lookup
    inode = p[name].inode
  File "/home/peter/work/arvados/services/fuse/arvados_fuse/fresh.py", line 25, in use_counter_wrapper
    return orig_func(self, *args, **kwargs)
  File "/home/peter/work/arvados/services/fuse/arvados_fuse/fresh.py", line 34, in check_update_wrapper
    return orig_func(self, *args, **kwargs)
  File "/home/peter/work/arvados/services/fuse/arvados_fuse/fusedir.py", line 566, in __getitem__
    self.collection_record_file = ObjectFile(self.inode, self.collection_record)
  File "/home/peter/work/arvados/services/fuse/arvados_fuse/fusefile.py", line 102, in __init__
    self.object_uuid = obj['uuid']
TypeError: 'NoneType' object is not subscriptable

Subtasks 1 (0 open, 1 closed)

Task #18722: Review 18719-fuse-fixes | Resolved | Ward Vandewege | 02/07/2022

Actions #1

Updated by Peter Amstutz almost 3 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz almost 3 years ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz almost 3 years ago

Issues seem to be:

  • Generating conflicts with self (longstanding bug #11553)
  • Replacing a collection entry with a new collection entry that has the same uuid: the new entry gets a new inode and the old inode goes away, so getcwd() then errors out
  • collection_record not updated with new one (this might be a straightforward omission)
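
The getcwd() failure in the second item can be reproduced without arv-mount. The following is a minimal sketch, assuming Linux and Python 3: a temporary directory stands in for the collection directory whose inode goes away. It only illustrates the symptom and is not how FUSE replaces inodes.

```python
import errno
import os
import tempfile


def getcwd_after_removal():
    """Return the errno raised by os.getcwd() once the cwd has been removed."""
    original = os.getcwd()
    doomed = tempfile.mkdtemp()  # stand-in for the collection directory
    try:
        os.chdir(doomed)
        os.rmdir(doomed)  # the directory the process sits in disappears
        try:
            os.getcwd()
        except OSError as exc:  # FileNotFoundError on Python 3
            return exc.errno
        return None
    finally:
        os.chdir(original)  # always leave the process in a valid directory


if __name__ == "__main__":
    print(getcwd_after_removal() == errno.ENOENT)
```

This matches the description above: both "aws s3 sync" and os.path.abspath(".") fail inside os.getcwd() with Errno 2.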
Actions #4

Updated by Peter Amstutz almost 3 years ago

"contents" response is returning past collection versions?

Actions #5

Updated by Peter Amstutz almost 3 years ago

aws s3 sync --endpoint=https://collections.ce8i5.arvadosapi.com s3://ce8i5-4zz18-pe56veug61jpmds/ .

Actions #6

Updated by Peter Amstutz almost 3 years ago

18719-fuse-fixes @ 97ace0ab8a33f488715909ba1058c790aeb0900b

Addresses the following bugs:

  • group.contents appears to include previous version snapshots. Because the previous version typically has the same name as the head version, the snapshot ends up replacing the head in the directory listing.
  • Websockets also sends out a creation event for the snapshot. It appears that websockets currently doesn't provide enough of the collection record to be able to distinguish between a new collection and a snapshot of a past version (and FUSE wasn't aware of snapshots anyway). This produces a similar problem where the snapshot ends up replacing the head version because they have the same name.
  • When triggering live updates based on websocket events, FUSE would try to update a collection to a specific portable data hash, but this also ran into the issue of confusing the snapshot with the head version.
  • When running update(), or on flush() after the user made a change to the collection, FUSE did not update its internal copy of the collection record with the copy it got back from the update operation. As a result, the special ".arvados#collection" file was always stale.
  • When multiple files are being flushed at once, the commit_all() part of flush() can overlap, which can result in double attempts to delete a buffer block
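
The first bug can be sketched in isolation. The following is a minimal sketch and not the arv-mount code: it assumes each collection record carries "uuid" and "current_version_uuid" fields and that a record is the head version when the two are equal. The record values and helper names are made up for illustration.

```python
def build_listing(records):
    """Key directory entries by name; a later record with the same name wins."""
    listing = {}
    for record in records:
        listing[record["name"]] = record
    return listing


def drop_snapshots(records):
    """Keep only head versions (assumed: uuid == current_version_uuid)."""
    return [r for r in records if r["uuid"] == r["current_version_uuid"]]


# Made-up records: one head version and one snapshot of its past version.
head = {"uuid": "head-uuid", "current_version_uuid": "head-uuid",
        "name": "results", "version": 2}
snapshot = {"uuid": "snap-uuid", "current_version_uuid": "head-uuid",
            "name": "results", "version": 1}
contents = [head, snapshot]

unfiltered = build_listing(contents)  # snapshot shadows the head entry
filtered = build_listing(drop_snapshots(contents))  # head entry survives
```

Because the listing is keyed by name, the snapshot silently takes over the "results" entry unless it is filtered out first.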

Fixes:

  • Filter snapshots from group.contents response
  • Only invalidate directories based on websocket events (to be updated on next access); stop trying to be clever about applying incremental changes or updating to specific PDHs
  • Change collection_record_file to use a callback; the callback flushes and then gets the latest collection record from the Collection object
  • Be more strategic about when to invalidate the collection_record_file
  • Looking up ".arvados#collection" invalidates it and forces a refresh
  • If a buffer block doesn't exist and we try to delete it, don't throw an error
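
The callback approach for ".arvados#collection" can be sketched as follows. This is a hypothetical illustration, not the code on the branch: RecordFile, FakeCollection, make_record_callback and lookup_record_file are invented names, and the real classes in arvados_fuse differ.

```python
import json


class RecordFile:
    """Hypothetical read-only file whose contents come from a callback."""

    def __init__(self, get_record):
        self._get_record = get_record  # callable returning the latest record
        self._cached = None

    def invalidate(self):
        """Forget the cached text so the next read calls the callback."""
        self._cached = None

    def read(self):
        if self._cached is None:
            self._cached = json.dumps(self._get_record(), sort_keys=True)
        return self._cached


class FakeCollection:
    """Stand-in for a collection object; not the Arvados SDK class."""

    def __init__(self):
        self.record = {"uuid": "example-uuid", "version": 1}
        self.flushed = 0

    def flush(self):
        self.flushed += 1

    def save(self):
        # Pretend the server returned an updated record.
        self.record = dict(self.record, version=self.record["version"] + 1)


def make_record_callback(collection):
    """Build the callback: flush first, then return the latest record."""
    def get_record():
        collection.flush()
        return collection.record
    return get_record


def lookup_record_file(record_file):
    """Looking up the special file invalidates it, forcing a refresh on read."""
    record_file.invalidate()
    return record_file
```

With this shape the file never holds its own copy of the record for long: a read after a lookup always goes back to the collection object, which is what keeps the contents from going stale.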

developer-run-tests: #2899

Actions #7

Updated by Ward Vandewege almost 3 years ago

I did some manual testing and this version is definitely more reliable.

Just one note:

  • In services/fuse/arvados_fuse/__init__.py, around line 480, a whole section of code is commented out. Should it just be removed?

LGTM otherwise. Thanks for the new comments!

Actions #8

Updated by Peter Amstutz almost 3 years ago

  • Status changed from In Progress to Resolved
Actions #9

Updated by Peter Amstutz almost 3 years ago

  • Release set to 49