Feature #23204
a-c-r should be able to map an S3 input to a Directory
Status: New
Priority: Normal
Assigned To: -
Category: CWL
Target version: -
Story points: -
Description
I wrote a workflow that took a Directory input and staged it under InitialWorkDirRequirement.listing. Then I ran the workflow with that input pointing at an entire S3 bucket. Early in the run, a-c-r logged this error:
INFO Using Arvados credential […]
INFO S3 downloads will use AWS access key id AKIA[…]
INFO Checking Keep for s3://test-curii-brett
DEBUG Found ETag values {}
DEBUG Sending GET request with headers {}
INFO Beginning download of s3://test-curii-brett
WARNING Download error: [Errno 21] Is a directory: ''
Traceback (most recent call last):
  File "/opt/arvados-py/lib/python3.11/site-packages/arvados_cwl/pathmapper.py", line 184, in v
    results = s3_to_keep(self.arvrunner.api,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/arvados-py/lib/python3.11/site-packages/arvados/_internal/s3_to_keep.py", line 126
    return url_to_keep(api, _Downloader(api, get_botoclient(botosession, unsigned_requests)),
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/arvados-py/lib/python3.11/site-packages/arvados/_internal/to_keep_util.py", line 2
    req = downloader.download(url, headers)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/arvados-py/lib/python3.11/site-packages/arvados/_internal/s3_to_keep.py", line 61,
    self.target = self.collection.open(self.name, "wb")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/arvados-py/lib/python3.11/site-packages/arvados/collection.py", line 367, in open
    raise IOError(errno.EISDIR, "Is a directory", path)
IsADirectoryError: [Errno 21] Is a directory: ''
The workflow then proceeded to run with an empty directory named d41d8cd98f00b204e9800998ecf8427e+0 (i.e., the empty collection PDH) in the working directory.
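Reading the traceback, the download path appears to treat the bucket URL as a single object: for s3://test-curii-brett there is no key after the bucket name, so the name passed to `collection.open` is the empty string, which `Collection.open` rejects with EISDIR. The sketch below illustrates that reading under stated assumptions. `split_s3_url` and `looks_like_directory` are hypothetical helpers made up for illustration; they are not part of the Arvados SDK or a-c-r.

```python
def split_s3_url(url: str) -> tuple[str, str]:
    """Split "s3://bucket/key" into (bucket, key). Hypothetical helper.

    The key is everything after the first "/" following the bucket name,
    so a bare bucket URL such as "s3://test-curii-brett" yields key "".
    Splitting by hand (rather than with urllib.parse) keeps "?" and "#",
    which are legal characters in S3 keys, as part of the key.
    """
    scheme = "s3://"
    if not url.startswith(scheme):
        raise ValueError(f"not an s3:// URL: {url!r}")
    bucket, _, key = url[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    return bucket, key


def looks_like_directory(url: str) -> bool:
    """Heuristic: a bucket root or a key ending in "/" names a directory."""
    _, key = split_s3_url(url)
    return key == "" or key.endswith("/")
```

This string check only catches the unambiguous cases. S3 has no real directories, so s3://bucket/data may name a single object or be a prefix shared by objects such as data/a.txt; telling those apart reliably would need a listing request.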
IMO a-c-r should recursively download the entire bucket into a collection and stage that collection as the Directory input. This should also work for a subdirectory (key prefix) inside a bucket.
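A minimal sketch of the proposed behavior, not how a-c-r is actually implemented. It assumes a boto3-style S3 client (`get_paginator("list_objects_v2")` and `paginate(Bucket=..., Prefix=...)` are real boto3 calls) and an Arvados-style collection whose `open(path, "wb")` returns a writable file usable in a `with` block and creates intermediate subdirectories as needed. `iter_prefix_objects`, `copy_prefix_to_collection` and the `fetch` callback are hypothetical names. It lists every object under the bucket or prefix, strips the prefix to get a path relative to the Directory root, skips "folder marker" keys ending in "/", and writes each object into the collection under its relative path.

```python
from typing import Callable, Iterator, List, Tuple


def iter_prefix_objects(client, bucket: str, prefix: str) -> Iterator[Tuple[str, str]]:
    """Yield (key, relative_path) for every object under bucket/prefix.

    `client` is assumed to behave like a boto3 S3 client.
    """
    # Treat a non-empty prefix as a directory so "data" does not match "data2/...".
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                # Folder placeholder keys carry no file content to copy.
                continue
            yield key, key[len(prefix):]


def copy_prefix_to_collection(
    client,
    bucket: str,
    prefix: str,
    collection,
    fetch: Callable[[str, str], bytes],
) -> List[str]:
    """Copy each object into `collection`, preserving relative paths.

    `fetch(bucket, key)` is a hypothetical callback returning the object's
    bytes; a real implementation would stream large objects instead of
    buffering them in memory. Returns the relative paths written.
    """
    written = []
    for key, relpath in iter_prefix_objects(client, bucket, prefix):
        with collection.open(relpath, "wb") as f:
            f.write(fetch(bucket, key))
        written.append(relpath)
    return written
```

Once populated, such a collection could presumably be saved and referenced by its portable data hash as the Directory location, the same way other Keep-backed inputs are staged.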