Bug #13330
closedClean up container input collection naming and properties
100%
Description
Currently container outputs and logs have recognizable names and a specific set of properties, e.g.
"properties": { "type": "output", "container_request": "dhhck-xvhdp-yyrg8t00dwajq45" },
Similar naming and properties should be provided for the temporary input collections which are created by arvados-cwl-runner which currently have no properties and have names of the form "New collection (2018-03-08T08:34:40.636Z)". They should also inherit the intermediate output time-to-live setting.
Updated by Tom Morris over 6 years ago
- Subject changed from Clean up container input naming and properties to Clean up container input collection naming and properties
Updated by Peter Amstutz over 6 years ago
- Tracker changed from Bug to Feature
- Description updated (diff)
Updated by Peter Amstutz over 6 years ago
- Assigned To changed from Peter Amstutz to Fuad Muhic
Updated by Peter Amstutz over 6 years ago
This ticket is about Collection objects which are created by arvados-cwl-runner:
arvados/sdk/cwl/arvados_cwl/
grep --color -nHr -e Collection\( *.py arvcontainer.py:117: vwd = arvados.collection.Collection(api_client=self.arvrunner.api, arvjob.py:55: vwd = arvados.collection.Collection(api_client=self.arvrunner.api, __init__.py:293: final = arvados.collection.Collection(api_client=self.api, pathmapper.py:119: collection = arvados.collection.Collection(api_client=self.arvrunner.api, pathmapper.py:160: c = arvados.collection.Collection(api_client=self.arvrunner.api, pathmapper.py:175: c = arvados.collection.Collection(api_client=self.arvrunner.api, runner.py:288: collection = arvados.collection.Collection(api_client=arvrunner.api,
In each case, the collection should:
- Get an auto-generated name that describes its purpose
- Set the "trash_at" time if the intermediate output time-to-live (ttl) is greater than zero (trash_at set to (now + output ttl))
- Set "type" and "container_request" in the "properties" field of the collection.
- Use
"type": "intermediate"
- Use
"container": current_container["uuid"]
(get the current container usingself.api.containers().current().execute(num_retries=self.num_retries)
, seeArvCwlRunner.set_crunch_output
for an example)
- Use
- You should add "trash_at" and "properties" parameters to
arvados.collection.Collection.save_new()
- Remove the existing behavior of searching for and reusing existing collections. It should always create a new collection with the correct name, trash_at time, and properties.
Updated by Peter Amstutz over 6 years ago
- Target version changed from 2018-04-25 Sprint to 2018-05-09 Sprint
Updated by Peter Amstutz over 6 years ago
13330-intermediates-test has a test workflow (and a bugfix), run it with:
arvados-cwl-runner --api=containers --local run_in_single.cwl
Updated by Fuad Muhic over 6 years ago
- Target version changed from 2018-05-09 Sprint to 2018-05-23 Sprint
Updated by Peter Amstutz over 6 years ago
Since this involves updating both the Python SDK and arvados-cwl-runner, can you put the Python SDK changes and a-c-r changes in separate branches, so we can merge the Python SDK changes first? We will need to update the minimum Python SDK version in a-c-r setup.py, and we don't know what that version is until it is merged.
Updated by Tom Morris over 6 years ago
- Target version changed from 2018-05-23 Sprint to 2018-06-06 Sprint
Updated by Peter Amstutz over 6 years ago
- I found a typo, should be
if trash_at:
(this appears in two places)
if storage_classes: body["trash_at"] = trash_at
- Please mention in the documentation string for "properties" that it will replace any existing properties.
- Could you add access methods
get_properties()
andget_trash_at()
to theCollection
class which return the values of those respective fields fromself._api_response
?
Ifself._api_response
isNone
thenget_properties()
should return{}
andget_trash_at()
should returnNone
.
trash_at
should return/accept a Python datetime object instead of/in addition to a string. (use ciso8601 for datetime parsing and conversion).
Updated by Peter Amstutz over 6 years ago
Running tests here https://ci.curoverse.com/job/developer-run-tests/736/
Assuming that passes 13330-collection-save @ 4f0c3d501d19bed5915d5d188598d3a7f1dec7f8 LGTM
Updated by Peter Amstutz over 6 years ago
To follow up, the next step after merging will be to determine the timestamped package version (one way to do that is to look at the build-packages logs on jenkins) and then update the setup.py for arvados-cwl-runner so that the minimum version is the one you just merged.
Updated by Fuad Muhic over 6 years ago
- Target version changed from 2018-06-06 Sprint to 2018-06-20 Sprint
Updated by Fuad Muhic over 6 years ago
I made a change to 13330-collection-save. I updated failing test, and its passing locally when merged with newest master. Could you please rerun tests on Jenkins to verify that.
Updated by Fuad Muhic over 6 years ago
- Target version changed from 2018-06-20 Sprint to 2018-07-03 Sprint
Updated by Fuad Muhic over 6 years ago
My initial idea for tests was to patch save_new method and check if it's called with corrects parameters (in run method of ArvadosJob, ArvadosContainer and ArvPathMapper). That would require a lot of mocks for each run method and tests would be very fragile to any code change, so I ended up just testing get_intermediate_collection_info method instead. If you have any better ideas please let me know.
Updated by Tom Morris over 6 years ago
- Target version changed from 2018-07-03 Sprint to 2018-07-18 Sprint
Updated by Peter Amstutz over 6 years ago
The branch is 13330-cwl-intermediate-collections-cleanup
Updated by Peter Amstutz over 6 years ago
Comments:
- When computing trash_at, please use
datetime.datetime.utcnow()
instead ofdatetime.datetime.now()
- Please merge master, then add current_container to the RuntimeContext object so that you don't have to request the current container on every instance of run()
- Can we avoid defining get_collection_attributes() twice? It could be implemented as a plain function which is called by both ArvPathMapper and ArvadosContainer.
- {"type": "Intermediate"} should be "intermediate" (lower case).
Updated by Fuad Muhic over 6 years ago
13330-cwl-intermediate-collections-cleanup is ready for another look.
test-run: https://ci.curoverse.com/job/developer-run-tests/788/
Updated by Peter Amstutz over 6 years ago
arvcontainer.py:172 vwd.save_new() is missing owner_uuid=self.arvrunner.project_uuid
(this looks like it was a bug in the previous code)
arvjob.py:83 same thing with vwd.save_new()
arvcontainer.py:251 should be setting container_request["output_name"]
using the name of the workflow step, something like "output for step %s" % (self.name)
get_intermediate_collection_info()
should include the name of the step and generate a name like intermediate collection for step %s" % (self.name)
util.py:28
return current_container;
Don't need ;
in python
Updated by Fuad Muhic over 6 years ago
13330-cwl-intermediate-collections-cleanup is updated and ready for review.
test-run: https://ci.curoverse.com/job/developer-run-tests/794/
Updated by Peter Amstutz over 6 years ago
Fuad Muhic wrote:
13330-cwl-intermediate-collections-cleanup is updated and ready for review.
test-run: https://ci.curoverse.com/job/developer-run-tests/794/
This LGTM, thanks!
Updated by Fuad Muhic over 6 years ago
- Status changed from In Progress to Resolved