Story #6264
closed
[CWL] Prototype Arvados CWL pipeline runner
Added by Brett Smith almost 10 years ago.
Updated over 9 years ago.
Estimated time:
(Total: 0.00 h)
Description
Implement a tool that acts like arv-run-pipeline-instance that accepts CWL workflows as inputs. It creates and monitors Arvados jobs, continuing until the workflow reaches an end state.
Goals:
- Have an implementation to demonstrate at BOSC.
- Build a base that can be developed into a workflow runner usable under Crunch v2.
Simplifying assumptions:
- For now, the runner only supports an active monitoring mode, akin to arv-run-pipeline-instance's default run mode. No need to support switches like
--submit
or --no-wait
.
- The runner runs on a host where it can start a Docker container to support CWL's scripting engine capabilities—probably an Arvados shell node.
- The workflow is only expected to succeed if it uses Docker images with bare minimum Arvados support. At the very least, they need to have virtualenv installed, which Crunch can use to install an Arvados SDK. Accepting arbitrary Docker images will probably require more support from Crunch v2 so it's out of scope for now.
- Description updated (diff)
- Story points set to 3.0
- Subject changed from [DRAFT] [CWL] Prototype Arvados CWL pipeline runner to [CWL] Prototype Arvados CWL pipeline runner
- Assigned To set to Peter Amstutz
- Status changed from New to In Progress
6264-run-command-task-env @ 15ccbba is good to merge.
One thing I noticed is that if the user specifies a non-string value in task.env (most likely a numeric type), they'll get an apparently random exception from subst.do_substitution
that doesn't really enlighten what the problem is. This same issue exists elsewhere in the code, but this sort of accidental type error seems more likely for task.env than, say, task.stdout. I wanted to flag it, but if you think dealing with it is out of scope for the branch, that's fair.
Thanks.
Reviewing 6264-cwl-runner @ 53755b6
Considering the circumstances, I think this is good to merge. I would like to see tests, but I figure that's not doable before BOSC. I know you've put a lot of work into manual testing—maybe a quick demo or tour of how that works on Monday would be good.
Suggestions I'd make from here:
- I think I'd have had an easier time following the code if these variables had more descriptive names:
i
in _match
t
and r
in arvExecutor
- (I know this isn't new code, but)
n
and c
in uploadfiles
- Putting the reuse command line switches in a mutually exclusive group would make a nice user affordance.
- I think the three-line for loop to copy
self.environment
to task.env
can just become script_paramters["task_env"].update(self.environment)
.
- Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.
Thanks.
- Target version changed from 2015-07-08 sprint to 2015-07-22 sprint
- Story points changed from 3.0 to 1.0
Brett Smith wrote:
Reviewing 6264-cwl-runner @ 53755b6
Considering the circumstances, I think this is good to merge. I would like to see tests, but I figure that's not doable before BOSC. I know you've put a lot of work into manual testing—maybe a quick demo or tour of how that works on Monday would be good.
Suggestions I'd make from here:
- I think I'd have had an easier time following the code if these variables had more descriptive names:
i
in _match
t
and r
in arvExecutor
Fixed.
- (I know this isn't new code, but)
n
and c
in uploadfiles
Didn't fix.
- Putting the reuse command line switches in a mutually exclusive group would make a nice user affordance.
Good idea, done.
- I think the three-line for loop to copy
self.environment
to task.env
can just become script_paramters["task_env"].update(self.environment)
.
Thanks, done.
- Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.
Can't remember, either that's was leftover from copying the regex from somewhere else (where unfortunately the source regex wasn't exactly what I needed...) or I was using it and then refactored the part that needed the path somewhere else. In either case, I removed the capture group.
Thanks
Peter Amstutz wrote:
Brett Smith wrote:
- Why does the regexp in ArvPathMapper capture the path (in a group, natch)? I don't see it doing anything with that, so it looks like needless complexity.
Can't remember, either that's was leftover from copying the regex from somewhere else (where unfortunately the source regex wasn't exactly what I needed...) or I was using it and then refactored the part that needed the path somewhere else. In either case, I removed the capture group.
But what I'm saying is, why match the path characters at all? When all you're doing is a boolean match, ^string.*
is functionally identical to ^string
.
But this is still good to merge, thanks.
- Status changed from In Progress to Resolved
- % Done changed from 67 to 100
Applied in changeset arvados|commit:b92203411f6f6adaef1c2af62495830f13f4fa14.
Also available in: Atom
PDF