Idea #7583

closed

[CWL] run CWL workflow runner itself in a crunch job

Added by Peter Amstutz over 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date:
Due date:
Story points: 2.0

Description

Currently, the Arvados CWL runner has to run on a shell node. It should be possible to dispatch the CWL workflow runner as a crunch job that submits more crunch jobs.

Considerations:
- May need to access Docker to fetch images externally or run a Dockerfile. The arvados cwl runner can scan the CWL file for Docker dependencies and handle these up front before submitting the workflow; this will require Docker access on the shell node.
- CWL supports arbitrary script engines which may run in separate Docker containers. To support this, dispatch these as jobs and wait for the result (this will be horribly inefficient, but there is no other way). Workflows can be updated to use the built-in cwl:JavascriptEngine to avoid the hit.
- The CWL runner will occupy a whole node. This is not such a big deal on the cloud, but is a big problem for on-premise installations with only a small number of compute nodes.
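
The up-front Docker scan mentioned in the first consideration could look roughly like the sketch below. This is only an illustration, not how the Arvados CWL runner actually does it. It assumes the CWL document has already been parsed into plain dicts and lists; the names iter_requirements, find_docker_deps and EXAMPLE are made up for this example. It collects dockerPull and dockerFile entries from requirements and hints, and lists "run" references to external files as unresolved instead of loading them.

```python
def iter_requirements(process):
    """Yield (class_name, body) pairs from 'requirements' and 'hints'.

    CWL allows both a list form ([{"class": "DockerRequirement", ...}])
    and a map form ({"DockerRequirement": {...}}).
    """
    for key in ("requirements", "hints"):
        section = process.get(key) or []
        if isinstance(section, dict):
            for cls, body in section.items():
                yield cls, body or {}
        else:
            for item in section:
                if isinstance(item, dict):
                    yield item.get("class"), item


def find_docker_deps(document):
    """Return (images to pull, Dockerfiles to build, unresolved run references)."""
    pulls, dockerfiles, unresolved = set(), [], []
    stack = [document]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            # "run: other.cwl" points at another file; this sketch does not load it.
            unresolved.append(node)
            continue
        if not isinstance(node, dict):
            continue
        for cls, body in iter_requirements(node):
            if cls != "DockerRequirement":
                continue
            if "dockerPull" in body:
                pulls.add(body["dockerPull"])
            if "dockerFile" in body:
                dockerfiles.append(body["dockerFile"])
        # A workflow step carries its embedded tool (or a file reference) under "run".
        if "run" in node:
            stack.append(node["run"])
        # Steps may be given as a list or as a map keyed by step id.
        steps = node.get("steps") or []
        if isinstance(steps, dict):
            steps = list(steps.values())
        stack.extend(step for step in steps if isinstance(step, dict))
    return sorted(pulls), dockerfiles, sorted(unresolved)


# Hypothetical workflow used only to exercise the sketch.
EXAMPLE = {
    "class": "Workflow",
    "requirements": {"DockerRequirement": {"dockerPull": "debian:8"}},
    "steps": [
        {
            "id": "a",
            "run": {
                "class": "CommandLineTool",
                "hints": [
                    {"class": "DockerRequirement", "dockerFile": "FROM debian:8\n"}
                ],
            },
        },
        {"id": "b", "run": "other.cwl"},
    ],
}

pulls, dockerfiles, unresolved = find_docker_deps(EXAMPLE)
```

Anything in the first two lists would have to be pulled or built on the shell node before submitting. Entries in the third list would need to be loaded and scanned the same way.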


Related issues 1 (0 open, 1 closed)

Blocks Arvados - Idea #7585: [CWL] Start CWL workflows from workbench (Resolved)
#1

Updated by Peter Amstutz over 10 years ago

  • Subject changed from [CWL] run CWL workflow runner in crunch to [CWL] run CWL workflow runner itself in a crunch job
#2

Updated by Peter Amstutz over 10 years ago

  • Description updated (diff)
  • Story points set to 2.0
#3

Updated by Brett Smith over 10 years ago

Peter Amstutz wrote:

- The CWL runner will occupy a whole node. This is not such a big deal on the cloud, …

I'm not sure I can agree with this. Some cloud installs will still be resource-constrained for cost or capacity reasons. Even when they're not, the cost of running a compute node for the entire life of a workflow just to do oversight is significant. For this specific component, I think I'd like to spend a little more time hashing out the options and their pros and cons.

#4

Updated by Peter Amstutz over 10 years ago

Brett Smith wrote:

Peter Amstutz wrote:

- The CWL runner will occupy a whole node. This is not such a big deal on the cloud, …

I'm not sure I can agree with this. Some cloud installs will still be resource-constrained for cost or capacity reasons. Even when they're not, the cost of running a compute node for the entire life of a workflow just to do oversight is significant. For this specific component, I think I'd like to spend a little more time hashing out the options and their pros and cons.

Well, obviously I'm not happy about it either. The alternative (other than better job scheduling with crunch v2) is to have a dedicated CWL runner service. This would also mitigate the other considerations I wrote in the description. Actually, that would turn this into a completely different story.

For consideration: https://github.com/common-workflow-language/cwltool-service

#5

Updated by Brett Smith over 10 years ago

Peter Amstutz wrote:

Well, obviously I'm not happy about it either. The alternative (other than better job scheduling with crunch v2) is to have a dedicated CWL runner service. This would also mitigate the other considerations I wrote in the description. Actually, that would turn this into a completely different story.

I understand. We're on the same page about where we are, and what the options are from here. I recognize that this is the only solution that doesn't involve adding or rearchitecting services. But I'm worried that nobody will want to use this solution given the cost. And if I'm right about that, I would rather go through the additional expense of bigger service changes and have a feature people want than go through the expense of writing code that nobody uses. Like I said, I think we could use more discussion around this piece of the puzzle specifically.

#6

Updated by Peter Amstutz almost 10 years ago

  • Status changed from New to Resolved

This works now.
