Story #4685 (closed)
[Crunch] CWL prototype workflow runner in Arvados
Added by Peter Amstutz about 10 years ago. Updated almost 8 years ago.
75%
Updated by Tom Clegg almost 10 years ago
- Target version changed from Arvados Future Sprints to 2015-04-29 sprint
Updated by Peter Amstutz almost 10 years ago
Notes on 4685-crunch-cwl-support at 42008041173b0f7dcc4e90023517767396c64f78
sdk/python/arvados/commands/cwl_job.py
New from-scratch job runner, runs in place of crunch-job when runtime_constraints: { cwl_job: true }
Job record specifies an "executive" process. This is a process which runs for the duration of the job. When the executive process ends, the job ends. The executive process is dispatched from cwl_job and runs in a Docker container on a compute node.
Because CWL supports pulling/loading/building Docker images, and will support pluggable expressions (where the expression engine is another Docker container), the executive process runs with --privileged and supports docker-in-docker. (We may wish to whitelist images which are permitted to run with --privileged). (I have to talk to Ward about how this will interact with compute node firewalls).
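A minimal sketch of the whitelisting idea mentioned above, assuming the runner builds a `docker run` command line itself. The function name, the whitelist contents, and the image names are hypothetical placeholders, not part of the branch:

```python
# Hypothetical whitelist; the image name is a placeholder, not a real Arvados image.
PRIVILEGED_IMAGE_WHITELIST = {"example/cwl-executive"}


def docker_run_args(image, command, whitelist=PRIVILEGED_IMAGE_WHITELIST):
    """Build a `docker run` argument list.

    --privileged is added only when the image is on the whitelist, so
    ordinary task images never get docker-in-docker capabilities.
    """
    args = ["docker", "run", "--rm"]
    if image in whitelist:
        args.append("--privileged")
    args.append(image)
    args.extend(command)
    return args
```

With this shape, an image that is not whitelisted still runs, just without `--privileged`; refusing to run it at all would be an equally reasonable policy.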
The executive process sets up concurrent work by adding tasks to the task queue. The cwl_job runner uses event bus to watch for new tasks, and maintains a set of task slots. When a new task arrives, it is assigned a task slot and dispatched to a compute node to run in a Docker container. The image used to run tasks is different from the image used to run the executive process, and each task can use a different image.
The executive process uses event bus to watch for task completion to trigger updating the workflow and scheduling new tasks.
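The slot bookkeeping described above could look roughly like the following. This is only a sketch: the class and method names are hypothetical, the event bus is reduced to two callbacks, and actual dispatch to a compute node is left to the caller.

```python
from collections import deque


class TaskSlots:
    """Fixed set of task slots; tasks wait in FIFO order until a slot is free."""

    def __init__(self, slot_count):
        self.free = deque(range(slot_count))  # slot numbers not in use
        self.pending = deque()                # task ids waiting for a slot
        self.running = {}                     # task id -> slot number

    def on_task_created(self, task_id):
        """Handle a new-task event; return the (task_id, slot) pairs to start."""
        self.pending.append(task_id)
        return self._dispatch()

    def on_task_finished(self, task_id):
        """Handle a task-completion event; free its slot and start waiting tasks."""
        slot = self.running.pop(task_id)
        self.free.append(slot)
        return self._dispatch()

    def _dispatch(self):
        started = []
        while self.free and self.pending:
            slot = self.free.popleft()
            task_id = self.pending.popleft()
            self.running[task_id] = slot
            started.append((task_id, slot))
        return started
```

The caller would run each returned `(task_id, slot)` pair on the compute node backing that slot, using whatever image the task specifies.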
Implemented:
- Running the executive process
- Running tasks and uploading output
- Tests!
Not implemented:
- Logging
- Dispatch to SLURM (currently just runs from Python subprocess) (but probably mostly a matter of adding an extra srun incantation)
- Full featured executive process that actually runs CWL (currently working on adding necessary backend support to CWL workflow engine to tie them together)
Discussion points:
This implementation adds records to the "job_tasks" table for scheduling concurrent work. However, to support work reuse, tasks would need to gain many of the fields that already exist on the job record. Instead of adding a bunch of columns to the job_tasks record, I propose we add a new column "parent_job_uuid" to the jobs table which indicates that a job is "owned" by another job. When "parent_job_uuid" is non-null, crunch-dispatch would ignore the job and not start a job runner. The cwl_job runner would create job records with "parent_job_uuid" set and would be responsible for running them (just like it is for tasks now). The job_tasks table would remain to provide support for the existing crunch-job runner, but would not be used in this scheme. One benefit of this approach is that features like provenance graphs which already work at the job level (but not the task level) will do the right thing when presented with a CWL workflow (which will not be the case if the workflow steps are scheduled as tasks).
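To make the proposal concrete, here is a sketch of how the two sides might split the queue, assuming job records are plain dicts and that "parent_job_uuid" is the only new field. The function names and the UUID strings in any usage are hypothetical:

```python
def jobs_for_crunch_dispatch(queued_jobs):
    """Jobs crunch-dispatch would start itself: those with no parent job."""
    return [job for job in queued_jobs if job.get("parent_job_uuid") is None]


def child_jobs(queued_jobs, parent_uuid):
    """Jobs owned by the given parent job, which its runner must run itself."""
    return [job for job in queued_jobs if job.get("parent_job_uuid") == parent_uuid]
```

In a real deployment this would be a filter on the jobs table rather than a list comprehension; the point is only that ownership is decided by a single nullable column.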
Updated by Tom Clegg over 9 years ago
- Category set to Crunch
Peter Amstutz wrote:
Notes on 4685-crunch-cwl-support at 42008041173b0f7dcc4e90023517767396c64f78
sdk/python/arvados/commands/cwl_job.py
New from-scratch job runner, runs in place of crunch-job when runtime_constraints: { cwl_job: true }
Hm, we discussed that the cwl job runner would be able to run as a regular crunch job on a compute node. What does it need to do on the head node that can't be done from a job process on a worker node?
Job record specifies an "executive" process. This is a process which runs for the duration of the job. When the executive process ends, the job ends. The executive process is dispatched from cwl_job and runs in a Docker container on a compute node.
This part sounds good (except that cwl_job itself isn't here too)...
Because CWL supports pulling/loading/building Docker images, and will support pluggable expressions (where the expression engine is another Docker container), the executive process runs with --privileged and supports docker-in-docker. (We may wish to whitelist images which are permitted to run with --privileged). (I have to talk to Ward about how this will interact with compute node firewalls).
The CWL runner really should submit jobs to Arvados rather than invent its own scheduling/container system.
The executive process sets up concurrent work by adding tasks to the task queue.
This sounds OK, but...
The cwl_job runner uses event bus to watch for new tasks, and maintains a set of task slots. When a new task arrives, it is assigned a task slot and dispatched to a compute node to run in a Docker container. The image used to run tasks is different from the image used to run the executive process, and each task can use a different image.
...this part sounds unnecessary. Crunch-dispatch/crunch-job (or their replacements), not the cwl runner, is in charge of dispatching jobs to available worker nodes and notifying via event bus when they finish.
The executive process uses event bus to watch for task completion to trigger updating the workflow and scheduling new tasks.
This part sounds right.
- Dispatch to SLURM (currently just runs from Python subprocess) (but probably mostly a matter of adding an extra srun incantation)
Surely cwl runner should not be jumping into this shallow-looking pool...?
- Full featured executive process that actually runs CWL (currently working on adding necessary backend support to CWL workflow engine to tie them together)
This is the piece we really want here: connect the CWL language/workflow engine to the Arvados job-scheduling system.
This implementation adds records to the "job_tasks" table for scheduling concurrent work. However, to support work reuse, tasks would need to gain many of the fields that already exist on the job record. Instead of adding a bunch of columns to the job_tasks record, I propose we add a new column "parent_job_uuid" to the jobs table which indicates that a job is "owned" by another job. When "parent_job_uuid" is non-null, crunch-dispatch would ignore the job and not start a job runner. The cwl_job runner would create job records with "parent_job_uuid" set and would be responsible for running them (just like it is for tasks now). The job_tasks table would remain to provide support for the existing crunch-job runner, but would not be used in this scheme. One benefit of this approach is that features like provenance graphs which already work at the job level (but not the task level) will do the right thing when presented with a CWL workflow (which will not be the case if the workflow steps are scheduled as tasks).
AFAICT we just need CWL to stop trying to run jobs itself, and let crunch run them as usual. Keeping the workflow engine separate from the job scheduler is necessary to support multiple choices of workflow engine without losing safe job re-use. (If the workflow engine sets up the environment for each job in a pipeline by itself instead of letting Arvados do it, the job re-use logic has to be copied into every workflow engine, and whether a given job is reusable will depend on which version of which workflow engine was responsible for executing it...)
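A rough illustration of why reuse belongs on the scheduler side: if the reuse decision depends only on attributes of the job itself, it gives the same answer no matter which workflow engine submitted the job. The sketch below assumes a particular set of defining attributes and an in-memory store; both are illustrative choices, not the actual Arvados reuse logic.

```python
import hashlib
import json

# Assumed set of attributes that determine a job's result (illustrative only).
DEFINING_ATTRIBUTES = (
    "repository", "script", "script_version", "script_parameters", "docker_image",
)


def reuse_key(job):
    """Digest of the attributes assumed to define a job's result."""
    defining = {name: job.get(name) for name in DEFINING_ATTRIBUTES}
    canonical = json.dumps(defining, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class JobStore:
    """Hypothetical in-memory stand-in for the scheduler's job table."""

    def __init__(self):
        self._by_key = {}

    def find_or_create(self, job):
        """Return (job, reused); reused is True when an equivalent job exists."""
        key = reuse_key(job)
        if key in self._by_key:
            return self._by_key[key], True
        self._by_key[key] = job
        return job, False
```

Anything not listed in `DEFINING_ATTRIBUTES`, such as which engine submitted the job, has no effect on whether it is reused.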
Updated by Peter Amstutz over 9 years ago
You are conflating two different components here, which is partly my fault because the naming is confusing.
cwl_job should really be named "crunch v2". It doesn't actually do anything specific to CWL, except that its design is motivated by the needs of CWL. It has to run on the head node (or otherwise have access to the underlying scheduling system) because like crunch-job, its responsibility is to actually talk to SLURM to make things happen. This is a clean-sheet rewrite in an attempt to break the logjam with crunch-job. However, this can run alongside crunch-job (crunch dispatch uses the cwl_job flag to decide which one to run) so existing Arvados jobs won't be affected.
cwl_runner will be the actual CWL-specific logic. This runs in a container on a compute node. It only talks to crunch v2 via the tasks table and event bus.
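The dispatch decision described above, where crunch dispatch picks a runner based on the cwl_job flag, amounts to something like this sketch. The function name is hypothetical and the return values are just the runner names used in this thread:

```python
def choose_runner(job):
    """Pick which job runner crunch dispatch should start for a job record."""
    constraints = job.get("runtime_constraints") or {}
    if constraints.get("cwl_job") is True:
        return "cwl_job"
    # Existing jobs never set the flag, so they keep running under crunch-job.
    return "crunch-job"
```

Because the flag is opt-in, a job record without `runtime_constraints` falls through to crunch-job unchanged.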
Updated by Peter Amstutz over 9 years ago
Tom Clegg wrote:
Crunch-dispatch/crunch-job (or their replacements), not the cwl runner, is in charge of dispatching jobs to available worker nodes and notifying via event bus when they finish.
That's right, cwl_job (or however we choose to rename it) is a replacement for crunch-job. I didn't call it crunch v2 because we haven't agreed on what crunch v2 actually is. However, the existing model for crunch scripts implemented by crunch-job requires that the Docker image have Python 2.7 and the Arvados SDK preinstalled, which means we wouldn't be able to run any CWL workflows written for platforms other than Arvados, which would largely defeat the purpose of CWL. So given the choice between severely compromising the goals of CWL, further complicating crunch-job, or writing something new that actually does what we want, uses the current feature set (event bus, git over http) and is designed to be testable, I hope we can agree which one is the right choice.
AFAICT we just need CWL to stop trying to run jobs itself, and let crunch run them as usual. Keeping the workflow engine separate from the job scheduler is necessary to support multiple choices of workflow engine without losing safe job re-use. (If the workflow engine sets up the environment for each job in a pipeline by itself instead of letting Arvados do it, the job re-use logic has to be copied into every workflow engine, and whether a given job is reusable will depend on which version of which workflow engine was responsible for executing it...)
That's right, which is why it would be better to submit every unit of work as a job instead of a task. That is incredibly inefficient with the current crunch-job; under my proposal and using the new cwl_job we wouldn't have that limitation.
Updated by Tom Clegg over 9 years ago
Peter Amstutz wrote:
cwl_job should really be named "crunch v2". It doesn't actually do anything specific to CWL, except that its design is motivated by the needs of CWL. It has to run on the head node (or otherwise have access to the underlying scheduling system) because like crunch-job, its responsibility is to actually talk to SLURM to make things happen. This is a clean-sheet rewrite in an attempt to break the logjam with crunch-job.
This isn't the right process/branch/story for such a rewrite.
cwl_runner will be the actual CWL-specific logic. This runs in a container on a compute node. It only talks to crunch v2 via the tasks table and event bus.
cwl_runner is the part we want in this story.
Sounds like we need to go back to #5623 and write down the content we discussed writing down last week but never did, in particular the list of features needed by cwl_runner (and not currently offered by crunch-job) in order to use regular Arvados jobs to get its work done.
Updated by Tom Clegg over 9 years ago
(aside: #5416 already has good support for running arv-git-httpd in run_test_servers...)
Updated by Peter Amstutz over 9 years ago
Tom Clegg wrote:
Peter Amstutz wrote:
cwl_job should really be named "crunch v2". It doesn't actually do anything specific to CWL, except that its design is motivated by the needs of CWL. It has to run on the head node (or otherwise have access to the underlying scheduling system) because like crunch-job, its responsibility is to actually talk to SLURM to make things happen. This is a clean-sheet rewrite in an attempt to break the logjam with crunch-job.
This isn't the right process/branch/story for such a rewrite.
Ok. I'm happy to call that code "proof of concept" and throw it away if it means there will be some actual progress on improving Crunch in the next sprint. Consider it a spike.
I will pull that branch apart into 2-3 separate branches, without cwl_job and redundant arv-git-httpd in run_test_servers.
cwl_runner will be the actual CWL-specific logic. This runs in a container on a compute node. It only talks to crunch v2 via the tasks table and event bus.
cwl_runner is the part we want in this story.
Ok. I will write the initial cwl_runner to submit jobs which use run-command. However, many of the CWL test cases will fail.
Sounds like we need to go back to #5623 and write down the content we discussed writing down last week but never did, in particular the list of features needed by cwl_runner (and not currently offered by crunch-job) in order to use regular Arvados jobs to get its work done.
Note added to #5623 with CWL requirements from Crunch.
Updated by Tom Clegg over 9 years ago
Peter Amstutz wrote:
Ok. I will write the initial cwl_runner to submit jobs which use run-command.
Please, noooo... Better to just write "# crunch should provide X but has no API to ask for it yet" for now than to use run-command to work around missing features.
However, many of the CWL test cases will fail.
Totally fine. For that matter, fine with me if all test cases fail because they're blocked on Crunch API/implementation. (This story shouldn't expand to include non-trivial Crunch features.)
Note added to #5623 with CWL requirements from Crunch.
Great start. We should be able to turn it into an API in time to implement next sprint. (cf. meeting)
Updated by Peter Amstutz over 9 years ago
- Status changed from New to In Progress
Updated by Tom Clegg over 9 years ago
- Target version changed from 2015-04-29 sprint to 2015-05-20 sprint
Updated by Brett Smith over 9 years ago
- Target version changed from 2015-05-20 sprint to Arvados Future Sprints
Updated by Peter Amstutz over 8 years ago
- Status changed from In Progress to Resolved
Marking resolved as CWL support is now a "thing" and swiftly moving past prototype phase.
Updated by Tom Morris almost 8 years ago
- Target version deleted (Arvados Future Sprints)