Container dispatch » History » Version 18
Tom Clegg, 01/27/2016 03:42 PM
| 1 | 16 | Tom Clegg | h1. Container dispatch |
|---|---|---|---|
| 2 | 2 | Peter Amstutz | |
| 3 | 15 | Tom Clegg | {{toc}} |
| 4 | 9 | Peter Amstutz | |
| 5 | 15 | Tom Clegg | h2. Summary |
| 6 | 1 | Peter Amstutz | |
| 7 | 15 | Tom Clegg | A dispatcher uses available compute resources to execute queued containers. |
| 8 | 1 | Peter Amstutz | |
| 9 | 15 | Tom Clegg | Dispatch is meant to be a small, simple component rather than a pluggable framework. For example, "slurm dispatch" can be a small standalone program rather than a plugin for a big generic dispatch program. |
| 10 | 12 | Peter Amstutz | |
| 11 | 15 | Tom Clegg | h2. Pseudocode |
| 12 | 1 | Peter Amstutz | |
| 13 | 15 | Tom Clegg | * Notice there is a queued container |
| 14 | | | * Decide whether the required resources are available to run the container |
| 15 | | | * Lock the container (this avoids races with other dispatch processes) |
| 16 | | | * Translate the container's runtime constraints and priority to instructions for the lower-level scheduler, if any |
| 17 | | | * Invoke the "crunch2 run" executor |
| 18 | | | * When the priority changes on a container taken by this dispatch process, update the lower-level scheduler accordingly (cancel if priority is zero) |
| 19 | | | * If the lower-level scheduler indicates the container is finished or abandoned, but the Container record is locked by this dispatcher and has state=Running, fail the container |
| 20 | 1 | Peter Amstutz | |
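The pseudocode steps can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the @Container@ and @Dispatcher@ classes, the @ram_mb@ constraint, and the @outcome@ field are hypothetical names, setting state to Running stands in for invoking the executor, and no real Arvados API or lower-level scheduler is called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Container:
    uuid: str
    state: str = "Queued"
    priority: int = 1
    ram_mb: int = 0                  # hypothetical runtime constraint
    locked_by: Optional[str] = None  # which dispatcher holds the lock
    outcome: Optional[str] = None    # hypothetical: "failed" or "cancelled"


class Dispatcher:
    """One dispatch process; `name` stands in for its API token."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb
        self.running = {}  # uuid -> Container taken by this dispatcher

    def try_lock(self, c):
        # Queued -> Locked; fails if another dispatcher got there first.
        if c.state != "Queued":
            return False
        c.state, c.locked_by = "Locked", self.name
        return True

    def dispatch_once(self, queue):
        for c in queue:
            if c.state != "Queued" or c.priority <= 0:
                continue
            if c.ram_mb > self.free_ram_mb:
                continue  # cannot run it here; leave it queued for others
            if not self.try_lock(c):
                continue
            self.free_ram_mb -= c.ram_mb
            self.running[c.uuid] = c
            c.state = "Running"  # stands in for invoking the executor

    def on_priority_change(self, c):
        # Cancel when the priority of one of our containers drops to zero.
        if c.uuid in self.running and c.priority == 0:
            self._finish(c, "cancelled")

    def on_scheduler_exit(self, c):
        # The scheduler says the job is gone, but the record we hold the
        # lock on still says Running: fail the container.
        if (c.uuid in self.running and c.locked_by == self.name
                and c.state == "Running"):
            self._finish(c, "failed")

    def _finish(self, c, outcome):
        del self.running[c.uuid]
        self.free_ram_mb += c.ram_mb
        c.state, c.outcome = "Complete", outcome
```

A container that needs more resources than this dispatcher has is simply skipped and stays Queued, which leaves it available to other dispatchers.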
| 21 | 15 | Tom Clegg | h2. Examples |
| 22 | 1 | Peter Amstutz | |
| 23 | 15 | Tom Clegg | slurm batch mode |
| 24 | | | * Use "sinfo" to determine whether it is possible to run the container |
| 25 | | | * Submit a batch job to the queue: "echo crunch-run --job {uuid} | sbatch -N1" |
| 26 | | | * When container priority changes, use scontrol and scancel to propagate changes to slurm |
| 27 | | | * Use strigger to run a cleanup script when a container exits |
| 28 | 2 | Peter Amstutz | |
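A minimal sketch of the submit and cancel steps for slurm batch mode, assuming the dispatcher names each slurm job after the container UUID so it can find the job again later. That naming convention and the helper function names are assumptions, not part of the design above. The script adds a @#!/bin/sh@ line because sbatch expects a batch script read from stdin to start with an interpreter line.

```python
import shlex
import subprocess


def batch_script(container_uuid):
    # Script text fed to sbatch on stdin.
    return "#!/bin/sh\nexec crunch-run --job {}\n".format(
        shlex.quote(container_uuid))


def sbatch_argv(container_uuid):
    # -N1 requests one node; naming the job after the container is an
    # assumption made here so the job can be cancelled by name.
    return ["sbatch", "-N1", "--job-name=" + container_uuid]


def scancel_argv(container_uuid):
    # Cancel every slurm job carrying this container's name.
    return ["scancel", "--name=" + container_uuid]


def submit(container_uuid, run=subprocess.run):
    # `run` is injectable so the function can be exercised without slurm.
    return run(sbatch_argv(container_uuid),
               input=batch_script(container_uuid),
               text=True, check=True)


def cancel(container_uuid, run=subprocess.run):
    return run(scancel_argv(container_uuid), check=True)
```

Priority changes other than cancellation (the scontrol case) are left out here because the mapping from container priority to slurm settings is not specified in this document.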
| 29 | 15 | Tom Clegg | standalone worker |
| 30 | | | * Inspect /proc/meminfo, /proc/cpuinfo, "docker ps", etc. to determine local capacity |
| 31 | | | * Invoke crunch-run as a child process (or perhaps a detached daemon process) |
| 32 | | | * Signal crunch-run to stop if container priority changes to zero |
| 33 | 2 | Peter Amstutz | |
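For the standalone worker, the capacity check and the stop signal might look like the sketch below. It is illustrative only: the function names are hypothetical, @os.cpu_count()@ is used in place of parsing /proc/cpuinfo, "docker ps" is not consulted, and SIGTERM is an assumed choice of signal.

```python
import os
import signal


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values in kB
    for the fields that carry a kB unit)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[name.strip()] = int(fields[0])
    return info


def has_capacity(meminfo, ram_kb_needed, cpus_needed, cpu_count=None):
    # Prefer MemAvailable; fall back to MemFree on kernels without it.
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    free_kb = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return free_kb >= ram_kb_needed and cpu_count >= cpus_needed


def stop_if_cancelled(proc, priority):
    # `proc` is expected to behave like a subprocess.Popen for crunch-run.
    if priority == 0 and proc.poll() is None:
        proc.send_signal(signal.SIGTERM)
        return True
    return False
```

On a real worker the input would come from reading /proc/meminfo, for example @parse_meminfo(open("/proc/meminfo").read())@.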
| 34 | 15 | Tom Clegg | h2. Arvados API support |
| 35 | 2 | Peter Amstutz | |
| 36 | 15 | Tom Clegg | Each dispatch process has an Arvados API token that allows it to see queued containers. |
| 37 | | | * No two dispatch processes can run at the same time with the same token. One way to achieve this is to make a user record for each dispatch service. |
| 38 | 2 | Peter Amstutz | |
| 39 | 15 | Tom Clegg | Container APIs relevant to a dispatch program: |
| 40 | | | * List Queued containers (might be a subset of Queued containers) |
| 41 | | | * List containers with state=Locked or state=Running associated with current token |
| 42 | 18 | Tom Clegg | ** arvados.v1.containers.current (equivalent to @filters=[["dispatch_auth_uuid","=",current_client_auth.uuid]]@) |
| 43 | 15 | Tom Clegg | * Receive event when container is created or modified and state is Queued (it might become runnable) |
| 44 | | | * Change state Queued->Locked |
| 45 | | | * Change state Locked->Queued |
| 46 | | | * Change state Locked->Running |
| 47 | | | * Change state Running->Complete |
| 48 | | | * Receive event when priority changes |
| 49 | 1 | Peter Amstutz | * Receive event when state changes to Complete |
| 50 | 18 | Tom Clegg | * Retrieve an API token to pass into the container and its arv-mount process (via crunch-run) |
| 51 | | | ** Token is automatically created/assigned when container state changes to Locked |
| 52 | | | ** Token is automatically expired/destroyed when container state changes away from Running |
| 53 | | | ** arvados.v1.api_client_authorizations.get(uuid=container.auth_uuid) |
| 54 | 15 | Tom Clegg | * Create events/logs |
| 55 | | | ** Decided not to run this container (e.g., no node with those resources) |
| 56 | | | ** Decided to run this container |
| 57 | | | ** Lock failed |
| 58 | | | ** Dispatched to crunch-run |
| 59 | | | ** Cleaned up crashed crunch-run (lower-level scheduler indicates the job finished, but crunch-run didn't leave the container in a final state) |
| 60 | | | ** Cleaned up abandoned container (container belongs to this process, but dispatch and lower-level scheduler don't know about it) |
| 61 | 6 | Peter Amstutz | |
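The state changes listed above form a small transition table. The sketch below checks a requested change against that table and builds the filter shown for @arvados.v1.containers.current@. It models containers as plain dicts and is not the API server's implementation. The rule that only the locking token may move a container out of Locked or Running is an assumption drawn from the locking description.

```python
ALLOWED = {
    ("Queued", "Locked"),
    ("Locked", "Queued"),
    ("Locked", "Running"),
    ("Running", "Complete"),
}


def change_state(container, new_state, token_uuid):
    """Apply one state change to a dict-shaped container record."""
    old = container["state"]
    if (old, new_state) not in ALLOWED:
        raise ValueError("invalid transition {}->{}".format(old, new_state))
    # Assumption: once locked, only the locking token may change the state.
    if old != "Queued" and container.get("dispatch_auth_uuid") != token_uuid:
        raise PermissionError("container is held by another dispatcher")
    container["state"] = new_state
    if new_state == "Locked":
        container["dispatch_auth_uuid"] = token_uuid
    elif new_state == "Queued":
        container["dispatch_auth_uuid"] = None
    return container


def current_filter(token_uuid):
    # Same shape as the filters expression given for containers.current.
    return [["dispatch_auth_uuid", "=", token_uuid]]
```

Keeping the allowed transitions in one set makes it easy to see that there is no direct Queued->Running path: a dispatcher has to take the lock first.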
| 62 | 15 | Tom Clegg | h2. Non-responsibilities |
| 63 | 6 | Peter Amstutz | |
| 64 | 15 | Tom Clegg | Dispatch doesn't retry failed containers. If something needs to be reattempted, a new container will appear in the queue. |
| 65 | 7 | Peter Amstutz | |
| 66 | 15 | Tom Clegg | Dispatch doesn't fail a container that it can't run. It doesn't know whether other dispatchers will be able to run it. |
| 67 | 8 | Peter Amstutz | |
| 68 | 15 | Tom Clegg | h2. Additional notes |
| 69 | 8 | Peter Amstutz | |
| 70 | 17 | Tom Clegg | (see also #6429, #6518, and #8028) |
| 71 | 8 | Peter Amstutz | |
| 72 | 15 | Tom Clegg | Using websockets to listen for container events (new containers added, priority changes) will benefit from some Go SDK support. |