Crunch2 installation » History » Version 10
Tom Clegg, 06/17/2016 08:26 PM
h1. Crunch2 installation

(DRAFT -- when ready, this will move to doc.arvados.org→install)

{{toc}}

h2. Set up a crunch-dispatch service

Currently, dispatching containers via SLURM is supported.

Install crunch-dispatch-slurm on a node that can submit SLURM jobs. This can be any node appropriately configured to connect to the SLURM controller node.

<pre><code class="shell">
sudo apt-get install crunch-dispatch-slurm
</code></pre>

Create a privileged Arvados API token for use by the dispatcher. If you have multiple dispatch processes, you should give each one a different token.

<pre><code class="shell">
apiserver:~$ cd /var/www/arvados-api/current
apiserver:/var/www/arvados-api/current$ sudo -u webserver-user RAILS_ENV=production bundle exec script/create_superuser_token.rb
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
</code></pre>

Save the token on the dispatch node, in <code>/etc/sv/crunch-dispatch-slurm/env/ARVADOS_API_TOKEN</code>.

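
One way to save the token is sketched below. It assumes the runit service directory used on this page and a root shell on the dispatch node. The helper name @save_dispatch_token@ is hypothetical, and the owner-only file mode is a suggested precaution that assumes the run script reads the @env@ directory as root before dropping to the @crunch@ user.

```shell
#!/bin/sh
# Sketch: store a dispatcher token in a runit envdir file.
# save_dispatch_token is a hypothetical helper, not part of Arvados.
save_dispatch_token() {
    env_dir=$1    # e.g. /etc/sv/crunch-dispatch-slurm/env
    token=$2      # the token printed by create_superuser_token.rb
    mkdir -p "$env_dir" || return 1
    # Write under a restrictive umask so a new file is never world-readable.
    ( umask 077 && printf '%s\n' "$token" > "$env_dir/ARVADOS_API_TOKEN" ) || return 1
    # Tighten the mode as well, in case the file already existed.
    chmod 600 "$env_dir/ARVADOS_API_TOKEN"
}
```

For example: @save_dispatch_token /etc/sv/crunch-dispatch-slurm/env "$token"@, where @$token@ holds the value printed in the previous step.
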

Example runit script (@/etc/sv/crunch-dispatch-slurm/run@):

<pre><code class="shell">
#!/bin/sh
set -e
exec 2>&1

export ARVADOS_API_HOST=uuid_prefix.your.domain

exec chpst -e ./env -u crunch crunch-dispatch-slurm
</code></pre>

Example runit logging script (@/etc/sv/crunch-dispatch-slurm/log/run@):

<pre><code class="shell">
#!/bin/sh
set -e
[ -d main ] || mkdir main
exec svlogd -tt ./main
</code></pre>

Ensure the @crunch@ user on the dispatch node can run Docker containers on SLURM compute nodes via @srun@ or @sbatch@. Depending on your SLURM installation, this may require that the @crunch@ user exist on the dispatch node and on all SLURM compute nodes, with the same UID, GID, and home directory on each.

For example, this should print "OK" (possibly after some extra status/debug messages from SLURM and Docker):

<pre>
crunch@dispatch:~$ srun -N1 docker run busybox echo OK
</pre>
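
If the @srun@ test fails, one thing to rule out is a mismatch between the @crunch@ accounts on different nodes. The sketch below compares two passwd(5) entries on UID, GID, and home directory. The helper name @same_account@ is hypothetical, and it assumes @getent@ is available on both the dispatch node and the compute nodes.

```shell
#!/bin/sh
# Sketch: succeed if two passwd(5) entries share UID, GID, and home directory.
# same_account is a hypothetical helper, not part of Arvados or SLURM.
same_account() {
    # Fields 3, 4, and 6 of a passwd entry are UID, GID, and home directory.
    a=$(printf '%s\n' "$1" | cut -d: -f3,4,6)
    b=$(printf '%s\n' "$2" | cut -d: -f3,4,6)
    # An empty entry (user missing on one side) counts as a mismatch.
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, on the dispatch node: @same_account "$(getent passwd crunch)" "$(srun -N1 getent passwd crunch)" || echo "crunch account differs"@. This checks only whichever compute node SLURM happens to select, so it may need repeating per node.
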

h2. Install crunch-run on all compute nodes

<pre><code class="shell">
sudo apt-get install crunch-run
</code></pre>

h2. Enable cgroup accounting on all compute nodes

(This requirement isn't new for crunch2/containers, but it seems to be a FAQ. The Docker install guide mentions it's optional and performance-degrading, so it's not too surprising if people skip it. Perhaps we should say why/when it's a good idea to enable it?)

Check https://docs.docker.com/engine/installation/linux/ for instructions specific to your distribution.

For example, on Ubuntu:
# Update @/etc/default/grub@ to include: <pre>
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
</pre>
# @sudo update-grub@
# Reboot

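
After the reboot, it can be worth confirming that the kernel actually booted with both flags from the GRUB example. The sketch below checks a kernel command line string for them. The helper name @cgroup_flags_present@ is hypothetical, and reading @/proc/cmdline@ assumes a Linux compute node.

```shell
#!/bin/sh
# Sketch: succeed if a kernel command line carries both flags from the
# GRUB example. cgroup_flags_present is a hypothetical helper.
cgroup_flags_present() {
    # Pad with spaces so each flag is matched as a whole word.
    cmdline=" $1 "
    case $cmdline in
        *" cgroup_enable=memory "*) ;;
        *) return 1 ;;
    esac
    case $cmdline in
        *" swapaccount=1 "*) ;;
        *) return 1 ;;
    esac
}
```

For example, on a compute node: @cgroup_flags_present "$(cat /proc/cmdline)" || echo "cgroup flags missing"@.
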
h2. Configure Docker

Unchanged from current docs.

h2. Test the dispatcher

On the dispatch node, monitor the crunch-dispatch logs.

<pre><code class="shell">
dispatch-node$ tail -F /etc/sv/crunch-dispatch-slurm/log/main/current
</code></pre>

On a shell VM, install a Docker image for testing.

<pre><code class="shell">
user@shellvm:~$ arv keep docker busybox
</code></pre>

On a shell VM, run a trivial container.

<pre><code class="shell">
user@shellvm:~$ arv container_request create --container-request '{
  "name": "test",
  "state": "Committed",
  "priority": 1,
  "container_image": "busybox",
  "command": ["true"],
  "output_path": "/out",
  "mounts": {
    "/out": {
      "kind": "tmp",
      "capacity": 1000
    }
  }
}'
</code></pre>

Measures of success:
* Dispatcher log entries will indicate it has submitted a SLURM job.
* Before the container finishes, SLURM's @squeue@ command will show the new job in the list of queued/running jobs.
* After the container finishes, @arv container list --limit 1@ will indicate the outcome: <pre>
{
 ...
 "exit_code":0,
 ...
 "state":"Complete",
 ...
}
</pre>
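
The last check can be scripted roughly as follows. This is a text match on the two fields shown above, not a JSON parser, so it assumes the output describes a single container (as with @--limit 1@). The helper name @container_succeeded@ is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if JSON text reports exit code 0 and state "Complete".
# container_succeeded is a hypothetical helper; it greps for the two fields
# and does not parse the JSON.
container_succeeded() {
    printf '%s\n' "$1" |
        grep -Eq '"exit_code"[[:space:]]*:[[:space:]]*0([^0-9.]|$)' &&
    printf '%s\n' "$1" |
        grep -Eq '"state"[[:space:]]*:[[:space:]]*"Complete"'
}
```

For example: @container_succeeded "$(arv container list --limit 1)" && echo "test container completed"@.
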