Feature #19166
closed
Container shell support for SLURM and LSF dispatchers
Added by Peter Amstutz almost 3 years ago.
Updated about 2 years ago.
Estimated time: (Total: 0.00 h)
Release relationship: Auto
Description
Unlike the arvados-dispatch-cloud case, the dispatcher doesn't know which HPC compute node will run the container, and that node isn't necessarily even reachable from the controller. To work around this, crunch-run will make the initial connection in the opposite direction (from the compute node to the controller) and set up a tunnel over it.
- crunch-run connects to a new controller API endpoint, arvados/v1/containers/{uuid}/gateway_tunnel, authenticating with the container key (GatewayAuthSecret)
- controller registers its own internalURL as the container’s GatewayAddress, and uses the tunnel to route incoming container_ssh connections to crunch-run through the tunnel
- there can be multiple controller hosts/processes; the container_ssh API on controller A will sometimes need to proxy through the same API on controller B
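The routing rules in the list above can be sketched as follows. This is a minimal Python illustration, not the controller's actual implementation: the Controller class, its method names, the containers dict standing in for the container records, and the caller-supplied expected secret are all hypothetical. A controller that holds the tunnel for a container routes container_ssh through it; any other controller proxies the request to the internalURL that was recorded as the container's gateway address.

```python
import hmac
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# Hypothetical stand-in for the container records in the database:
# container UUID -> {"gateway_address": ...}
Containers = Dict[str, Dict[str, Any]]


@dataclass
class Controller:
    """One controller process, identified by its internalURL (illustrative only)."""
    internal_url: str
    # Tunnels that crunch-run has opened to *this* controller, keyed by container UUID.
    tunnels: Dict[str, Any] = field(default_factory=dict)

    def register_tunnel(self, uuid: str, presented_secret: str, expected_secret: str,
                        tunnel: Any, containers: Containers) -> None:
        """Accept a gateway_tunnel connection from crunch-run."""
        # Constant-time comparison of the secret crunch-run presented.
        if not hmac.compare_digest(presented_secret.encode(), expected_secret.encode()):
            raise PermissionError("gateway auth secret mismatch")
        self.tunnels[uuid] = tunnel
        # Advertise this controller as the place to reach the container.
        containers[uuid]["gateway_address"] = self.internal_url

    def route_container_ssh(self, uuid: str, containers: Containers) -> Tuple[str, Any]:
        """Decide how to handle an incoming container_ssh request."""
        if uuid in self.tunnels:
            # This controller holds the tunnel: forward through it.
            return ("tunnel", self.tunnels[uuid])
        address = containers.get(uuid, {}).get("gateway_address")
        if not address:
            raise LookupError(f"no gateway registered for {uuid}")
        if address == self.internal_url:
            # The record points here, but the tunnel is not (or no longer) connected.
            raise LookupError(f"tunnel for {uuid} is not connected")
        # Another controller holds the tunnel: proxy the same API call to it.
        return ("proxy", address)
```

For example, if crunch-run's tunnel lands on controller B, then B answers a container_ssh request with ("tunnel", ...) while controller A answers with ("proxy", B's internalURL), which is the A-through-B case described in the last bullet.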
Related issues
1 (1 open, 0 closed)
- Description updated (diff)
- Target version changed from 2022-07-20 to 2022-06-22 Sprint
- Assigned To set to Tom Clegg
- Related to Story #17207: External access to web services running in containers added
- Description updated (diff)
- Status changed from New to In Progress
- Target version changed from 2022-06-22 Sprint to 2022-07-06
As discussed in chat, TODO: crunch-run should not set up a tunnel if the controller won't actually use it (i.e., if crunch-run won't be saving the tunnel endpoint in the container record because $GatewayAddress is set).
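The TODO above amounts to a small decision in crunch-run. A minimal sketch, assuming GatewayAddress is read from the process environment (as the $GatewayAddress notation suggests) and using a hypothetical gateway_plan helper: when an address is configured, crunch-run listens there and records that address, so a tunnel would go unused.

```python
from typing import Mapping, Optional, Tuple


def gateway_plan(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return ("listen", address) or ("tunnel", None) for crunch-run's gateway.

    Hypothetical helper: if GatewayAddress is set, controller can reach the
    gateway directly at that address, so no tunnel should be opened.
    """
    address = env.get("GatewayAddress", "").strip()
    if address:
        return ("listen", address)
    # No directly reachable address: connect out to controller's gateway_tunnel API.
    return ("tunnel", None)
```

In a real process this would be called as gateway_plan(os.environ); passing a plain dict keeps the decision easy to test.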
- Target version changed from 2022-07-06 to 2022-07-20
19166-gateway-tunnel @ dc70bbf9ea15395476107a3b8dff96f754a40998 -- developer-run-tests: #3216
- add arvados-server dispatch-slurm subcommand (missed in #18947)
- add crunch-run -version
- improve some log/debug messages
- fix plumbing so "shell {uuid} echo ok" exits after running, instead of hanging
- tested on 9tee4 using slurm+singularity (works, although it's a bit disconcerting that you land at a root@compute0:~# prompt, because singularity doesn't set up an imaginary hostname inside the container like docker does)
- tested on 9tee4 using lsf+singularity (doesn't work on 9tee4 because firewall rules prohibit outgoing connections from non-root users to 127.0.0.1, and unlike Slurm, LSF on 9tee4 is configured to run crunch-run as the "crunch" user; but the error message shows that the LSF part per se is working)
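The "shell {uuid} echo ok" item in the list above concerns a session that never ended. The ticket doesn't say what the plumbing bug was; a common cause of this kind of hang is a relay that copies bytes between two connections but never forwards end-of-stream, so each side waits on the other forever. A minimal Python sketch of a bidirectional relay that half-closes each direction once its source reaches EOF (pipe and relay are hypothetical names; this is not the crunch-run or controller code):

```python
import socket
import threading


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    try:
        # Forward EOF: the peer's read loop can now finish instead of hanging.
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # dst may already be shut down or closed


def relay(a: socket.socket, b: socket.socket) -> None:
    """Relay in both directions; return once both directions have hit EOF."""
    threads = [threading.Thread(target=pipe, args=(a, b)),
               threading.Thread(target=pipe, args=(b, a))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    a.close()
    b.close()
```

Because each direction is shut down independently, the remote command's output and exit (EOF from the container side) reach the client even if the client's stdin is still open, and vice versa.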
todo: add an API handler to "GET .../ssh" so an old arvados-client returns a helpful "upgrade your client" error instead of a mysterious "405 method not allowed".
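The todo above could look roughly like this. A minimal Python http.server sketch, not controller code: the 400 status, the JSON {"errors": [...]} body, the message wording, and matching any path that ends in /ssh are all assumptions (the ticket elides the exact path). The point is only that an old client's GET receives an explanatory error rather than a bare 405.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical message wording.
UPGRADE_MESSAGE = ("this endpoint no longer accepts GET; "
                   "please upgrade your arvados-client")


class LegacySSHHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        path = self.path.split("?", 1)[0].rstrip("/")
        if path.endswith("/ssh"):
            # Old client: explain what to do instead of a bare 405.
            body = json.dumps({"errors": [UPGRADE_MESSAGE]}).encode()
            self.send_response(400)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet


def serve_in_background(port: int = 0) -> ThreadingHTTPServer:
    """Start the handler on 127.0.0.1 (port 0 picks a free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), LegacySSHHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```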
- Target version changed from 2022-07-20 to 2022-08-03 Sprint
Let's go ahead and merge this; otherwise it's going to sit forever. LGTM.
- Status changed from In Progress to Resolved
Applied in changeset arvados-private:commit:arvados|c9b8b9b9c78a77dd30b828914c8bee9fa8dcbb90.