Story #17296
Updated by Nico César almost 4 years ago
Given the following restrictions:
* singularity 3.5.2 installed on the compute node.
* config.yml has "ContainerExec=Singularity" "SingularityBinary=/usr/bin/singularity"
* no privileges, setuid or special linux capabilities are needed to run the container (i.e. ping won't work because it needs CAP_NET_RAW)
* When we have a ContainerRequest, the resulting Container will:
** run /usr/bin/singularity build .... to create the .sif file
** run /usr/bin/singularity run ... to run the container, using --bind for all the container mounts
** relay the container's StdErr and StdOut in real time
** store the outputs in a collection
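The mount handling above boils down to translating each container mount into a @--bind@ flag. A minimal sketch of that translation (the @Mount@ type, its fields, and @bindArgs@ are illustrative, not the actual crunch-run types):

```go
package main

import "fmt"

// Mount is a simplified stand-in for a crunch-run container mount:
// a host path bound to a path inside the container.
type Mount struct {
	HostPath      string
	ContainerPath string
	ReadOnly      bool
}

// bindArgs translates mounts into singularity command-line arguments,
// emitting one --bind flag per mount, e.g. --bind /host:/ctr:ro.
func bindArgs(mounts []Mount) []string {
	var args []string
	for _, m := range mounts {
		spec := m.HostPath + ":" + m.ContainerPath
		if m.ReadOnly {
			spec += ":ro"
		}
		args = append(args, "--bind", spec)
	}
	return args
}

func main() {
	mounts := []Mount{
		{HostPath: "/tmp/crunch-run/keep", ContainerPath: "/keep", ReadOnly: true},
		{HostPath: "/tmp/crunch-run/output", ContainerPath: "/output"},
	}
	// prints: [--bind /tmp/crunch-run/keep:/keep:ro --bind /tmp/crunch-run/output:/output]
	fmt.Println(bindArgs(mounts))
}
```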
Before starting work on singularity support, we need to define an interface for all container executors.
<pre>
type ContainerExecutor interface {
	// CheckImageIsLoaded reports whether imageID is already available in the
	// local environment -- either something we can reference later via the
	// Docker API (or similar), or a filepath containing the image to be used.
	CheckImageIsLoaded(imageID ImageID) bool
	// LoadImage reads a tarball in docker format (created with 'docker save')
	// and loads it into the local ContainerExecutor.
	//
	// It returns an ImageID to be referenced later; note that this could be
	// an image identifier, a filepath, or something else.
	LoadImage(containerImage io.Reader) (ImageID, error)
	// ImageRemove removes an image previously loaded with LoadImage().
	ImageRemove(imageID ImageID) error
	// ContainerState reports the current state of the container.
	ContainerState() (ContainerState, error)
	// CreateContainer prepares everything needed to create the container.
	// containerConfig carries all the parameters needed to start it,
	// including the environment variables as "KEY=VALUE" strings.
	CreateContainer(containerConfig ContainerConfig) error
	// StartContainer starts the container.
	StartContainer() error
	// Kill stops the container and optionally removes the underlying image.
	// It returns an error if that didn't work (including on timeout).
	Kill() error
	// Pipes for the container's standard streams, following the pattern of
	// https://golang.org/pkg/os/exec/#Cmd.
	StdinPipe() (io.WriteCloser, error)
	StdoutPipe() (io.ReadCloser, error)
	StderrPipe() (io.ReadCloser, error)
	// Wait waits for the container to finish.
	Wait() error
	// ExitCode returns the exit code of the last executed container.
	ExitCode() (int, error)
}
</pre>
When we run a container on a compute node, we convert the Docker image, on the fly, to a SIF file and run that with singularity instead.
# global option that switches between the docker and singularity runners
# container_request runtime parameter flag that chooses between docker and singularity
# crunch-run gets the docker tar file from keep (existing docker v2 format images)
# crunch-run converts the docker tar file to SIF:
<pre>
$ docker save arvados/jobs:latest > arvados-jobs.latest.tar
$ ls -laF arvados-jobs.latest.tar
-rw-r--r-- 1 ward ward 295209984 Jan 14 17:16 arvados-jobs.latest.tar
$ singularity build arvados-jobs.latest.sif docker-archive://arvados-jobs.latest.tar
INFO: Starting build...
...
</pre>
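In crunch-run the conversion step shown above amounts to assembling and running the equivalent command line. A sketch of that assembly (@buildSIFArgs@ is a hypothetical helper; real code would run the result via os/exec and check the error and exit status):

```go
package main

import "fmt"

// buildSIFArgs returns the argv for converting a docker-format tarball
// to a SIF file, mirroring the manual 'singularity build' step above.
// Real code would execute it with exec.Command(args[0], args[1:]...).
func buildSIFArgs(singularityBin, tarPath, sifPath string) []string {
	return []string{singularityBin, "build", sifPath, "docker-archive://" + tarPath}
}

func main() {
	args := buildSIFArgs("/usr/bin/singularity", "arvados-jobs.latest.tar", "arvados-jobs.latest.sif")
	// prints: [/usr/bin/singularity build arvados-jobs.latest.sif docker-archive://arvados-jobs.latest.tar]
	fmt.Println(args)
}
```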
# crunch-run executes singularity with mount points, stdout/stderr captured to logs
# slurm dispatcher supports singularity
## ideally the backend container runner should be transparent to the dispatcher
# proof of concept will be tested on 9tee4
# assume that user id inside the container will be the same as the crunch-run user (?)
# try to support running containers without setuid; identify the specific features that require setuid on the singularity binary.
Testing goals / acceptance criteria
# MVP: runs a container
# default value for singularity binary (/usr/bin/singularity) but can be changed from arvados config.yml
# captures stdout/stderr to logs
# can bind-mount arv-mount inside the container
# can bind mount tmp/output directories inside the container
# output files have proper permissions to be read for upload & cleaned up (deleted) by crunch-run
# see if it makes sense to have singularity mock the docker API
# should have test coverage for singularity features similar to what exists for the Docker features
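The configurable-binary criterion above is essentially a lookup with a fallback; a sketch (the config field name is illustrative, not the actual config.yml key):

```go
package main

import "fmt"

// singularityBinary returns the configured singularity path, falling
// back to the default /usr/bin/singularity when the setting is empty.
func singularityBinary(configured string) string {
	if configured == "" {
		return "/usr/bin/singularity"
	}
	return configured
}

func main() {
	fmt.Println(singularityBinary(""))                     // /usr/bin/singularity
	fmt.Println(singularityBinary("/opt/bin/singularity")) // /opt/bin/singularity
}
```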
For future tickets:
# crunchstat
# memory / CPU constraints
# Save the SIF file in Keep, and use another Link object to make it findable in the future for the corresponding docker image. TODO: check whether the framework we built for the docker image format v1 -> v2 conversion could be reused here.