Build docker images as part of a workflow » History » Version 4
Brett Smith, 12/14/2022 04:52 PM
h1. Build docker images as part of a workflow

(draft)

h2. Background

Container images provide a well-defined execution environment for doing reproducible work. As long as the image is runnable by a container engine, a job can be repeated. However, the point of reproducibility isn't just to allow repetition of the same computation -- it's to make it possible to use prior work as the starting point for future work. Much of this opportunity is lost if the provenance trail ends at a binary image.

Ideally, when a bug is discovered in an analysis tool or library, it should be easy to identify which existing results are affected, and re-run those analyses with the updated software.

Users should have the option of building container images
* ...as part of a CWL workflow (so they can update the image-building instructions and hit one "re-run" button to see the result)
* ...in Arvados containers (so the build environment is controlled, build logs are saved, etc.)
* ...without having docker on the client side (so build-and-run workflows can be initiated from browsers, non-Linux workstations, and shared VM environments)

However, Arvados currently (2022) relies on workstations and shell nodes to build docker images (or download them from external sources) and upload them to Keep before starting a containerized workflow.

h2. Implementation

1. Migrate docker links to collection properties
* arv-keepdocker should set collection properties["docker-image-repo-tag"] when adding an image (already done in #16046, #17508)
* arv-keepdocker should set collection properties["docker-image-hash"]
* arv-keepdocker should search collections with properties["docker-image-repo-tag"] instead of "docker_image_repo+tag" links
* arvados-cwl-runner should search collections with properties["docker-image-repo-tag"] instead of "docker_image_repo+tag" links
* RailsAPI "resolve docker image spec to container" code should search collection properties for the given repo:tag or hash, instead of searching links
* RailsAPI data migration should copy the values of any pre-existing "docker_image_repo+tag" and "docker_image_hash" links into the corresponding collection properties

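As a rough illustration of step 1, the sketch below shows how link records might be folded into collection properties, and how a property-based search filter might look. It is only a sketch under assumptions: the link fields (@link_class@, @name@, @head_uuid@), the @properties.<key>@ filter form, and both function names are illustrative and not taken from the actual migration or client code.

```python
from collections import defaultdict

# Assumed mapping from existing link classes to the new collection properties.
LINK_CLASS_TO_PROPERTY = {
    "docker_image_repo+tag": "docker-image-repo-tag",
    "docker_image_hash": "docker-image-hash",
}


def properties_from_links(links):
    """Group docker links by target collection and return properties to set.

    Each link is assumed to be a dict with "link_class", "name" (the repo:tag
    or image hash) and "head_uuid" (the collection the link points to).
    If a collection has several links of the same class, this sketch keeps
    the last one seen; the proposal does not say how to handle that case.
    """
    updates = defaultdict(dict)
    for link in links:
        prop = LINK_CLASS_TO_PROPERTY.get(link["link_class"])
        if prop is None:
            continue  # not a docker link, leave it alone
        updates[link["head_uuid"]][prop] = link["name"]
    return dict(updates)


def image_search_filters(repo_tag=None, image_hash=None):
    """Build a filter list that looks up an image by property, not by link."""
    if (repo_tag is None) == (image_hash is None):
        raise ValueError("give exactly one of repo_tag or image_hash")
    if repo_tag is not None:
        return [["properties.docker-image-repo-tag", "=", repo_tag]]
    return [["properties.docker-image-hash", "=", image_hash]]
```

A client could then pass the result of @image_search_filters(repo_tag="repo:tag")@ as the @filters@ argument of a collection list request, in place of today's link lookup.
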

2. Support "pull image" container request
* Accept as a special case docker_image="arvados/none" (or empty collection PDH) to mean "builtin command"
** the exact special value is still open: it could be "arvados/builtin" or "arvados/none"
* Builtin command @["docker", "pull", "repo:tag"]@ causes crunch-run to run @docker pull@ and save the resulting image @sha256:*.tar@ as the output collection instead of running a container
* @mounts@ hash is expected/required to be empty
* @runtime_constraints.API@ is expected/required to be true
* @output_path@ is expected/required to be "/"
* crunch-run sets output_properties @{"docker-image-hash":"...", "docker-image-repo-tag":"repo:tag"}@

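To make the rules for a "pull image" request concrete, here is a minimal sketch of what such a request body and a matching validity check might look like. The helper names, the default @vcpus@ and @ram@ values, and the use of the @container_image@ field for what this page calls docker_image are all assumptions for illustration; the special image name itself is still undecided.

```python
# Placeholder image name meaning "builtin command". The proposal has not
# settled on the exact value ("arvados/none" or "arvados/builtin").
BUILTIN_IMAGE = "arvados/none"


def pull_image_request(repo_tag, runtime_constraints=None):
    """Build a hypothetical "pull image" container request body."""
    constraints = {"vcpus": 1, "ram": 1 << 30}  # illustrative defaults only
    constraints.update(runtime_constraints or {})
    constraints["API"] = True  # expected/required to be true
    return {
        # Assumption: "docker_image" in the text maps to container_image.
        "container_image": BUILTIN_IMAGE,
        "command": ["docker", "pull", repo_tag],
        "mounts": {},        # expected/required to be empty
        "output_path": "/",  # expected/required to be "/"
        "runtime_constraints": constraints,
    }


def validate_pull_request(req):
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    cmd = req.get("command", [])
    if len(cmd) != 3 or cmd[:2] != ["docker", "pull"]:
        problems.append('command must be ["docker", "pull", "repo:tag"]')
    if req.get("mounts"):
        problems.append("mounts must be empty")
    if req.get("runtime_constraints", {}).get("API") is not True:
        problems.append("runtime_constraints.API must be true")
    if req.get("output_path") != "/":
        problems.append('output_path must be "/"')
    return problems
```

For example, @validate_pull_request(pull_image_request("repo:tag"))@ returns an empty list, while a request with a non-empty @mounts@ hash is reported as a problem.
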

3. arvados-cwl-runner submits a "pull image" container request when needed
* i.e., if the requested image is not already available in Keep, and docker is not installed/usable directly (e.g., running in an Arvados container)

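The decision in step 3 can be sketched as a small function. This is illustrative only: the function names and returned labels are made up for this example, and checking for a @docker@ executable on the PATH is just one rough way to approximate "installed/usable".

```python
import shutil


def docker_usable():
    """Rough check: is a docker CLI on PATH? Says nothing about daemon access."""
    return shutil.which("docker") is not None


def image_action(in_keep, docker_available):
    """Decide how the workflow runner might obtain a requested image."""
    if in_keep:
        return "use-existing"           # image collection already in Keep
    if docker_available:
        return "local-pull-and-upload"  # current behavior: pull locally, upload to Keep
    return "submit-pull-request"        # new: let crunch-run pull the image
```

A caller would evaluate something like @image_action(found_in_keep, docker_usable())@ once per required image before submitting the workflow.
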

4. Support "build image" container request
* Another builtin command: @["docker", "build"]@
* URL uses docker syntax to indicate a collection or remote git repo containing the Dockerfile
* @environment@ can be used to pass build args
* @mounts@ establishes the build context (e.g., mount a collection or git tree at "/")
* If the Dockerfile is not at the root of the build context, use @["docker", "build", "/path/to/Dockerfile"]@
* @output_path@ is expected/required to be "/"

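A "build image" request following the bullets above might be assembled as in the sketch below. The helper name, the placeholder image value, the resource numbers, and the example mount are assumptions made for illustration; only the command form, the use of @environment@ and @mounts@, and the @output_path@ rule come from this page.

```python
# Placeholder image name meaning "builtin command"; the exact value is undecided.
BUILTIN_IMAGE = "arvados/none"


def build_image_request(context_mount, build_args=None, dockerfile=None):
    """Build a hypothetical "build image" container request body.

    context_mount: mount spec for the build context, mounted at "/".
    build_args:    dict of build args, passed through "environment".
    dockerfile:    path to the Dockerfile when it is not at the context root.
    """
    command = ["docker", "build"]
    if dockerfile is not None:
        command.append(dockerfile)
    return {
        # Assumption: the builtin-command marker goes in container_image.
        "container_image": BUILTIN_IMAGE,
        "command": command,
        "environment": dict(build_args or {}),
        "mounts": {"/": context_mount},
        "output_path": "/",  # expected/required to be "/"
        "runtime_constraints": {"vcpus": 1, "ram": 2 << 30},  # illustrative only
    }
```

For instance, passing a collection mount such as @{"kind": "collection", "uuid": "..."}@ as @context_mount@ together with @dockerfile="/docker/Dockerfile"@ yields the command @["docker", "build", "/docker/Dockerfile"]@.
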

h2. TBD

How do we avoid the situation where a user copies and modifies an image collection but unwittingly leaves the properties in place, so that the modified collection is used unintentionally?

For a @docker pull@ request, should @runtime_constraints@ be automatic (site configurable), or should the client specify them? (Consider the case of pulling a 2 GiB image from Docker Hub.)