Idea #22580
open
arvbox 2.0
Description
The purpose of arvbox is:
- to provide a self-contained developer environment capable of running the entire test suite
- to enable launching a self-contained, auto-configured cluster that can support integration tests (such as running CWL workflows) and manual testing of components that the end user might interact with, such as Workbench and keep-web.
Arvbox overlaps significantly with other functionality, all of which was written after arvbox was created. The approaches taken by arvbox were not intended to be general purpose, whereas these newer methods (mostly based around Ansible) are general purpose, and thus could support a new arvbox.
So I'm thinking about how a new iteration of arvbox should work.
Current functional overlap:
- arvbox Dockerfile uses arvados-server install plus installs some additional packages, but arvados-server install is redundant with the new Ansible playbook and will be removed (#22436)
- arvbox can launch run-tests, but the "test" environment (set up by run-tests) has entirely separate code from the arvbox scripts that create a "development" environment. Having separate binaries depending on how you're running things is a bit confusing.
- arvbox has its own code to configure and launch services, which overlaps with code in run-tests, sdk/python/tests/run_test_server.py, arvados-server boot, and the production systemd units
Provisioning
We've agreed to standardize on Ansible for provisioning and configuration, based on giving Ansible an Arvados configuration file and an inventory and then having Ansible use the inventory to provision nodes based on what we want to use them for.
(The previous method of provisioning, arvados-server install, is already on its way out.)
For "arvbox2" it would be great to be able to offload as much as possible to general purpose Ansible playbooks. If so, then arvbox2 could focus on virtual environment management and knowing how to launch "run-tests.sh" or "launch a development arvados cluster" in those environments.
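To illustrate that division of labor, here is a minimal sketch of how an arvbox2 front end might hand provisioning to Ansible and do nothing more than assemble the command line. The function name, the playbook and inventory file names, and the `cluster_config` variable are all hypothetical placeholders, not names taken from the Arvados playbooks. Only the `ansible-playbook` options `--inventory` and `--extra-vars` are real.

```python
def ansible_playbook_command(playbook, inventory, extra_vars=None):
    """Return an argv list for ansible-playbook; nothing is executed here."""
    cmd = ["ansible-playbook", "--inventory", inventory]
    # Sort the variables so the generated command line is reproducible.
    for key, value in sorted((extra_vars or {}).items()):
        cmd += ["--extra-vars", f"{key}={value}"]
    cmd.append(playbook)
    return cmd


if __name__ == "__main__":
    # All file and variable names below are hypothetical placeholders.
    print(" ".join(ansible_playbook_command(
        playbook="dev-cluster.yml",
        inventory="arvbox2-inventory.yml",
        extra_vars={"cluster_config": "config.yml"},
    )))
```

The returned list could be passed to `subprocess.run(cmd, check=True)`. Building an argv list rather than a shell string avoids quoting problems with paths that contain spaces.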
Launching services
As mentioned earlier, we've got a bunch of different approaches for building and launching services.
run-tests has the install/* functions to build each component, and uses sdk/python/tests/run_test_server.py to do some of the configuration and launching. run-tests also contains some logic about which tests require services and which tests don't. Many tests that interact with the test-mode API server also have built-in assumptions that the database is populated specifically with the test fixtures defined in services/api/test/fixtures (even tests written in Python or Go).
arvados-server boot is used to start up a partial cluster for the purposes of running Cypress integration tests of Workbench 2. I'm not exactly sure of the scope of its capabilities, except that it clearly knows how to bring up the API server and controller.
In production, we use systemd units to launch services.
Virtual environments
A big part of what the arvbox shell script (which the user interacts with on the host) does is manage the Docker container(s), which are brought up with a particular set of command line options that bind-mount various things into the container to make them persistent while still being able to tear down the container itself.
One of the reasons for doing it this way was to draw clear lines between what is stateful in the container and what isn't, so if the container environment is modified a certain way that involves changing some part of the file system that isn't preserved, that had better be something that is scripted to be re-configured on the next boot. It keeps us honest.
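As a rough illustration of that stateful/stateless split, the sketch below builds a `docker run` command in which every persistent location is an explicit bind mount and everything else is discarded with the container. It is not the actual arvbox script. The function name, container name, image name, and paths are hypothetical, while `--detach`, `--name`, and `--volume` are standard `docker run` options.

```python
def docker_run_command(name, image, persistent_dirs):
    """Return an argv list for `docker run` with one bind mount per entry.

    persistent_dirs maps a host path to the container path where it appears.
    Anything not listed lives only in the container's writable layer and is
    lost when the container is removed.
    """
    cmd = ["docker", "run", "--detach", "--name", name]
    for host_path, container_path in sorted(persistent_dirs.items()):
        cmd += ["--volume", f"{host_path}:{container_path}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    print(" ".join(docker_run_command(
        name="arvbox2-dev",
        image="example/arvbox2:latest",
        persistent_dirs={
            "/home/dev/arvbox2/postgres": "/var/lib/postgresql",
            "/home/dev/arvados": "/usr/src/arvados",
        },
    )))
```

Keeping the list of persistent directories in one data structure makes the boundary easy to audit: a path that is missing from the mapping must be recreated by scripts on the next boot.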
This brings up questions about what container or VM technology to use. Ones that we have some experience with include:
- Docker (currently used by arvbox)
- systemd-nspawn
- kvm
Other container runners:
- podman
- Singularity (included for completeness)
Docker
pros:
- The industry standard
- We have a ton of operational experience with it
- Familiar to lots of other people
cons:
Running systemd inside Docker is notoriously awkward. Because of this, arvbox uses "runit", which means none of the arvbox service scripts are particularly useful in any other environment.
If we decided we wanted to use systemd consistently for managing services (whether test/development/production) then we'd need to solve this somehow.
There's a systemd stand-in that does minimal service management:
https://github.com/gdraheim/docker-systemctl-replacement
systemd-nspawn
pros:
Presumably already packaged everywhere systemd is used, so it doesn't require adding external repositories (e.g. Docker community edition).
Simpler than Docker: you give it a root directory representing your container and some configuration for how to run the container.
You get a real init process at PID 1 which runs systemd units as intended.
cons:
Less well known than Docker
Requires additional steps to set up networking to make it easy for the host, container, local network, and Internet to all communicate.
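The "root directory plus some configuration" model can be sketched as a small command builder. This is an assumption-laden illustration rather than a proposed implementation: the function name, machine name, and paths are hypothetical, and networking options are deliberately left out. The options `--directory=`, `--machine=`, `--boot`, and `--bind=` are real `systemd-nspawn` options, and the command normally has to be run as root.

```python
def nspawn_command(root_dir, machine_name, bind_mounts=None):
    """Return an argv list that boots root_dir as a systemd-nspawn container.

    --boot makes systemd-nspawn start the container's init as PID 1, so
    ordinary systemd units inside the container run as they would on a host.
    """
    cmd = [
        "systemd-nspawn",
        f"--directory={root_dir}",
        f"--machine={machine_name}",
        "--boot",
    ]
    # Each bind mount exposes a host directory at a path inside the container.
    for host_path, container_path in sorted((bind_mounts or {}).items()):
        cmd.append(f"--bind={host_path}:{container_path}")
    return cmd


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    print(" ".join(nspawn_command(
        root_dir="/var/lib/machines/arvbox2",
        machine_name="arvbox2",
        bind_mounts={"/home/dev/arvados": "/usr/src/arvados"},
    )))
```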
Singularity
pros:
Runs applications in userspace, no root access required.
cons:
May not provide the features/additional privileges required to run all the Arvados services.
kvm
pros:
Full virtualization (hardware-assisted), runs its own Linux kernel and a full OS.
Greatest isolation.
Can run a whole desktop in a window.
cons:
Takes longer to start and stop than a container.
On cloud, we'd be running a virtual machine within a virtual machine; nested virtualization may not be possible in some environments (e.g. a quick search suggests it may be possible on GCP but you can't do it on EC2).
Requires additional steps to set up networking to make it easy for the host, VM, local network, and Internet to all communicate.
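Since nested virtualization may or may not be available on a given cloud instance, a launcher could probe for it before trying to start a VM. The sketch below assumes an x86 Linux host, where hardware virtualization appears as the `vmx` (Intel) or `svm` (AMD) flag in /proc/cpuinfo and KVM is exposed as the /dev/kvm device. The function names are hypothetical.

```python
import os


def cpu_virtualization_flags(cpuinfo_text):
    """Return the set of vmx/svm flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            found |= {"vmx", "svm"} & set(line.split(":", 1)[1].split())
    return found


def kvm_available(dev_path="/dev/kvm"):
    """Return True if the KVM device exists and this user can open it."""
    return os.path.exists(dev_path) and os.access(dev_path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = cpu_virtualization_flags(f.read())
    except OSError:
        flags = set()
    print("CPU virtualization flags:", sorted(flags))
    print("KVM usable:", kvm_available())
```

If this runs inside a cloud VM and reports no flags and no usable /dev/kvm, the provider is not exposing nested virtualization to that instance, and a container-based environment would be the fallback.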
Abstraction layers
libvirt and virsh
https://ubuntu.com/server/docs/libvirt
This is the standard interface for kvm, but it also supports LXC, a container technology for Linux that has been around since before Docker. However, we have no operational experience with LXC and how it differs from
Vagrant
https://github.com/hashicorp/vagrant
Specifically intended to help create developer environments using different container/virtualization technologies, but now has an icky "Business Source License".
Updated by Peter Amstutz about 1 year ago
- Position changed from -939566 to -939559
Updated by Peter Amstutz about 1 year ago
- Description updated (diff)
- Subject changed from feature in run-tests that brings up a usable cluster & lets you rebuild/restart individual services similar to arvbox to new method for bringing up an auto-configured, usable cluster in "development" mode & lets you rebuild/restart individual services
Updated by Peter Amstutz about 1 year ago
- Subject changed from new method for bringing up an auto-configured, usable cluster in "development" mode & lets you rebuild/restart individual services to new method for launching a test or development environment which can run tests and bring up an auto-configured, usable cluster in "development" mode
Updated by Brett Smith 10 months ago
- Related to Bug #22934: run cwl conformance tests using 'arvados-server boot' instead of arvbox added
Updated by Brett Smith 10 months ago
- Related to Idea #22939: Standardize `arvados-server boot` for running from source added
Updated by Brett Smith 9 months ago
- Subject changed from new method for launching a test or development environment which can run tests and bring up an auto-configured, usable cluster in "development" mode to arvbox 2.0