Feature #23359
Closed: Ansible installer supports RHEL (at least enough to run test-provision)
Description
For every role run by install-arvados-cluster.yml, you will need to:
- Anywhere we install apt pins, install a RHEL equivalent. We don't know how to do this yet; that's a necessary planning prerequisite.
- Anywhere we have an ansible.builtin.deb822_repository task, we'll need to make it conditional with when: "ansible_pkg_mgr == 'apt'", then add a corresponding ansible.builtin.yum_repository task with when: "ansible_pkg_mgr == 'dnf'". Finding out the parameters will require going to the publisher's documentation and figuring out how their yum repository is laid out.
- Change any ansible.builtin.apt tasks to include the distro_packages role OR, in the rare case where the package is only needed on Debian/Ubuntu, guard it with when: "ansible_pkg_mgr == 'apt'".
  - If the package has a different name on RHEL, you must add a translation near the top of filter_plugins/arvados.py. pkgs.org can be a helpful resource for checking package names across distributions.
- For managed configuration files, if RHEL puts them at a different location than Debian does, you'll need to make the path conditional.
  - Try to avoid making the entire task conditional if possible. Try instead to make just the path conditional with a template like path: "{{ rhel_path if ansible_pkg_mgr == 'dnf' else deb_path }}". If it's helpful to break things up, you can set vars in the task that you refer to in these templates. If a task parameter needs to be set in one distro but not another, you can set the special value omit to avoid setting it when not necessary. See roles/arvados_shell/tasks/main.yml for some examples of using omit.
- For systemd services, check the service name on RHEL and similarly templatize it if needed.
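As a sketch of the conditional-repository pattern described above, a Docker task pair might look like the following. The apt parameters here are illustrative assumptions; the yum parameters mirror Docker's published docker-ce.repo file.

```yaml
# Sketch only: repository parameters for apt are assumptions, not verbatim
# from our roles. The yum side mirrors Docker's published .repo file.
- name: Add Docker apt repository
  when: "ansible_pkg_mgr == 'apt'"
  ansible.builtin.deb822_repository:
    name: docker
    uris: "https://download.docker.com/linux/{{ ansible_distribution | lower }}"
    suites: "{{ ansible_distribution_release }}"
    components: stable
    signed_by: https://download.docker.com/linux/debian/gpg

- name: Add Docker yum repository
  when: "ansible_pkg_mgr == 'dnf'"
  ansible.builtin.yum_repository:
    name: docker-ce-stable
    description: Docker CE Stable - $basearch
    baseurl: https://download.docker.com/linux/rhel/$releasever/$basearch/stable
    enabled: true
    gpgcheck: true
    gpgkey: https://download.docker.com/linux/rhel/gpg
```

Only one of the two tasks runs on any given host, so the rest of the role can stay unconditional.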
It would be easy to start with a branch that just updates the roles that install Arvados packages. Since we don't have different configuration paths or service names, that's just a matter of the first bullet. There could be separate branches for roles that install third-party software like Docker.
Updated by Brett Smith 3 months ago
Things you can do to work on this without actually editing existing code:
- Investigate and write up the RHEL equivalent of apt pins.
- For each Debian repository we use, find and document the RHEL equivalent. The simplest version is just a link that documents how to set up the RHEL repository. The fullest version is writing up an ansible.builtin.yum_repository task.
- For third-party systemd services we manage, check if they have different names on RHEL and document those differences.
Updated by Lucas Di Pentima 3 months ago
How to pin packages using dnf
I've found dnf's documentation regarding its versionlock plugin, and it seems to be what we need.
As a prerequisite, we need to install the python3-dnf-plugin-versionlock package.
Pins are saved in the /etc/dnf/plugins/versionlock.list file by default. Looking at the versionlock.conf file, it seems we could manage a custom lock list file for our purposes by defining our own section, but I've done some tests and it didn't work. The dnf-versionlock man page also doesn't specify that custom sections can be used.
$ cat /etc/dnf/plugins/versionlock.conf
[main]
enabled = 1
locklist = /etc/dnf/plugins/versionlock.list
Here's an example of the versionlock.list file:
$ cat /etc/dnf/plugins/versionlock.list
# Added lock on Tue Dec 30 15:41:18 2025
bash-0:5.1.8-6.el9_1.*
$ sudo dnf versionlock list
Extra Packages for Enterprise Linux 9 - x86_64  3.3 kB/s | 2.7 kB  00:00
bash-0:5.1.8-6.el9_1.*
Ansible support
There's a community module, community.general.dnf_versionlock, that can be useful to manage our pins. We'll need to double-check that it's available in the Ansible version we use.
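As a rough sketch of what using that module could look like (the package names and version specs below are illustrative examples, not our actual pins):

```yaml
# Illustrative only: specs and names are examples, not the real pin list.
- name: Pin Docker packages with dnf versionlock
  community.general.dnf_versionlock:
    state: present
    name:
      - "docker-ce-3:28.*"
      - "docker-ce-cli-3:28.*"
```

The module writes entries into the locklist file managed by the versionlock plugin, so the python3-dnf-plugin-versionlock package still has to be installed first.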
Updated by Lucas Di Pentima 3 months ago
Third-party package repositories used
Docker
Docs: https://docs.docker.com/engine/install/rhel/
https://download.docker.com/linux/rhel/docker-ce.repo
Key: https://download.docker.com/linux/rhel/gpg
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://download.docker.com/linux/rhel/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://download.docker.com/linux/rhel/gpg
AMD ROCm
Docs: https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/install/detailed-install/package-manager/package-manager-rl.html
https://repo.radeon.com/amdgpu/30.20.1/el/9.6/main/x86_64/
Key: https://repo.radeon.com/rocm/rocm.gpg.key
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/30.20.1/el/9.6/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
Docs: https://rocmdocs.amd.com/projects/install-on-linux/en/latest/install/install-methods/package-manager/package-manager-rl.html
https://repo.radeon.com/rocm/el9/7.1.1/main
Key: https://repo.radeon.com/rocm/rocm.gpg.key
[rocm]
name=ROCm 7.1.1 repository
baseurl=https://repo.radeon.com/rocm/el9/7.1.1/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
NVIDIA
Docs: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
https://nvidia.github.io/libnvidia-container/stable/rpm/$basearch
Key: https://nvidia.github.io/libnvidia-container/gpgkey
[nvidia-container-toolkit]
name=nvidia-container-toolkit
baseurl=https://nvidia.github.io/libnvidia-container/stable/rpm/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
Docs: https://developer.download.nvidia.com/compute/cuda/repos/
https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64
Key: https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/D42D0685.pub
[cuda-rhel9-x86_64]
name=cuda-rhel9-x86_64
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/D42D0685.pub
Grafana
Docs: https://grafana.com/docs/grafana/latest/setup-grafana/installation/redhat-rhel-fedora/
https://rpm.grafana.com
Key: https://rpm.grafana.com/gpg.key
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
PostgreSQL
Docs: https://yum.postgresql.org/rpmchart/ & https://wiki.postgresql.org/wiki/YUM_Installation
https://download.postgresql.org/pub/repos/yum/common/redhat/rhel-9-x86_64/
https://download.postgresql.org/pub/repos/yum/15/redhat/rhel-9-x86_64/
Repo RPMs: https://yum.postgresql.org/repopackages/
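As a sketch of how one of these .repo files could translate into an ansible.builtin.yum_repository task, here is the Grafana entry from above expressed as a task. The field values mirror the documented .repo file; which fields we actually want to carry over is still to be decided.

```yaml
# Sketch: a direct translation of the documented grafana .repo file.
- name: Add Grafana yum repository
  when: "ansible_pkg_mgr == 'dnf'"
  ansible.builtin.yum_repository:
    name: grafana
    description: grafana
    baseurl: https://rpm.grafana.com
    enabled: true
    gpgcheck: true
    repo_gpgcheck: true
    gpgkey: https://rpm.grafana.com/gpg.key
    sslverify: true
    sslcacert: /etc/pki/tls/certs/ca-bundle.crt
```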
Updated by Brett Smith 3 months ago
· Edited
Lucas Di Pentima wrote in #note-6:
I've found dnf's documentation regarding its versionlock plugin, and it seems to be what we need.
Yeah, this seems fine. I think a good next step would be to write a standalone proof-of-concept playbook that pins Docker version 28 and then installs it on a RHEL distro. Once we know what that looks like, we can plan how to integrate it into the existing roles.
Pins are saved in the /etc/dnf/plugins/versionlock.list file by default. Looking at the versionlock.conf file, it seems we could manage a custom lock list file for our purposes by defining our own section, but I've done some tests and it didn't work.
Out of curiosity, did you see some documentation that suggested it should work? Or are you just guessing based on the configuration file format?
Updated by Lucas Di Pentima 3 months ago
Brett Smith wrote in #note-8:
Out of curiosity, did you see some documentation that suggested it should work? Or are you just guessing based on the configuration file format?
Both, but mainly the dnf-versionlock man page made me assume that the conf file format allowed being extended:
CONFIGURATION
/etc/dnf/plugins/versionlock.conf
The minimal content of conf file should contain main sections with enabled and locklist parameters.
Updated by Lucas Di Pentima 3 months ago
Updates at eb2001a1 - branch 23359-ansible-rhel-docker
This updates the arvados_docker role to install RPM packages when dnf is detected on the system. It installs dnf's versionlock plugin and uses it to set up package pins on the docker packages.
In order to test it, I've used a simplified playbook like this:
- name: Install dnf-versionlock
  hosts: all
  tasks:
    - ansible.builtin.include_role:
        name: distro_bootstrap

- name: Install Docker
  hosts: all
  tasks:
    - ansible.builtin.include_role:
        name: arvados_docker
The changes make it possible for the packages to be updated within the boundaries set by the pins.
The versionlock pin list gets cleaned up on every run to avoid having multiple pins per package when the pinned version changes. Also, when dnf versionlock add ... is run, available matching versions are checked and added to the list instead of just the provided wildcard. This allows package updating.
Updated by Brett Smith 3 months ago
- Target version changed from Development 2026-01-06 to Development 2026-01-21
Updated by Brett Smith 3 months ago
- Target version changed from Development 2026-01-21 to Development 2026-02-04
Updated by Brett Smith about 2 months ago
- Target version changed from Development 2026-02-04 to Development 2026-02-18
Updated by Brett Smith about 2 months ago
· Edited
Lucas Di Pentima wrote in #note-10:
Updates at eb2001a1 - branch 23359-ansible-rhel-docker
Functionality-wise this is all fine, and it seems like the pattern should work fine for other repositories too. Thanks for putting all this together. I have two style comments.
First, I suggest roles/arvados_docker/tasks/main.yml can literally just be a single task:
- ansible.builtin.include_tasks: "{{ ansible_pkg_mgr }}.yml"
Then all the tasks for apt can go in apt.yml and all the tasks for dnf can go in dnf.yml without an explicit condition check on each one. I think this would be easier to follow than the current swapping between apt and dnf tasks, and it should be easier to work on in the future.
Second, Ansible playbooks are best when they strive to be as idempotent as possible. The way we delete version locks then re-add them falls short of that goal. Reviewing the dnf_versionlock documentation, it looks like we can do better with the following approach: Start with the same state: present task you have now, and register the result. Then follow up with a cleanup task like this:
- name: Remove stale versionlocks
  when: docker_versionlocks.changed
  community.general.dnf_versionlock:
    state: absent
    name: "{{ the list of items in docker_versionlocks.locklist_pre that refer to a package name we manage but are not in the desired set of locks }}"
This is admittedly going to require some fancy list filtering. It might help to stage intermediate lists in vars. If this approach sounds interesting to you but you'd like help writing it, let me know and I'm happy to chip in.
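One way that staged filtering could look, with intermediate lists in vars. Every variable name and the regexps here are hypothetical placeholders, not taken from the branch:

```yaml
# Hypothetical sketch: docker_versionlock_names, the regexps, and the
# intermediate var names are all placeholders for illustration.
- name: Remove stale versionlocks
  when: docker_versionlocks.changed
  vars:
    # The lock specs we just asked for (assumed to exist as a role var).
    desired_locks: "{{ docker_versionlock_names }}"
    # Bare package names extracted from those specs.
    managed_names: "{{ desired_locks | map('regex_search', '^[A-Za-z0-9._+-]+?(?=-[0-9])') | list }}"
    # Pre-existing locks on packages we manage that are not in the desired set.
    stale_locks: "{{ docker_versionlocks.locklist_pre
                     | select('regex_search', '^(' ~ managed_names | join('|') ~ ')-')
                     | reject('in', desired_locks)
                     | list }}"
  community.general.dnf_versionlock:
    state: absent
    name: "{{ stale_locks }}"
```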
Thanks.
Updated by Brett Smith about 1 month ago
- Target version changed from Development 2026-02-18 to Development 2026-03-04
Updated by Lucas Di Pentima 25 days ago
Updates at 0c214b1af23 (rebased from latest main)
- Split Debian-vs-RHEL tasks for simplicity.
- Improved stale versionlock entry management.
- Added support for listing packages with no version spec to be installed, also useful if we need to delete a package pin from versionlock.
I'm not sure this extra complexity is really worth it. Deleting our versionlock entries before adding the current ones seemed to me a super simple and effective way of doing it. I'm not seeing the idempotent issue except for when we decide to remove an entry from the versionlock file.
Updated by Brett Smith 21 days ago
Lucas Di Pentima wrote in #note-16:
Updates at 0c214b1af23 (rebased from latest
main)
- Split debian-vs-redhat tasks for simplicity.
The changes to distro_bootstrap need to be reverted. This role is special because it runs on systems that may not have Ansible's own dependencies installed yet, and therefore it can only expect ansible.builtin.raw to work. Our playbooks run it with gather_facts: no and therefore ansible_facts will not be set. See 87a7bd30812968bbd775d0b3cbafa4e51043384b. Documenting this special case somewhere where it would be helpful would be a welcome improvement.
The dnf_versionlock documentation notes that dnf itself is a requirement. This means it needs the full version, not microdnf that comes in Docker containers. Please add dnf to the list of packages we bootstrap like you did for dnf-plugin-versionlock.
- Adds support for listing packages with no version spec to be installed, also useful if we need to delete a package pin from versionlock.
I feel like this support is part of the reason the code for idempotence was so challenging. Right now I'm inclined to say "YAGNI" to this. I can imagine that at some point in the future we will want to clean up version locks, and the requirement makes sense, but I would rather wait and figure out an implementation when that situation actually arises than try to predict what it will look like today. More below.
I'm not sure this extra complexity is really worth it. Deleting our versionlock entries before adding the current ones seemed to me a super simple and effective way of doing it. I'm not seeing the idempotent issue except for when we decide to remove an entry from the versionlock file.
So, let's talk about this. I agree that getting idempotence usually requires more code. And I agree that there are situations where the amount of additional code isn't worth it.
I don't really care about idempotence by itself. I like having idempotence because it means that changes to system configuration are consistent and predictable. So when I think about the trade-off, I tend to think about it less as "idempotence vs. code complexity," but more about "system predictability vs. code complexity." Which is still a trade-off!
A good example of this that I wrote is arvados_nginx_base. The tasks in that role are more complex than I would like. The regexp to modify nginx.conf is truly hairy. But: there's complexity in nginx.conf too. For me personally, I decided it was worth the complexity in Ansible in order to keep nginx.conf simple and predictable. The Ansible code runs at a set time and can be commented a lot to help readers understand. If nginx.conf gets hairy, that lives on the system forever and can be harder to document from an Ansible playbook. But reasonable people can disagree about whether I made the right call.
I agree that the code you have feels like it isn't worth it. I pushed 30c13039da208c6c8231814fe266df4f495a5d69 as an attempt at a simpler version. (Sorry, I started a separate branch for myself locally, but it was still tied to your branch upstream so now it's there. You don't have to take this commit.) It is based on your version: you'll note a lot of the same Jinja filters and regexp patterns. I think there are two main differences:
- I didn't bother supporting the "no version locked" case, because that created a lot of internal branching that, like I said, I'm not sure we need.
- While testing I found out the result has specs_toadd, which documents the locks that we just added. With that, the cleanup task just becomes "remove all locks in locklist_post that aren't in specs_toadd but affect the same packages in specs_toadd."
Does this version feel more worth it to you? Feel free to keep building on that commit if so. Thanks.
Updated by Brett Smith 20 days ago
- Target version changed from Development 2026-03-04 to Development 2026-03-18
Updated by Lucas Di Pentima 15 days ago
Brett Smith wrote in #note-17:
The changes to distro_bootstrap need to be reverted. This role is special because it runs on systems that may not have Ansible's own dependencies installed yet, and therefore it can only expect ansible.builtin.raw to work. Our playbooks run it with gather_facts: no and therefore ansible_facts will not be set. See 87a7bd30812968bbd775d0b3cbafa4e51043384b. Documenting this special case somewhere where it would be helpful would be a welcome improvement.
Done, added a brief comment for the future person who will attempt to edit that file.
The dnf_versionlock documentation notes that dnf itself is a requirement. This means it needs the full version, not the microdnf that comes in Docker containers. Please add dnf to the list of packages we bootstrap like you did for dnf-plugin-versionlock.
Good catch, thanks.
- While testing I found out the result has specs_toadd, which documents the locks that we just added. With that, the cleanup task just becomes "remove all locks in locklist_post that aren't in specs_toadd but affect the same packages in specs_toadd."
Does this version feel more worth it to you? Feel free to keep building on that commit if so. Thanks.
Yes, this looks simpler to me too, thank you.
Changes at af8dd5ccca:
- Added dnf as a package requirement in distro_bootstrap.
- Reverted changes made in 3664ed4bc related to distro_bootstrap conditional use of package managers.
Updated by Brett Smith 15 days ago
Updated by Lucas Di Pentima 15 days ago
I realized there were some Ansible changes while this was ongoing, so I've just merged main at e4ffe7e0a5; will merge once tests pass: test-provision-ansible: #26
Updated by Lucas Di Pentima 15 days ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|72d7e312a216862dd240c17122eda9754092c5fe.