Feature #23359: Ansible installer supports RHEL (at least enough to run test-provision) - Arvados

Feature #23359

closed

Ansible installer supports RHEL (at least enough to run test-provision)

Added by Brett Smith 3 months ago. Updated 15 days ago.

Status: Resolved
Priority: Normal
Assigned To: Lucas Di Pentima
Category: Deployment
Target version: Development 2026-03-18

Story points:

Description

For every role run by install-arvados-cluster.yml, you will need to:

- Anywhere we install apt pins, install a RHEL equivalent. We don't know how to do this yet; working that out is a necessary planning prerequisite.
- Anywhere we have an ansible.builtin.deb822_repository task, we'll need to make it conditional with when: "ansible_pkg_mgr == 'apt'", then add a corresponding ansible.builtin.yum_repository task with when: "ansible_pkg_mgr == 'dnf'". Finding the parameters will require going to the publisher's documentation and figuring out how their yum repository is laid out.
- Change any ansible.builtin.apt tasks to include the distro_packages role or, in the rare case where the package is only needed on Debian/Ubuntu, guard it with when: "ansible_pkg_mgr == 'apt'".
  - If the package has a different name on RHEL, you must add a translation near the top of filter_plugins/arvados.py. pkgs.org can be a helpful resource for checking package names across distributions.
- For managed configuration files, if RHEL puts them at a different location than Debian does, you'll need to make the path conditional.
  - Try to avoid making the entire task conditional if possible. Try instead to make just the path conditional with a template like path: "{{ rhel_path if ansible_pkg_mgr == 'dnf' else deb_path }}". If it's helpful to break things up, you can set vars in the task that you refer to in these templates. If a task parameter needs to be set in one distro but not another, you can set the special value omit to avoid setting it when not necessary. See roles/arvados_shell/tasks/main.yml for some examples of using omit.
- For systemd services, check the service name on RHEL and similarly templatize it if needed.
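
The path and service-name templating described above could look roughly like the sketch below. It is an illustration only, not code from the Arvados roles: the `example` configuration file, its contents, and the `deb_path` and `rhel_path` variables are hypothetical, and the Apache service names (`apache2` on Debian, `httpd` on RHEL) are used only as a familiar case where the names differ.

```yaml
# Hypothetical tasks: file names and contents are placeholders.
- name: Install a managed configuration file
  vars:
    deb_path: /etc/default/example
    rhel_path: /etc/sysconfig/example
  ansible.builtin.copy:
    content: "EXAMPLE_OPTS=\n"
    # Only the destination is conditional, not the whole task.
    dest: "{{ rhel_path if ansible_pkg_mgr == 'dnf' else deb_path }}"
    owner: root
    group: root
    mode: "0644"
    # Set the SELinux type only on dnf-based systems; omit leaves the
    # parameter unset everywhere else.
    setype: "{{ 'etc_t' if ansible_pkg_mgr == 'dnf' else omit }}"

- name: Enable and start the web server
  ansible.builtin.systemd:
    # Same idea for a service whose name differs between distros.
    name: "{{ 'httpd' if ansible_pkg_mgr == 'dnf' else 'apache2' }}"
    state: started
    enabled: true
```

Keeping one task per resource and templating only the differing values means a reader sees every distro's behavior in a single place.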

It would be easy to start with a branch that just updates the roles that install Arvados packages. Since we don't have different configuration paths or service names, that's just a matter of the first bullet. There could be separate branches for roles that install third-party software like Docker.

Subtasks 1 (0 open — 1 closed)

Updated by Brett Smith 3 months ago

Description updated (diff)

Updated by Brett Smith 3 months ago

Subtask #23364 added

Updated by Lucas Di Pentima 3 months ago

Status changed from New to In Progress

Updated by Brett Smith 3 months ago · Edited

Description updated (diff)

Updated by Brett Smith 3 months ago

Things you can do to work on this without actually editing existing code:

- Investigate and write up the RHEL equivalent of apt pins.
- For each Debian repository we use, find and document the RHEL equivalent. The simplest version is just a link that documents how to set up the RHEL repository. The fullest version is writing up an ansible.builtin.yum_repository task.
- For third-party systemd services we manage, check if they have different names on RHEL and document those differences.

Updated by Lucas Di Pentima 3 months ago

How to pin packages using dnf

I've found dnf's documentation regarding its versionlock plugin, and it seems to be what we need.

As a prerequisite, we need to install the python3-dnf-plugin-versionlock package.

Pins are saved in the /etc/dnf/plugins/versionlock.list file by default. Looking at the versionlock.conf file, it seems we could manage a custom lock list file for our purposes by defining our own section, but I've done some tests and it didn't work. The dnf-versionlock man page also doesn't specify that custom sections can be used.

$ cat /etc/dnf/plugins/versionlock.conf 
[main]
enabled = 1
locklist = /etc/dnf/plugins/versionlock.list

Here's an example of the versionlock.list file:

$ cat /etc/dnf/plugins/versionlock.list 

# Added lock on Tue Dec 30 15:41:18 2025
bash-0:5.1.8-6.el9_1.*
$ sudo dnf versionlock list
Extra Packages for Enterprise Linux 9 - x86_64                                                      3.3 kB/s | 2.7 kB     00:00    
bash-0:5.1.8-6.el9_1.*

Ansible support

There's a community module that can be useful to manage our pins. We'll need to double check if it's available on the ansible version we use.
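
As a rough sketch of what using that module might look like, the tasks below install the plugin and lock a package. This assumes the module in question is `community.general.dnf_versionlock` (the name used later in this thread), that the `community.general` collection is installed on the control node, and that the target is a dnf-based host. `bash` is used only because it appears in the example above, and the `bash_lock` variable name is made up for the illustration.

```yaml
- name: Install the dnf versionlock plugin
  ansible.builtin.dnf:
    name: python3-dnf-plugin-versionlock
    state: present

- name: Prevent bash from being updated
  community.general.dnf_versionlock:
    name: bash
    state: present
  register: bash_lock  # hypothetical variable name

- name: Show the lock list after the change
  ansible.builtin.debug:
    var: bash_lock.locklist_post
```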

Updated by Lucas Di Pentima 3 months ago

Third-party package repositories used

Docker

Docs: https://docs.docker.com/engine/install/rhel/
https://download.docker.com/linux/rhel/docker-ce.repo
Key: https://download.docker.com/linux/rhel/gpg

[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://download.docker.com/linux/rhel/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://download.docker.com/linux/rhel/gpg
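
The stanza above maps fairly directly onto an `ansible.builtin.yum_repository` task. The following is a sketch of that translation, guarded the way the issue description suggests. The task name is an assumption, and the stanza's `name=` value becomes the module's `description` parameter.

```yaml
- name: Add the Docker CE yum repository
  when: "ansible_pkg_mgr == 'dnf'"
  ansible.builtin.yum_repository:
    # The module's "name" is the repo ID, i.e. the [section] header.
    name: docker-ce-stable
    description: Docker CE Stable - $basearch
    baseurl: https://download.docker.com/linux/rhel/$releasever/$basearch/stable
    enabled: true
    gpgcheck: true
    gpgkey: https://download.docker.com/linux/rhel/gpg
```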

AMD ROCm

Docs: https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/install/detailed-install/package-manager/package-manager-rl.html
https://repo.radeon.com/amdgpu/30.20.1/el/9.6/main/x86_64/
Key: https://repo.radeon.com/rocm/rocm.gpg.key

[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/30.20.1/el/9.6/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key

Docs: https://rocmdocs.amd.com/projects/install-on-linux/en/latest/install/install-methods/package-manager/package-manager-rl.html
https://repo.radeon.com/rocm/el9/7.1.1/main
Key: https://repo.radeon.com/rocm/rocm.gpg.key

[rocm]
name=ROCm 7.1.1 repository
baseurl=https://repo.radeon.com/rocm/el9/7.1.1/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
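
Because both AMD repositories embed a release number in their URL, one way to express them in Ansible is a single looped task with the versions pulled out into variables. This is only a sketch: the variable names `amdgpu_version` and `rocm_version` are hypothetical, and the values are simply the ones that appear in the stanzas above.

```yaml
- name: Add the AMD GPU yum repositories
  when: "ansible_pkg_mgr == 'dnf'"
  vars:
    amdgpu_version: "30.20.1"  # hypothetical variable names
    rocm_version: "7.1.1"
  ansible.builtin.yum_repository:
    name: "{{ item.name }}"
    description: "{{ item.description }}"
    baseurl: "{{ item.baseurl }}"
    enabled: true
    priority: 50
    gpgcheck: true
    # Both repositories are signed with the same key.
    gpgkey: https://repo.radeon.com/rocm/rocm.gpg.key
  loop:
    - name: amdgpu
      description: amdgpu
      baseurl: "https://repo.radeon.com/amdgpu/{{ amdgpu_version }}/el/9.6/main/x86_64/"
    - name: rocm
      description: "ROCm {{ rocm_version }} repository"
      baseurl: "https://repo.radeon.com/rocm/el9/{{ rocm_version }}/main"
```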

NVIDIA

Docs: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
https://nvidia.github.io/libnvidia-container/stable/rpm/$basearch
Key: https://nvidia.github.io/libnvidia-container/gpgkey

[nvidia-container-toolkit]
name=nvidia-container-toolkit
baseurl=https://nvidia.github.io/libnvidia-container/stable/rpm/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

Docs: https://developer.download.nvidia.com/compute/cuda/repos/
https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64
Key: https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/D42D0685.pub

[cuda-rhel9-x86_64]
name=cuda-rhel9-x86_64
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/D42D0685.pub

Grafana

Docs: https://grafana.com/docs/grafana/latest/setup-grafana/installation/redhat-rhel-fedora/
https://rpm.grafana.com
Key: https://rpm.grafana.com/gpg.key

[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

PostgreSQL

Docs: https://yum.postgresql.org/rpmchart/ & https://wiki.postgresql.org/wiki/YUM_Installation
https://download.postgresql.org/pub/repos/yum/common/redhat/rhel-9-x86_64/
https://download.postgresql.org/pub/repos/yum/15/redhat/rhel-9-x86_64/
Repo RPMs: https://yum.postgresql.org/repopackages/

Updated by Brett Smith 3 months ago · Edited

Lucas Di Pentima wrote in #note-6:

I've found dnf's documentation regarding its versionlock plugin, and it seems to be what we need.

Yeah, this seems fine. I think a good next step would be to write a standalone proof-of-concept playbook that pins Docker version 28 and then installs it on a RHEL distro. Once we know what that looks like, we can plan how to integrate it into the existing roles.
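
A proof of concept along those lines might be shaped like the playbook below. It is a sketch under several assumptions: the Docker CE repository is already configured on the host, the `community.general` collection is available, and the lock spec `docker-ce-3:28.*` (epoch 3, any 28.x release) is a guess that should be checked against what `dnf repoquery docker-ce` reports.

```yaml
- name: Pin Docker to version 28 and install it (proof of concept)
  hosts: all
  become: true
  tasks:
    - name: Install the dnf versionlock plugin
      ansible.builtin.dnf:
        name: python3-dnf-plugin-versionlock
        state: present

    - name: Lock docker-ce to the 28 series
      community.general.dnf_versionlock:
        # Assumed spec: the epoch and version glob are not from the thread.
        name: "docker-ce-3:28.*"
        # raw: true writes the spec as given instead of resolving it
        # to the package versions currently available.
        raw: true
        state: present

    - name: Install Docker within the bounds of the lock
      ansible.builtin.dnf:
        name: docker-ce
        state: present
```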

Pins are saved in the /etc/dnf/plugins/versionlock.list file by default. Looking at the versionlock.conf file, it seems we could manage a custom lock list file for our purposes by defining our own section, but I've done some tests and it didn't work.

Out of curiosity, did you see some documentation that suggested it should work? Or are you just guessing based on the configuration file format?

Updated by Lucas Di Pentima 3 months ago

Brett Smith wrote in #note-8:

Out of curiosity, did you see some documentation that suggested it should work? Or are you just guessing based on the configuration file format?

Both, but mainly the dnf-versionlock man page made me assume that the conf file format could be extended:

CONFIGURATION
/etc/dnf/plugins/versionlock.conf

The minimal content of conf file should contain main sections with enabled and locklist parameters.

#10

Updated by Lucas Di Pentima 3 months ago

Updates at eb2001a1 - branch 23359-ansible-rhel-docker

This updates the arvados_docker role to install RPM packages when dnf is detected on the system. It installs dnf's versionlock plugin and uses it to set up package pins on the docker packages.

In order to test it, I've used a simplified playbook like this:

- name: Install dnf-versionlock
  hosts: all
  tasks:
    - ansible.builtin.include_role:
        name: distro_bootstrap

- name: Install Docker
  hosts: all
  tasks:
    - ansible.builtin.include_role:
        name: arvados_docker

The changes make it possible for the packages to be updated within the boundaries set by the pins.

The versionlock pin list gets cleaned up on every run to avoid having multiple pins per package when the pinned version changes. Also, when dnf versionlock add ... is run, the available matching versions are looked up and added to the list, instead of just the provided wildcard. This allows package updating.

#11

Updated by Brett Smith 3 months ago

Target version changed from Development 2026-01-06 to Development 2026-01-21

#12

Updated by Brett Smith 3 months ago

Target version changed from Development 2026-01-21 to Development 2026-02-04

#13

Updated by Brett Smith about 2 months ago

Target version changed from Development 2026-02-04 to Development 2026-02-18

#14

Updated by Brett Smith about 2 months ago · Edited

Lucas Di Pentima wrote in #note-10:

Updates at eb2001a1 - branch 23359-ansible-rhel-docker

Functionality-wise this is all fine, and it seems like the pattern should work fine for other repositories too. Thanks for putting all this together. I have two style comments.

First, I suggest roles/arvados_docker/tasks/main.yml can literally just be a single task:

- ansible.builtin.include_tasks: "{{ ansible_pkg_mgr }}.yml"

Then all the tasks for apt can go in apt.yml and all the tasks for dnf can go in dnf.yml without an explicit condition check on each one. I think this would be easier to follow than the current swapping between apt and dnf tasks, and it should be easier to work on in the future.

Second, Ansible playbooks are best when they strive to be as idempotent as possible. The way we delete version locks then re-add them falls short of that goal. Reviewing the dnf_versionlock documentation, it looks like we can do better with the following approach: Start with the same state: present task you have now, and register the result. Then follow up with a cleanup task like this:

- name: Remove stale versionlocks
  when: docker_versionlocks.changed
  community.general.dnf_versionlock:
    state: absent
    name: "{{ the list of items in docker_versionlocks.locklist_pre that refer to a package name we manage but are not in the desired set of locks }}"

This is admittedly going to require some fancy list filtering. It might help to stage intermediate lists in vars. If this approach sounds interesting to you but you'd like help writing it, let me know and I'm happy to chip in.
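
One possible shape for that list filtering is sketched below, with the intermediate lists staged in vars. It is not the code from the branch. The variables `docker_lock_names` and `docker_lock_specs` and their values are hypothetical, and the sketch assumes locks are written with `raw: true` so that lock list entries are exactly the specs we passed in.

```yaml
- name: Keep Docker versionlocks in sync
  hosts: all
  become: true
  vars:
    # Hypothetical inputs: package names we manage and the locks we want.
    docker_lock_names: [docker-ce, docker-ce-cli]
    docker_lock_specs:
      - "docker-ce-3:28.*"
      - "docker-ce-cli-1:28.*"
  tasks:
    - name: Add the desired versionlocks
      community.general.dnf_versionlock:
        name: "{{ docker_lock_specs }}"
        raw: true
        state: present
      register: docker_versionlocks

    - name: Remove stale versionlocks
      vars:
        # Matches entries like "docker-ce-3:..." for the names we manage.
        managed_pattern: "^({{ docker_lock_names | map('regex_escape') | join('|') }})-[0-9]+:"
        # Existing entries for managed packages that are no longer wanted.
        stale_locks: >-
          {{ docker_versionlocks.locklist_pre
          | select('match', managed_pattern) | list
          | difference(docker_lock_specs) }}
      when:
        - docker_versionlocks.changed
        - stale_locks | length > 0
      community.general.dnf_versionlock:
        name: "{{ stale_locks }}"
        raw: true
        state: absent
```

On a host whose locks already match, the first task reports no change and the cleanup is skipped, which is the idempotent behavior being asked for.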

Thanks.

#15

Updated by Brett Smith about 1 month ago

Target version changed from Development 2026-02-18 to Development 2026-03-04

#16

Updated by Lucas Di Pentima 25 days ago

Updates at 0c214b1af23 (rebased from latest main)

- Split debian-vs-redhat tasks for simplicity.
- Improves stale versionlock entry management.
- Adds support for listing packages with no version spec to be installed, also useful if we need to delete a package pin from versionlock.

I'm not sure this extra complexity is really worth it. Deleting our versionlock entries before adding the current ones seemed to me a super simple and effective way of doing it. I'm not seeing the idempotent issue except for when we decide to remove an entry from the versionlock file.

#17

Updated by Brett Smith 21 days ago

Lucas Di Pentima wrote in #note-16:

Updates at 0c214b1af23 (rebased from latest main)

Split debian-vs-redhat tasks for simplicity.

The changes to distro_bootstrap need to be reverted. This role is special because it runs on systems that may not have Ansible's own dependencies installed yet, and therefore it can only expect ansible.builtin.raw to work. Our playbooks run it with gather_facts: no and therefore ansible_facts will not be set. See 87a7bd30812968bbd775d0b3cbafa4e51043384b. Documenting this special case somewhere where it would be helpful would be a welcome improvement.

The dnf_versionlock documentation notes that dnf itself is a requirement. This means it needs the full version, not microdnf that comes in Docker containers. Please add dnf to the list of packages we bootstrap like you did for dnf-plugin-versionlock.
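
To make the constraint concrete, here is a sketch of what a facts-free bootstrap play could look like. It is not the contents of the actual distro_bootstrap role. The shell script and the package selection are assumptions for illustration, with the plugin package name taken from earlier in this thread.

```yaml
- name: Bootstrap hosts before any other role runs
  hosts: all
  become: true
  # No facts are gathered, so ansible_pkg_mgr is not available here.
  gather_facts: false
  tasks:
    - name: Install what later Ansible modules depend on
      # raw needs no Python on the target, so the package manager has
      # to be detected in shell instead of with Ansible conditionals.
      ansible.builtin.raw: |
        set -e
        if command -v apt-get >/dev/null 2>&1; then
          apt-get update
          apt-get install -y python3
        elif command -v dnf >/dev/null 2>&1; then
          dnf install -y python3 python3-dnf-plugin-versionlock
        elif command -v microdnf >/dev/null 2>&1; then
          # Container images may only ship microdnf: pull in full dnf.
          microdnf install -y dnf python3 python3-dnf-plugin-versionlock
        fi
```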

Adds support for listing packages with no version spec to be installed, also useful if we need to delete a package pin from versionlock.

I feel like this support is part of the reason the code for idempotence was so challenging. Right now I'm inclined to say "YAGNI" to this. I can imagine that at some point in the future we will want to clean up version locks, the requirement makes sense, but I would rather wait and figure out an implementation when that situation actually arises than try to predict what it will look like today. More below.

I'm not sure this extra complexity is really worth it. Deleting our versionlock entries before adding the current ones seemed to me a super simple and effective way of doing it. I'm not seeing the idempotent issue except for when we decide to remove an entry from the versionlock file.

So, let's talk about this. I agree that getting idempotence usually requires more code. And I agree that there are situations where the amount of additional code isn't worth it.

I don't really care about idempotence by itself. I like having idempotence because it means that changes to system configuration are consistent and predictable. So when I think about the trade-off, I tend to think about it less as "idempotence vs. code complexity," but more about "system predictability vs. code complexity." Which is still a trade-off!

A good example of this that I wrote is arvados_nginx_base. The tasks in that role are more complex than I would like. The regexp to modify nginx.conf is truly hairy. But: there's complexity in nginx.conf too. For me personally, I decided it was worth the complexity in Ansible in order to keep nginx.conf simple and predictable. The Ansible code runs at a set time and can be commented a lot to help readers understand. If nginx.conf gets hairy, that lives on the system forever and can be harder to document from an Ansible playbook. But reasonable people can disagree about whether I made the right call.

I agree that the code you have feels like it isn't worth it. I pushed 30c13039da208c6c8231814fe266df4f495a5d69 as an attempt at a simpler version. (Sorry, I started a separate branch for myself locally, but it was still tied to your branch upstream, so now it's there. You don't have to take this commit.) It is based on your version: you'll note a lot of the same Jinja filters and regexp patterns. I think there are two main differences:

- I didn't bother supporting the "no version locked" case, because that created a lot of internal branching that, like I said, I'm not sure we need.
- While testing I found out the result has specs_toadd which documents the locks that we just added. With that, the cleanup task just becomes "remove all locks in locklist_post that aren't in specs_toadd but affect the same packages in specs_toadd."

Does this version feel more worth it to you? Feel free to keep building on that commit if so. Thanks.

#18

Updated by Brett Smith 20 days ago

Target version changed from Development 2026-03-04 to Development 2026-03-18

#19

Updated by Lucas Di Pentima 15 days ago

Brett Smith wrote in #note-17:

The changes to distro_bootstrap need to be reverted. This role is special because it runs on systems that may not have Ansible's own dependencies installed yet, and therefore it can only expect ansible.builtin.raw to work. Our playbooks run it with gather_facts: no and therefore ansible_facts will not be set. See 87a7bd30812968bbd775d0b3cbafa4e51043384b. Documenting this special case somewhere where it would be helpful would be a welcome improvement.

Done, added a brief comment for the future person that will attempt to edit that file.

The dnf_versionlock documentation notes that dnf itself is a requirement. This means it needs the full version, not microdnf that comes in Docker containers. Please add dnf to the list of packages we bootstrap like you did for dnf-plugin-versionlock.

Good catch, thanks.

While testing I found out the result has specs_toadd which documents the locks that we just added. With that, the cleanup task just becomes "remove all locks in locklist_post that aren't in specs_toadd but affect the same packages in specs_toadd."
Does this version feel more worth it to you? Feel free to keep building on that commit if so. Thanks.

Yes, this looks simpler to me too, thank you.

Changes at af8dd5ccca:

- Added dnf as a package requirement in distro_bootstrap.
- Reverted changes made in 3664ed4bc related to distro_bootstrap conditional use of package managers.

#20

Updated by Brett Smith 15 days ago

Lucas Di Pentima wrote in #note-19:

Changes at af8dd5ccca:

LGTM, thanks.

#21

Updated by Lucas Di Pentima 15 days ago

I realized there were some Ansible changes on main while this was ongoing, so I've just merged main at e4ffe7e0a5 and will merge the branch once tests pass: test-provision-ansible: #26

#22

Updated by Lucas Di Pentima 15 days ago

Status changed from In Progress to Resolved

Applied in changeset arvados|72d7e312a216862dd240c17122eda9754092c5fe.
