
Bug #23010

closed

Salt installer fails when deploying to Ubuntu 24.04

Added by Lucas Di Pentima 9 months ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Deployment
Target version:
Story points: -
Release relationship: Auto

Description

Since #22972, our Salt installer again uses upstream PostgreSQL packages by default. This is convenient because we can request client library versions other than the ones the stock packages provide, which is useful for RDS-backed clusters.

postgres-formula does not currently support Ubuntu 24.04, but using its ubuntu2404 branch from GitHub works around this when deploying to that release.

With that fix applied, the arvados-api-server package fails to install, and the output gives little information about what went wrong:

...                                                                          
Compilation finished!                                                                                                               
The Phusion Passenger(R) agent is already installed.                                                                                
If you want to redownload it, re-run this program with the --force parameter.

 --> Installing Nginx 1.26.2 engine
Creating symlinks to configuration in /etc/arvados/api ...... done.
Extending systemd unit configuration ...... done.
Checking configuration for completeness...WARN[2025-07-01T17:33:30.855442790Z] deprecated or unknown config entry: Clusters.xarv3.Volumes.xarv3-nyw5e-000000000000000.DriverParameters.IAMRole
WARN[2025-07-01T17:33:30.855539423Z] deprecated or unknown config entry: Clusters.xarv3.Workbench.SecretKeyBase 
Defaulting to memory cache, because /var/www/arvados-api/current/tmp/cache does not exist
 done.
NOTE: The arvados-api-server package was not configured completely because
database setup could not be completed.
Please refer to the documentation for next steps:
  <https://doc.arvados.org/install/>

After you do that, resume arvados-api-server setup by running:
  dpkg-reconfigure arvados-api-server
dpkg: error processing package arvados-api-server (--configure):
 installed arvados-api-server package post-installation script subprocess returned error exit status 22
Processing triggers for libc-bin (2.39-0ubuntu8.4) ...
Processing triggers for man-db (2.12.0-4build2) ...
Processing triggers for install-info (7.1-3build2) ...
Errors were encountered while processing:
 arvados-api-server
needrestart is being skipped since dpkg has failed
[ERROR   ] stderr: Running as unit: run-r51e9817eeb5e4f84bd98f23af4f762cc.scope; invocation ID: 4b176dc8af9848609525aac2e8e6c1d6
E: Sub-process /usr/bin/dpkg returned an error code (1)
[ERROR   ] retcode: 100   
...

This is what happens when running db:migrate:status manually:

root@controller:/var/www/arvados-api/current# bundle3.2 exec bin/rake db:migrate:status
WARN[2025-07-01T18:38:10.385520936Z] deprecated or unknown config entry: Clusters.xarv3.Volumes.xarv3-nyw5e-000000000000000.DriverParameters.IAMRole
WARN[2025-07-01T18:38:10.385633581Z] deprecated or unknown config entry: Clusters.xarv3.Workbench.SecretKeyBase
Defaulting to memory cache, because /var/www/arvados-api/current/tmp/cache does not exist
rake aborted!
ActiveRecord::DatabaseConnectionError: There is an issue connecting with your hostname: 10.1.1.11. (ActiveRecord::DatabaseConnectionError)

Please check your database configuration and ensure there is a valid connection to your database.                                   
/var/www/arvados-api/shared/vendor_bundle/ruby/3.2.0/gems/activerecord-7.1.3.4/lib/active_record/connection_adapters/postgresql_adapter.rb:78:in `rescue in new_client'

...

/var/www/arvados-api/shared/vendor_bundle/ruby/3.2.0/gems/bundler-2.4.22/exe/bundle:29:in `<top (required)>'
/usr/bin/bundle3.2:25:in `load'
/usr/bin/bundle3.2:25:in `<main>' 

Caused by:
PG::ConnectionBad: connection to server at "10.1.1.11", port 5432 failed: Connection refused (PG::ConnectionBad)
        Is the server running on that host and accepting TCP/IP connections?

...

Tasks: TOP => db:migrate:status
(See full trace by running task with --trace)
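"Connection refused" means nothing accepted the TCP connection, so the failure happens before credentials or database names matter. The probe can be repeated outside Rails to confirm this. Below is a minimal Python sketch, assuming Python 3 is available on the controller. The function name tcp_reachable is hypothetical, and the function only checks that a TCP handshake succeeds, not that PostgreSQL accepts a login.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper: it only tests the TCP handshake, which is the step
    that fails with "Connection refused" in the PG::ConnectionBad error above.
    """
    try:
        # create_connection resolves the address and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and timeouts are both OSError subclasses.
        return False
```

For example, tcp_reachable("10.1.1.11", 5432) and tcp_reachable("10.1.1.11", 5433) probe the failing address and the only other port on which a PostgreSQL server was found listening.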

Checking what's listening:

root@controller:/var/www/arvados-api/current# netstat -anlp | grep LISTEN
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      13047/nginx: master 
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      13047/nginx: master 
tcp        0      0 127.0.0.1:12345         0.0.0.0:*               LISTEN      5315/alloy          
tcp        0      0 127.0.0.54:53           0.0.0.0:*               LISTEN      339/systemd-resolve 
tcp        0      0 127.0.0.1:5432          0.0.0.0:*               LISTEN      57779/postgres      
tcp        0      0 0.0.0.0:5433            0.0.0.0:*               LISTEN      57775/postgres      
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      339/systemd-resolve 
tcp6       0      0 :::22                   :::*                    LISTEN      1/init              
tcp6       0      0 :::9187                 :::*                    LISTEN      5829/prometheus-pos 
tcp6       0      0 :::9100                 :::*                    LISTEN      6629/prometheus-nod 
tcp6       0      0 :::5433                 :::*                    LISTEN      57775/postgres      
tcp6       0      0 :::3903                 :::*                    LISTEN      5845/mtail              

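There are two PostgreSQL instances running, one for v15 and another for v16. The one on port 5432, which is the port the API server tries to reach, is bound only to 127.0.0.1, while the other listens on port 5433 on all addresses. Having two instances seems to be what is causing the problem.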
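It is easy to miss a second PostgreSQL server in a long listing. The sketch below assumes the `netstat -anlp` column layout shown above (local address in the fourth column, "PID/program" in the seventh). It groups PostgreSQL listening sockets by PID, so more than one PID means more than one server instance. The function name is hypothetical.

```python
from collections import defaultdict


def postgres_listeners(netstat_output: str) -> dict[str, list[str]]:
    """Map each postgres PID to the local addresses it listens on.

    Assumes `netstat -anlp` lines: proto, recv-q, send-q, local address,
    foreign address, state, PID/program (the program name may contain spaces).
    """
    by_pid: dict[str, list[str]] = defaultdict(list)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, non-listening sockets, and truncated lines.
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        pid, _, program = fields[6].partition("/")
        if program.startswith("postgres"):
            by_pid[pid].append(fields[3])
    return dict(by_pid)
```

Applied to the listing above, this returns two PIDs: 57779, bound only to 127.0.0.1:5432, and 57775, listening on port 5433 on all IPv4 and IPv6 addresses.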


Subtasks 1 (0 open, 1 closed)

Task #23031: Review 23010-installer-jammy-pg-fix (badly named branch, should have been *-noble-pg-fix) | Resolved | Lucas Di Pentima | 08/06/2025

Related issues 1 (0 open, 1 closed)

Related to Arvados - Bug #22972: salt-installer fails when trying to download pgdg-keyring package from postgresql.org | Resolved | Lucas Di Pentima
#1

Updated by Lucas Di Pentima 9 months ago

  • Related to Bug #22972: salt-installer fails when trying to download pgdg-keyring package from postgresql.org added
#2

Updated by Brett Smith 9 months ago

Are full Salt logs available? I'm wondering whether the Salt installer is installing Ubuntu's PostgreSQL server (15), which gets port 5432, and then the upstream package (16), which gets port 5433. If so, different tools might be talking to different databases, which would explain why the RailsAPI postinst doesn't find the configuration it expects.
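This hypothesis depends on how Debian and Ubuntu's postgresql-common tooling picks ports. By default, pg_createcluster gives a new cluster the next free port starting from 5432, so whichever server package creates its cluster second ends up on 5433. The function below is a simplified Python model of that rule. Its name is hypothetical, and ports held by non-PostgreSQL services are ignored.

```python
def assign_default_ports(clusters_in_install_order: list[str], start: int = 5432) -> dict[str, int]:
    """Simplified model: each new cluster gets the lowest port >= start
    not already used by a previously created cluster."""
    used: set[int] = set()
    assigned: dict[str, int] = {}
    for version in clusters_in_install_order:
        port = start
        while port in used:
            port += 1
        used.add(port)
        assigned[version] = port
    return assigned
```

Install order alone decides which version gets 5432. A client that assumes the default port can therefore reach a different server than the one Salt configured, without reporting an error.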

#3

Updated by Lucas Di Pentima 9 months ago

  • Target version set to Development 2025-07-23
#4

Updated by Brett Smith 9 months ago

  • Assigned To set to Lucas Di Pentima
#5

Updated by Brett Smith 9 months ago

  • Subtask #23031 added
#6

Updated by Lucas Di Pentima 8 months ago

  • Status changed from New to In Progress
#7

Updated by Brett Smith 8 months ago

  • Target version changed from Development 2025-07-23 to Development 2025-08-06
#8

Updated by Lucas Di Pentima 8 months ago

23010-installer-jammy-pg-fix @ 1e4e6159a7

test-provision: #1248

  • All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
    • Yes
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • No
  • Code is tested and passing, both automated and manual, what manual testing was done is described.
    • Automated single-node provision tests and manually tested on ubuntu2404 multi-node.
  • The tested code incorporates recent main branch changes.
    • Yes
  • New or changed UI/UX has gotten feedback from stakeholders.
    • N/A
  • Documentation has been updated.
    • N/A
  • Behaves appropriately at the intended scale (describe intended scale).
    • No change in scale
  • Considered backwards and forwards compatibility issues between client and server.
    • N/A
  • Follows our coding standards and GUI style guidelines.
    • N/A
  • Updated the provision script to fetch the latest changes from postgres-formula, which include Ubuntu 24.04 support.
  • Updated the postgresql.sls pillar so it no longer installs postgresql-contrib. That package depends on whichever PostgreSQL version the distribution officially ships, while the Salt installer requests PG v15 from the upstream packages.
  • Re-enabled the test-provision-ubuntu2404 Jenkins job.
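postgresql-contrib follows the distribution's default PostgreSQL version. Installing it alongside an explicitly requested upstream version can therefore pull in a second server package. Below is a hedged sketch of a post-install sanity check for that situation. It assumes a list of installed package names has already been gathered (for example from dpkg), and the function and pattern names are hypothetical.

```python
import re

# Hypothetical pattern: server packages are named postgresql-<major>, e.g.
# postgresql-15. Client, contrib, common, and extension packages
# (postgresql-client-15, postgresql-15-postgis-3) deliberately do not match.
SERVER_PKG = re.compile(r"^postgresql-(\d+)$")


def server_major_versions(installed_packages: list[str]) -> set[str]:
    """Return the PostgreSQL server major versions found in a package list."""
    versions: set[str] = set()
    for name in installed_packages:
        match = SERVER_PKG.match(name)
        if match:
            versions.add(match.group(1))
    return versions
```

A result with more than one element, such as {"15", "16"}, indicates the mixed-version state described in this issue.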
#9

Updated by Brett Smith 8 months ago

Lucas Di Pentima wrote in #note-8:

23010-installer-jammy-pg-fix @ 1e4e6159a7

This LGTM, thanks.

Just a note for future readers: the branch name is a slip-up; we are in fact dealing with Ubuntu 24.04 noble, not 22.04 jammy.

Also for posterity, I asked myself why we're currently installing postgresql-contrib. git grep postgresql-contrib finds the upgrade notes and a database migration, which indicate it's there to support trigram indexes. Ansible does not install this package, so we apparently don't need it; otherwise RailsAPI wouldn't work on Jenkins and single-node installs. On Debian 12, dpkg -L postgresql-15 | grep trgm confirms that this extension has moved into the main server package. Checking backwards compatibility, packages.ubuntu.com shows the same is true in Ubuntu 22.04, which is the oldest distribution that Arvados 3.2.0 will support. So we seem to be covered across the board, and I'm happy to stop installing a package we don't need anymore.
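The dpkg -L postgresql-15 | grep trgm check can be scripted to run against any server version. Below is a minimal Python sketch. It assumes the Debian layout, where extension control files live under /usr/share/postgresql/<version>/extension/, and the function name is hypothetical.

```python
def ships_pg_trgm(dpkg_file_list: str) -> bool:
    """Return True if a `dpkg -L <package>` listing includes pg_trgm.

    Checks for the extension's control file, which CREATE EXTENSION pg_trgm
    reads; the shared library itself is not checked here.
    """
    return any(
        line.strip().endswith("/extension/pg_trgm.control")
        for line in dpkg_file_list.splitlines()
    )
```

A usage example, assuming it runs on the target host: ships_pg_trgm(subprocess.run(["dpkg", "-L", "postgresql-15"], capture_output=True, text=True).stdout).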

#10

Updated by Lucas Di Pentima 8 months ago

Brett Smith wrote in #note-9:

Also for posterity, I asked myself why we're currently installing postgresql-contrib. ... So we seem to be covered across the board, and great, happy to stop installing a package we don't need anymore.

Thanks for the detailed investigation on compatibility. This is already merged.

#11

Updated by Lucas Di Pentima 8 months ago

  • Status changed from In Progress to Resolved