Bug #22762

open

Stop compiling our own Passenger agent

Added by Brett Smith 12 months ago. Updated 3 months ago.

Status:
In Progress
Priority:
Normal
Assigned To:
Category:
Deployment
Target version:
-
Story points:
-
Release relationship:
Auto

Description

When we switched our RailsAPI package from using a standalone Passenger to using our bundled Passenger gem, we implemented this by compiling the Passenger agent. Passenger provides an option to download a prebuilt binary, but we saw it segfaulting on some platforms, so we switched to compiling for more reliability. See 5c6a35e17f2997c3f494741226e914e9578c2ad6.

But this suuuuuuccccks. Even on a t3.xlarge with 4 cores and 16GiB RAM, compiling the Passenger agent takes several minutes. If you have less than 8GiB you're very likely to swap-thrash your machine to death, and I'm not even confident about 8GiB.

Research the issue more and see whether there are alternatives. Are there bug reports about the segfaulting agents? Are they limited to specific distros? Do we know why? Even if there's some way to identify when we're on a buggy platform and only compile then, otherwise download, that would be an improvement.
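
A minimal sketch of the "download when it works, otherwise compile" idea, assuming the `download-agent`, `validate-install`, and `compile-agent` subcommands of `passenger-config` are available in the bundled Passenger version. The function name `install_passenger_agent` and the `PASSENGER_CONFIG` variable are hypothetical, not taken from the existing postinst script, and it is untested whether `validate-install` would actually catch a segfaulting agent.

```shell
# Hypothetical variable: how to invoke passenger-config in this environment.
: "${PASSENGER_CONFIG:=bundle exec passenger-config}"

# Hypothetical helper: prefer the prebuilt agent, compile only as a fallback.
# Prints which path was taken; returns non-zero if both paths fail.
install_passenger_agent() {
    # $PASSENGER_CONFIG is intentionally unquoted so it splits into words.
    if $PASSENGER_CONFIG download-agent &&
       $PASSENGER_CONFIG validate-install --auto; then
        echo "prebuilt"
    elif $PASSENGER_CONFIG compile-agent --auto --optimize; then
        echo "compiled"
    else
        echo "failed" >&2
        return 1
    fi
}
```

Calling `install_passenger_agent` from the postinst would keep the slow compile as a safety net rather than the default. A stronger check than `validate-install`, such as actually starting the agent, may be needed on platforms where the prebuilt binary segfaults.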


Subtasks 1 (1 open, 0 closed)

Task #23340: Review 22762-passenger-prebuilt-binaries | In Progress | Lucas Di Pentima | 12/08/2025
Actions #1

Updated by Brett Smith 10 months ago

Alternatively, Lucas noted elsewhere:

I wonder if we can speed up Passenger's compilation process by asking the compiler to use all available cores. I looked and it was using just 1 (of 16 total).

This could improve the build time without requiring us to figure out which prebuilt binaries we do or don't trust.

Actions #2

Updated by Lucas Di Pentima 4 months ago · Edited

  • Target version set to Development 2025-11-26
  • Assigned To set to Lucas Di Pentima
  • Status changed from New to In Progress

I haven't found an official way of making Passenger's build system run on multiple cores, but I think we have a good middle-ground solution.
One of the suggestions Passenger's docs give is to use ccache, for developers who want to recompile Passenger faster.

I think this could also help us given that the most annoying experience is in production system upgrades.

I have done some testing and I think this would be a good option: we still compile our own Passenger binary, but with minimal added recompilation time:

  • bundle exec passenger-config compile-agent --auto --optimize
    • Without ccache: 2:17 minutes
    • With ccache: 16 secs
  • apt install arvados-api-server
    • Without ccache: 3:33 minutes
    • With ccache: 1:22 minutes

The ccache tool keeps its cache at ~/.cache/ccache/, and to use it all it took was to set the following envvars:

export CC="ccache gcc" 
export CXX="ccache g++" 

Updates to implement this at 62bc45ed0a - 22762-passenger-build-ccache branch.
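
As a rough sketch of how the envvars above could be applied only when ccache is present (so the build still works without it), something like the following might do. The function name `enable_ccache_if_available` is hypothetical and is not necessarily what the branch implements.

```shell
# Hypothetical helper: route compilation through ccache only when it is installed.
# Leaves CC and CXX untouched when ccache is not on PATH.
enable_ccache_if_available() {
    if command -v ccache >/dev/null 2>&1; then
        export CC="ccache gcc"
        export CXX="ccache g++"
    fi
}
```

After a build, `ccache --show-stats` or `du -sh ~/.cache/ccache` would show how much the cache is being used and how much disk space it takes.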

Actions #3

Updated by Brett Smith 4 months ago · Edited

I'd like to have a conversation about this. I'm writing up my initial reaction to the idea. I don't mean for this to be the final word; I'm just saying where I'm coming from.

For most of our users in production, by your own numbers, the lengthy build means they spend about ten minutes per release doing builds (~5 minutes per cluster with one development cluster and one user cluster). Say we release about once a quarter, so they spend ten minutes every three months. This is annoying and not ideal, but it's not the worst problem in the world, which is part of why we're in the situation we're currently in.

I know it rapidly gets much more annoying if you're working on a RailsAPI deployment, either at the packaging level or at the ops level. The five-minute build makes the deploy-and-test cycle very long. Trust me, I know, it was a pain while I did #23000 recently. But, that's not the common case, so I'm not inclined to add complexity to the production case to optimize the development case.

Given all this, adding a whole new dependency and spending disk space (how big is ~/.cache/ccache after one build?) just to save six minutes per quarter doesn't feel like a good trade-off to me. It's already a problem for us that the RailsAPI install process is too complicated, with a lot of careful coordination that happens across both the package build and the postinst script. Adding another dependency to that process, with more things that vary per distro and can go wrong, doesn't feel justified by the time savings.

Just one example: ccache is only available in the EPEL repo in RHEL. If we're lucky, "all" this means is we need to add an --enablerepo option to our install docs and testing scripts. If we're unlucky, we have to document how to set up EPEL.

I'm open to a wide range of solutions that improve the situation using the tools we currently have. For example:

  • At base, it's probably worth re-checking whether prebuilt binaries work reliably with current versions of Passenger. There have been a few minor releases since I wrote that comment in the postinst script, and if they work this is easily the least work.
  • Are there maybe different flavors of prebuilt binary available? Maybe we can request one that, e.g., prioritizes compatibility over speed, and that's more likely to work across our supported distros.
  • GNU make itself supports a MAKEFLAGS variable with options to pass to it. Could we try setting -j in that in the RailsAPI postinst?

Anything along these lines seems better to me than adding more complexity to the install process.
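
For illustration, the MAKEFLAGS idea could be sketched like this, assuming `nproc` from GNU coreutils is available on the target distro. The function name `set_parallel_makeflags` is hypothetical, and this only helps if the agent build actually goes through GNU make, which is not established here.

```shell
# Hypothetical helper: ask GNU make to run one job per available core.
set_parallel_makeflags() {
    # nproc is from GNU coreutils; fall back to a single job if it is missing.
    cores=$(nproc 2>/dev/null || echo 1)
    export MAKEFLAGS="-j${cores}"
}
```

The postinst would call `set_parallel_makeflags` before invoking the compile step, so any make process started underneath it inherits the setting.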

Actions #4

Updated by Brett Smith 4 months ago

  • Target version changed from Development 2025-11-26 to Development 2025-12-10
Actions #5

Updated by Lucas Di Pentima 4 months ago

Brett Smith wrote in #note-3:

Just one example: ccache is only available in the EPEL repo in RHEL. If we're lucky, "all" this means is we need to add an --enablerepo option to our install docs and testing scripts. If we're unlucky, we have to document how to set up EPEL.

Oh yeah, it seems I failed to check package availability correctly on all supported distributions. I was operating under the assumption that nothing extra was needed to install ccache. So that's out the window then.

Just for reference, I ran the same timing test on an EC2 c5.4xlarge instance (like the ones we use in production): installing arvados-api-server took around 4 minutes in total, and about one third of that when we avoid compiling Passenger.
My main objective was not to save 2 or 3 minutes per quarter per cluster, because that would be the happy path. Instead, I was aiming to make things easier for ops people when they need to upgrade a production cluster for which a maintenance window is difficult to schedule. When issues arise (and they do from time to time), you enter a loop similar to the development case, with the difference that migrations might be in an in-between state, making things not work, etc.
It's even worse on big production clusters with load-balanced RailsAPI servers.

I'm open to a wide range of solutions that improve the situation using the tools we currently have. For example:

  • At base, it's probably worth re-checking whether prebuilt binaries work reliably with current versions of Passenger. There have been a few minor releases since I wrote that comment in the postinst script, and if they work this is easily the least work.

That could be an option, if we're comfortable trusting downloaded prebuilt binaries.

  • Are there maybe different flavors of prebuilt binary available? Maybe we can request one that, e.g., prioritizes compatibility over speed, and that's more likely to work across our supported distros.
  • GNU make itself supports a MAKEFLAGS variable with options to pass to it. Could we try setting -j in that in the RailsAPI postinst?

MAKEFLAGS doesn't seem to make a difference.

Actions #6

Updated by Brett Smith 4 months ago

  • Subtask #23340 added
Actions #7

Updated by Lucas Di Pentima 4 months ago · Edited

Coming back to this: I've changed the postinst.sh script back so that it downloads the prebuilt Passenger binary.

Branch 22762-passenger-prebuilt-binaries at 17b0c2a3eef68

Actions #8

Updated by Brett Smith 4 months ago

Lucas Di Pentima wrote in #note-7:

Branch 22762-passenger-prebuilt-binaries at 17b0c2a3eef68

I would really like to test on RHEL 8, since that's the distro we support with the oldest glibc, and we've seen other prebuilt components (gems) start to leave it behind. You could orchestrate that by doing the following:

  1. Start a Rocky 8 VM (local or cloud, up to you)
  2. Build a test cluster config.yml. You could do this by swiping the code that test-ansible uses to do it.
  3. Install and configure postgresql-server (the arvados_postgresql and arvados_database Ansible roles cover what you need to do)
  4. Install your development arvados-api-server

Literally all you need to do is get the discovery document. If you can do that then I think this is good.
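
The final check could be sketched as below. The function name `fetch_discovery_document` is hypothetical, and the URL path is an assumption about where the API server publishes its discovery document, so adjust it if the test cluster differs.

```shell
# Hypothetical helper: succeed only if the API server returns a non-empty
# discovery document. $1 is the base URL of the RailsAPI server under test.
fetch_discovery_document() {
    # Assumed path for the discovery document; not confirmed in this ticket.
    body=$(curl --fail --silent --show-error \
        "$1/discovery/v1/apis/arvados/v1/rest") || return 1
    [ -n "$body" ]
}
```

On the Rocky 8 VM this would be run against whatever address the test config.yml assigns to the API server. A test cluster with a self-signed certificate may also need curl's `--insecure` option.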

Actions #9

Updated by Brett Smith 3 months ago

  • Release set to 82
  • Target version deleted (Development 2025-12-10)

We will defer this until after we have enough of #23359 done to support testing.
