Project

General

Profile

Actions

Bug #23382

closed

Prevent unnecessary row locking on final containers

Added by Brett Smith 3 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Story points:
-
Release relationship:
Auto

Description

We have evidence that when a completed container is reused, we go through row_lock_for_priority_update at least once. This can start getting very expensive when the same container is reused many (100K+) times.

Avoid that: make it so we do not call row_lock_for_priority_update when the associated container is Complete or Cancelled. In these cases, there is no point to propagating a priority change, and therefore there is no need to lock all these rows.

Thinking it all the way through: this strategy does mean that we'll still do potentially a lot of locking if the same container is reused many times while it's still running. However, we have evidence that performance is acceptable when a container is reused ~10K times. So, today, getting to this point would require having more than 10K workflow runners running. I'm not aware we have any users today who have MaxInstances * SupervisorFraction > 10_000, so that's tomorrow's problem.


Subtasks 1 (0 open1 closed)

Task #23383: Review 23382-optimize-container-reuseResolvedTom Clegg01/05/2026Actions
Actions #1

Updated by Brett Smith 3 months ago

  • Description updated (diff)
  • Subject changed from Prevent unnecessary row locking on container reuse to Prevent unnecessary row locking on final containers
Actions #2

Updated by Brett Smith 3 months ago

  • Description updated (diff)
Actions #3

Updated by Brett Smith 3 months ago

  • Subtask #23383 added
Actions #4

Updated by Tom Clegg 3 months ago

Actions #5

Updated by Brett Smith 3 months ago

Tom Clegg wrote in #note-4:

23382-optimize-container-reuse @ 367af1bd07bcc89973bfd4e6a3a3d99a1c17e418 -- developer-run-tests: #4983

Assuming tests pass, LGTM. My one nit-suggestion is, y'know, I always like it when regexps have anchors at the boundary (at least \b) to avoid matching superstrings. But I concede that's an awfully remote concern for these particular regexps.

Thank you especially for the comments in the test, I think that's helpful to clarify what's actually being tested vs. what's sort of checking test integrity.

Actions #6

Updated by Tom Clegg 3 months ago

23382-optimize-container-reuse @ 06a7fda1028f9d3379c602b1b45f7aa2900a59e3 -- developer-run-tests: #4984

Fixes the "update container request with dangling container UUID" case.

Actions #7

Updated by Tom Clegg 3 months ago

There are also two places where we explicitly row-lock a container when selecting it for reuse, to avoid leaving the container un-finalized after a race between container finalization and container reuse. These row locks are unnecessary when the container is already finalized before being selected for reuse.

The row lock statements in the Rails logs look identical to the statements we've seen taking multiple minutes in production when a single container is reused by lots of concurrent workflows.

SELECT "containers".* FROM "containers" WHERE "containers"."id" = $1 LIMIT $2 FOR UPDATE  [["id", 1048559796], ["LIMIT", 1]]

23382-optimize-container-reuse @ d7a6e71bd1cc8a3754abcadfc93d5f1901fcedb7 -- developer-run-tests: #4990

Actions #8

Updated by Lucas Di Pentima 3 months ago

Tom Clegg wrote in #note-7:

23382-optimize-container-reuse @ d7a6e71bd1cc8a3754abcadfc93d5f1901fcedb7 -- developer-run-tests: #4990

This LGTM, thank you!

Actions #9

Updated by Tom Clegg 3 months ago

cherry-picked to 3.2-staging, building 3.2.1~dev20251224022200-1 packages at build-and-publish-rc-packages: #292

Actions #10

Updated by Brett Smith 3 months ago

  • Status changed from In Progress to Resolved
Actions #11

Updated by Brett Smith about 2 months ago

  • Release set to 84
Actions

Also available in: Atom PDF