Feature #18071

closed

Use postgresql advisory locks to prevent concurrent dispatcher / keep-balance processes

Added by Tom Clegg over 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Start date: 10/31/2022
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto


Subtasks: 1 (0 open, 1 closed)

Task #19507: Review 18071-dblock-keep-balance-and-dispatch (Resolved, Tom Clegg, 10/31/2022)

#1

Updated by Tom Clegg over 3 years ago

  • Release deleted (20)
#2

Updated by Tom Clegg over 3 years ago

  • Description updated (diff)
#3

Updated by Tom Clegg over 2 years ago

  • Assigned To set to Tom Clegg
  • Target version set to 2022-09-28 sprint
#4

Updated by Peter Amstutz over 2 years ago

This should be the kind of lock that allows the new process to elbow out the old one. I'm thinking of the situation where we start a new "something" and want it to replace the old "something".

So we want to communicate:

  • To the new process that it is now allowed to take over
  • To the old process that it should release the lock and shut down
#5

Updated by Peter Amstutz over 2 years ago

On second thought, that might be a bad idea, because it could lead to two processes fighting over the lock instead of one getting it and the other failing.

Perhaps the process that fails to get the lock could stay up, with its health check reporting a "failed to get lock" state? Also, could the lock record some information about the node that does have the lock?
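A minimal Go sketch of the standby idea above, assuming the process keeps a small in-memory lock-status record that its health endpoint consults. The `lockStatus` type, its fields, and the JSON keys are hypothetical and are not Arvados' actual health-check API. A process that has not obtained the lock stays up and answers 503 with a "failed to get lock" error plus, when known, a description of the current holder.

```go
package main

import (
	"encoding/json"
	"net/http"
	"sync"
)

// lockStatus is a hypothetical record of whether this process holds the
// advisory lock and, if not, who does.
type lockStatus struct {
	mu     sync.Mutex
	held   bool
	holder string // e.g. "10.0.0.5:41234"; empty if unknown
}

// set updates the status; call it after each lock attempt.
func (s *lockStatus) set(held bool, holder string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.held, s.holder = held, holder
}

// report returns the HTTP status code and JSON body for a health check.
// A standby process reports an error until it obtains the lock.
func (s *lockStatus) report() (int, map[string]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.held {
		return http.StatusOK, map[string]string{"health": "OK"}
	}
	body := map[string]string{"health": "ERROR", "error": "failed to get lock"}
	if s.holder != "" {
		body["holder"] = s.holder
	}
	return http.StatusServiceUnavailable, body
}

// ServeHTTP lets a lockStatus be mounted directly as a health endpoint.
func (s *lockStatus) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	code, body := s.report()
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	json.NewEncoder(w).Encode(body)
}
```

Returning 503 means monitoring will flag the standby; if that is too noisy, the same `report` data could be served with a 200 status and the lock state carried only in the body.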

#6

Updated by Peter Amstutz over 2 years ago

When the lock is acquired, record the hostname and process ID.
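One way to record the hostname and process ID, sketched in Go under the assumption that the lock is taken on a dedicated `*sql.Conn`: label that session's `application_name`, which PostgreSQL exposes in `pg_stat_activity`, so anyone inspecting the lock can see which host and process owns it. `lockOwnerTag` and `tagSession` are hypothetical helpers. The ticket does not say Arvados records the owner this way; the implementation notes later in this ticket mention logging the host/port of the holder's database connection instead.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"os"
)

// maxAppName is NAMEDATALEN-1, the longest application_name PostgreSQL
// keeps with default build settings; longer values are truncated.
const maxAppName = 63

// lockOwnerTag builds a "hostname:pid" label, shortening the hostname if
// needed so the process ID always survives truncation.
func lockOwnerTag(hostname string, pid int) string {
	suffix := fmt.Sprintf(":%d", pid)
	if len(hostname)+len(suffix) > maxAppName {
		hostname = hostname[:maxAppName-len(suffix)]
	}
	return hostname + suffix
}

// tagSession sets application_name on the session that will take the
// advisory lock. Advisory locks are session-scoped, so conn must be the
// same dedicated connection later used for pg_try_advisory_lock.
func tagSession(ctx context.Context, conn *sql.Conn) error {
	host, err := os.Hostname()
	if err != nil {
		return err
	}
	_, err = conn.ExecContext(ctx,
		"SELECT set_config('application_name', $1, false)",
		lockOwnerTag(host, os.Getpid()))
	return err
}
```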

#7

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2022-09-28 sprint to 2022-10-12 sprint
#8

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2022-10-12 sprint to 2022-10-26 sprint
#9

Updated by Peter Amstutz over 2 years ago

  • Category set to API
#10

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2022-10-26 sprint to 2022-11-09 sprint
#11

Updated by Tom Clegg about 2 years ago

  • Status changed from New to In Progress
#12

Updated by Tom Clegg about 2 years ago

18071-dblock-keep-balance-and-dispatch @ 15043a6825ecd62ccb2272025384474a235b30cc -- developer-run-tests: #3346

  • only one keep-balance service (sweep, sleep, repeat) runs at a time
  • only one keep-balance sweep runs at a time (if you run "keep-balance -once" in a terminal while a keep-balance server process is already running, they will take turns nicely)
  • only one dispatcher (crunch-dispatch-slurm, arvados-dispatch-lsf, arvados-dispatch-cloud) runs at a time
  • when waiting for a lock, logs indicate the host/port of the database connection of the client that currently has the lock
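A minimal Go sketch of the behaviour summarized above, assuming a PostgreSQL driver for `database/sql` and one dedicated connection per lock. The lock IDs, function names, and retry interval are hypothetical, not the names used in the 18071 branch. `pg_try_advisory_lock`, `pg_advisory_unlock`, `pg_locks`, and `pg_stat_activity` are standard PostgreSQL.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"log"
	"time"
)

// Hypothetical lock IDs. Every process that must be mutually exclusive
// has to use the same bigint key.
const (
	lockKeepBalanceService int64 = 10001 // held for the life of a server process
	lockKeepBalanceSweep   int64 = 10002 // held only while a sweep runs
	lockDispatch           int64 = 10003 // held by whichever dispatcher is active
)

// splitKey returns the (classid, objid) pair under which pg_locks reports
// a single-bigint advisory lock key (objsubid is 1 in that case).
func splitKey(key int64) (classid, objid uint32) {
	return uint32(uint64(key) >> 32), uint32(uint64(key))
}

// acquire blocks until this session holds the advisory lock for key,
// logging the current holder between attempts. conn must be a dedicated
// *sql.Conn: advisory locks belong to a database session, and a pooled
// *sql.DB could run the unlock on a different session.
func acquire(ctx context.Context, conn *sql.Conn, key int64, retry time.Duration) error {
	for {
		var ok bool
		if err := conn.QueryRowContext(ctx, "SELECT pg_try_advisory_lock($1)", key).Scan(&ok); err != nil {
			return err
		}
		if ok {
			return nil
		}
		logHolder(ctx, conn, key)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(retry):
		}
	}
}

// release drops the lock; closing the session would also release it.
func release(ctx context.Context, conn *sql.Conn, key int64) error {
	_, err := conn.ExecContext(ctx, "SELECT pg_advisory_unlock($1)", key)
	return err
}

// logHolder logs the client address, port, and application_name of the
// session holding the lock. Some of these columns may read as NULL unless
// the querying role is allowed to see other sessions' details.
func logHolder(ctx context.Context, conn *sql.Conn, key int64) {
	classid, objid := splitKey(key)
	var addr, app sql.NullString
	var port sql.NullInt64
	err := conn.QueryRowContext(ctx, `
		SELECT host(a.client_addr), a.client_port, a.application_name
		FROM pg_locks l JOIN pg_stat_activity a ON a.pid = l.pid
		WHERE l.locktype = 'advisory' AND l.granted
		  AND l.classid = $1 AND l.objid = $2 AND l.objsubid = 1`,
		int64(classid), int64(objid)).Scan(&addr, &port, &app)
	switch {
	case errors.Is(err, sql.ErrNoRows):
		log.Printf("lock %d: holder released it before it could be identified", key)
	case err != nil:
		log.Printf("lock %d: could not look up holder: %v", key, err)
	default:
		log.Printf("lock %d is held by %s:%d (%s); retrying", key, addr.String, port.Int64, app.String)
	}
}
```

To get the "take turns" behaviour, a long-running keep-balance server would hold the service lock for its whole lifetime and take the sweep lock only around each sweep, while a `-once` run would take just the sweep lock; a dispatcher would hold its lock for as long as it runs. Because session-level advisory locks are released when the session ends, a crashed holder does not leave a stale lock behind.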
#13

Updated by Lucas Di Pentima about 2 years ago

Just one small suggestion:

  • File services/keep-balance/balance.go, lines 73-75: I think the Run() usage comment is superfluous and also outdated, so we can simply remove it?

The rest LGTM

#14

Updated by Tom Clegg about 2 years ago

  • Status changed from In Progress to Resolved
#15

Updated by Peter Amstutz about 2 years ago

  • Release set to 47