Feature #18071
closed
Use postgresql advisory locks to prevent concurrent dispatcher / keep-balance processes
100%
Updated by Tom Clegg over 2 years ago
- Assigned To set to Tom Clegg
- Target version set to 2022-09-28 sprint
Use source:lib/controller/dblock module, see #18339.
Updated by Peter Amstutz over 2 years ago
This should be the kind of lock that allows a new process to elbow out the old process -- I'm thinking of the situation where we start a new "something" and want it to replace the old "something".
So we want to communicate:
- To the new process that it is now allowed to take over
- To the old process that it should release the lock and shut down
Updated by Peter Amstutz over 2 years ago
On second thought, that might be a bad idea, because it could lead to two processes fighting over the lock instead of one getting it and the other failing.
Perhaps the process that fails to get the lock could stay up, with its health check reporting a "failed to get lock" state? Also, can the lock record some information about the node that does have the lock?
Updated by Peter Amstutz over 2 years ago
When lock is acquired, record hostname + process id
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-09-28 sprint to 2022-10-12 sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-10-12 sprint to 2022-10-26 sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-10-26 sprint to 2022-11-09 sprint
Updated by Tom Clegg about 2 years ago
18071-dblock-keep-balance-and-dispatch @ 15043a6825ecd62ccb2272025384474a235b30cc -- developer-run-tests: #3346
- only one keep-balance service (sweep, sleep, repeat) runs at a time
- only one keep-balance sweep runs at a time (if you run "keep-balance -once" in a terminal while a keep-balance server process is already running, they will take turns nicely)
- only one dispatcher (crunch-dispatch-slurm, arvados-dispatch-lsf, arvados-dispatch-cloud) runs at a time
- when waiting for a lock, logs indicate the host/port of the database connection of the client that currently has the lock
Updated by Lucas Di Pentima about 2 years ago
Just one small suggestion:
- File services/keep-balance/balance.go, lines 73-75: I think the Run() usage comment is superfluous and also outdated, so we can simply remove it?
The rest LGTM
Updated by Tom Clegg about 2 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|4529d84afb3549ccb4ae9005a8f64f558c2bbe5c.