Feature #18071
closed
Use postgresql advisory locks to prevent concurrent dispatcher / keep-balance processes
Added by Tom Clegg over 3 years ago.
Updated about 2 years ago.
Estimated time:
(Total: 0.00 h)
Release relationship:
Auto
- Description updated
- Assigned To set to Tom Clegg
- Target version set to 2022-09-28 sprint
This should be the kind of lock that allows the new process to elbow out the old process -- I'm thinking of the situation where we start a new "something" and want it to replace the old "something".
So we want to communicate:
- To the new process that it is now allowed to take over
- To the old process that it should release the lock and shut down
On second thought, that might be a bad idea, because it could lead to two processes fighting over the lock instead of one getting it and the other failing.
The one that fails to get the lock could perhaps stay up, with its health check reporting a "failed to get lock" state? Also, can the lock record some information about the node that does have the lock?
When the lock is acquired, record hostname + process ID.
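For illustration only, here is a minimal Go sketch of the "one gets it, the other fails" behavior discussed above, using PostgreSQL's non-blocking pg_try_advisory_lock(). It is not the code from the branch. The lock key (lockID), the helper names, and the choice to publish hostname + process ID through application_name are all assumptions made for the example.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"os"
)

// lockID is a hypothetical key chosen for this sketch. All processes
// competing for the same role must use the same value.
const lockID int64 = 18071

// holderInfo formats the "hostname + process ID" label.
func holderInfo(hostname string, pid int) string {
	return fmt.Sprintf("%s:%d", hostname, pid)
}

// selfInfo returns the label for the current process.
func selfInfo() (string, error) {
	hostname, err := os.Hostname()
	if err != nil {
		return "", err
	}
	return holderInfo(hostname, os.Getpid()), nil
}

// tryLock makes a single non-blocking attempt to take a session-level
// advisory lock. The lock belongs to the database session, so conn
// must be a dedicated connection (e.g. obtained with db.Conn(ctx))
// that stays open for as long as the lock should be held.
func tryLock(ctx context.Context, conn *sql.Conn) (bool, error) {
	var locked bool
	err := conn.QueryRowContext(ctx,
		`SELECT pg_try_advisory_lock($1)`, lockID).Scan(&locked)
	if err != nil || !locked {
		// The loser simply reports failure; it never tries to
		// push the current holder out.
		return false, err
	}
	info, err := selfInfo()
	if err != nil {
		return locked, err
	}
	// Assumption for this sketch: record the holder's identity in
	// application_name, where it is visible in pg_stat_activity.
	_, err = conn.ExecContext(ctx,
		`SELECT set_config('application_name', $1, false)`, info)
	return locked, err
}
```

A process that gets false back could stay up and report a "failed to get lock" state from its health check instead of exiting. Closing the connection releases the lock, so a crashed holder does not leave a stale lock behind.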
- Target version changed from 2022-09-28 sprint to 2022-10-12 sprint
- Target version changed from 2022-10-12 sprint to 2022-10-26 sprint
- Target version changed from 2022-10-26 sprint to 2022-11-09 sprint
- Status changed from New to In Progress
18071-dblock-keep-balance-and-dispatch @ 15043a6825ecd62ccb2272025384474a235b30cc -- developer-run-tests: #3346
- only one keep-balance service (sweep, sleep, repeat) runs at a time
- only one keep-balance sweep runs at a time (if you run "keep-balance -once" in a terminal while a keep-balance server process is already running, they will take turns nicely)
- only one dispatcher (crunch-dispatch-slurm, arvados-dispatch-lsf, arvados-dispatch-cloud) runs at a time
- when waiting for a lock, logs indicate the host/port of the database connection of the client that currently has the lock
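As a rough sketch of how the waiting behavior in the list above could work (not necessarily how the branch does it), a waiting process can poll pg_try_advisory_lock() and, between attempts, look up the current holder's client address and port by joining pg_locks with pg_stat_activity. The function names, the polling interval parameter, and the log wording are assumptions made for the example.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"
	"time"
)

// splitKey returns the two 32-bit halves under which PostgreSQL lists
// a bigint advisory lock key in pg_locks: classid holds the high half
// and objid the low half (objsubid is 1 for bigint keys).
func splitKey(id int64) (classid, objid uint32) {
	return uint32(uint64(id) >> 32), uint32(uint64(id))
}

// holderAddr reports "host:port" for the database client currently
// holding the advisory lock, or "" if no holder is found.
func holderAddr(ctx context.Context, conn *sql.Conn, id int64) (string, error) {
	classid, objid := splitKey(id)
	var host string
	var port int
	err := conn.QueryRowContext(ctx, `
		SELECT COALESCE(host(a.client_addr), 'local socket'),
		       COALESCE(a.client_port, 0)
		  FROM pg_locks l
		  JOIN pg_stat_activity a ON a.pid = l.pid
		 WHERE l.locktype = 'advisory' AND l.granted
		   AND a.datname = current_database()
		   AND l.classid::bigint = $1
		   AND l.objid::bigint = $2
		   AND l.objsubid = 1`,
		int64(classid), int64(objid)).Scan(&host, &port)
	if err == sql.ErrNoRows {
		return "", nil
	}
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s:%d", host, port), nil
}

// waitForLock polls until the lock is acquired or ctx is cancelled,
// logging who holds the lock after each failed attempt.
func waitForLock(ctx context.Context, conn *sql.Conn, id int64, interval time.Duration) error {
	for {
		var locked bool
		err := conn.QueryRowContext(ctx,
			`SELECT pg_try_advisory_lock($1)`, id).Scan(&locked)
		if err != nil {
			return err
		}
		if locked {
			return nil
		}
		holder, err := holderAddr(ctx, conn, id)
		if err != nil {
			return err
		}
		log.Printf("waiting for lock %d, currently held by client %q", id, holder)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(interval):
		}
	}
}
```

With this pattern, the "take turns" case would amount to each sweep calling something like waitForLock() before it starts and pg_advisory_unlock() on the same connection when it finishes.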
Just one small suggestion:
- File services/keep-balance/balance.go lines 73-75: I think the Run() usage comment is superfluous and also outdated, so we can simply remove it?
The rest LGTM
- Status changed from In Progress to Resolved