Bug #13500
crunch-dispatch-slurm PG::TRDeadlockDetected: ERROR: deadlock detected
Status: closed
% Done: 0%
Story points: -
Release relationship: Auto
Description
We have crunch-dispatch-slurm 1.1.4.20180511205416-1:
root@arvados-master-eglyx:~# dpkg -l |grep crunch-dis
ii crunch-dispatch-slurm 1.1.4.20180511205416-1 amd64 Dispatch Crunch containers to a SLURM cluster
The logs are getting messages like this about 10 times per minute:
May 17 15:43:14 arvados-master-eglyx crunch-dispatch-slurm[32079]: DETAIL: Process 41291 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 46646.
May 17 15:43:14 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 46646 waits for ExclusiveLock on relation 16453 of database 16385; blocked by process 41291.
May 17 15:43:14 arvados-master-eglyx crunch-dispatch-slurm[32079]: HINT: See server log for query details.
May 17 15:43:14 arvados-master-eglyx crunch-dispatch-slurm[32079]: > (422: 422 Unprocessable Entity) returned by arvados-api-eglyx.hgi.sanger.ac.uk
May 17 15:43:14 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:14 Done monitoring container eglyx-dz642-e6zsrf34kxvm0un
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:19 Error updating container eglyx-dz642-ed9d98nc35cy676 to state "Cancelled": arvados API server error: #<PG::TRDeadlockDetected: ERROR: deadlock detected
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: DETAIL: Process 41982 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 46644.
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 46644 waits for ExclusiveLock on relation 16453 of database 16385; blocked by process 43414.
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 43414 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 41982.
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: HINT: See server log for query details.
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: > (422: 422 Unprocessable Entity) returned by arvados-api-eglyx.hgi.sanger.ac.uk
May 17 15:43:19 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:19 Done monitoring container eglyx-dz642-ed9d98nc35cy676
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:23 Error updating container eglyx-dz642-su55175dq46ce5n to state "Cancelled": arvados API server error: #<PG::TRDeadlockDetected: ERROR: deadlock detected
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: DETAIL: Process 46743 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 42203.
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 42203 waits for ExclusiveLock on relation 16453 of database 16385; blocked by process 42197.
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 42197 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 46743.
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: HINT: See server log for query details.
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: > (422: 422 Unprocessable Entity) returned by arvados-api-eglyx.hgi.sanger.ac.uk
May 17 15:43:23 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:23 Done monitoring container eglyx-dz642-su55175dq46ce5n
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:34 Error updating container eglyx-dz642-h8w2hpksftit3fx to state "Cancelled": arvados API server error: #<PG::TRDeadlockDetected: ERROR: deadlock detected
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: DETAIL: Process 43623 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 41291.
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 41291 waits for ExclusiveLock on relation 16453 of database 16385; blocked by process 29904.
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: Process 29904 waits for ExclusiveLock on relation 16440 of database 16385; blocked by process 43623.
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: HINT: See server log for query details.
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: > (422: 422 Unprocessable Entity) returned by arvados-api-eglyx.hgi.sanger.ac.uk
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:34 Done monitoring container eglyx-dz642-h8w2hpksftit3fx
May 17 15:43:34 arvados-master-eglyx crunch-dispatch-slurm[32079]: 2018/05/17 15:43:34 Error updating container eglyx-dz642-mm5lolu4as26bog to state "Cancelled": arvados API server error: #<PG::TRDeadlockDetected: ERROR: deadlock detected
This does not seem to be affecting performance, but I wanted to report it because it appears to be new as of the latest version of crunch-dispatch-slurm.
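For anyone triaging similar output, here is a rough sketch of how the rate and the relations involved could be tallied from the syslog lines. It assumes the line format shown in the excerpt above; the function name `summarize_deadlocks` and both regular expressions are illustrative and not part of Arvados.

```python
import re
from collections import Counter

# Assumed syslog shape, taken from the excerpt in this report:
# "May 17 15:43:19 <host> crunch-dispatch-slurm[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}):\d{2} \S+ crunch-dispatch-slurm\[\d+\]: (.*)$"
)
# Matches the PostgreSQL DETAIL lines that name the contended relation OID.
RELATION_RE = re.compile(r"waits for \w+ on relation (\d+) of database (\d+)")


def summarize_deadlocks(lines):
    """Return (deadlock errors per minute, lock waits per relation OID)."""
    per_minute = Counter()
    per_relation = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not from crunch-dispatch-slurm
        minute, message = m.groups()
        if "PG::TRDeadlockDetected" in message:
            per_minute[minute] += 1
        rel = RELATION_RE.search(message)
        if rel:
            per_relation[rel.group(1)] += 1
    return per_minute, per_relation
```

Feeding the saved log lines to `summarize_deadlocks` gives one counter keyed by minute and one keyed by relation OID. The OIDs (16440 and 16453 in the excerpt) can be mapped to table names on the PostgreSQL side with a query such as `SELECT 16440::regclass;` run in the affected database.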