Bug #14574
closed
Added by Peter Amstutz about 6 years ago.
Updated almost 6 years ago.
Estimated time: (Total: 0.00 h)
Release relationship: Auto
Description
When an ExpressionTool executes, it calls the output callback without taking the workflow execution lock. This is a problem when multiple ExpressionTool jobs are executing in separate threads, because their callbacks can then run concurrently.
- Status changed from New to In Progress
- Description updated (diff)
- Assigned To set to Peter Amstutz
The quick fix is to change the default, but the better solution is to fix the underlying problem.
We need to add a note to the 1.3.0 release notes about the bug and its workaround, and try to fix it for 1.3.1.
While I'm pretty sure failure to lock the callback from ExpressionTool is a bug, and it could plausibly cause the behavior being reported, I haven't actually been able to reproduce the reported deadlock, so I can't say definitively that this fixes it.
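For illustration, the locking pattern under discussion can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual runner code: `workflow_eval_lock`, `Collector`, `run_expression_tool`, and `run_all` are hypothetical names, and the expression evaluation is a placeholder. The point is only that each thread holds the shared lock while it invokes the output callback.

```python
import threading

# Hypothetical stand-in for the shared workflow execution lock.
workflow_eval_lock = threading.RLock()


class Collector:
    """Hypothetical receiver of tool outputs; it mutates shared state."""

    def __init__(self):
        self.outputs = {}
        self.completed = 0

    def output_callback(self, name, result):
        # Not thread-safe on its own: updates shared workflow state.
        self.outputs[name] = result
        self.completed += 1


def run_expression_tool(name, collector):
    # Placeholder for evaluating the tool's expression (assumption).
    result = {"value": name.upper()}
    # Hold the lock while calling the callback so that concurrently
    # running tools cannot interleave updates to the shared state.
    with workflow_eval_lock:
        collector.output_callback(name, result)


def run_all(names):
    """Run one thread per tool and wait for all of them to finish."""
    collector = Collector()
    threads = [
        threading.Thread(target=run_expression_tool, args=(n, collector))
        for n in names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return collector
```

A call such as `run_all(["step%d" % i for i in range(50)])` should record one output per step. A reentrant lock is used here so that code already holding the lock can call the callback without deadlocking itself; whether the real lock is reentrant is an assumption.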
The locking LGTM. How can we test this? Maybe with the original workflow?
Lucas Di Pentima wrote:
The locking LGTM. How can we test this? Maybe with the original workflow?
Yeah, I've already tried it with the original workflow. The problem is that I haven't been able to reproduce the bug, so the fix is speculative. There's definitely a race condition that is fixed by this branch, and a race could create the problems we're seeing, but I can't pin it down either way. I can run it a few more times and see what happens.
Peter Amstutz wrote:
Lucas Di Pentima wrote:
The locking LGTM. How can we test this? Maybe with the original workflow?
Yeah, I've already tried it with the original workflow. The problem is that I haven't been able to reproduce the bug, so the fix is speculative. There's definitely a race condition that is fixed by this branch, and a race could create the problems we're seeing, but I can't pin it down either way. I can run it a few more times and see what happens.
I re-ran the job (e51c5-xvhdp-g1kjpf3j7zo6ou1) from the original failure report (e51c5-xvhdp-tlnzytroy9m380j). It finished successfully in 2 minutes (all containers were reused).
Running with job reuse isn't exactly the same as running a normal job, so the only other thing I can think of would be to re-run without job reuse, but that's expensive.
- Status changed from In Progress to Resolved