Bug #19917
Status: open
Issues rerunning workflows with UsePreemptible changes from true to false
Description
If I run a workflow (with all steps using preemptible instances) and it gets stuck on a step after a few steps have run, then kill the job, set preemptible to false, and rerun it, it does not rerun the steps that already ran, but it still tries to use preemptible nodes.
The example was on cluster 2xpu4, so I'm not sure I can share the workflow, but this is what Tom wrote:
"I think I see the bug... the retry-after-cancel logic uses the same scheduling_parameters as the cancelled container, even if the still-active requests that are motivating the retry all say preemptible:false."
Tom: this is not a "container reuse" problem; it is a "container retry" problem. The retry should take its scheduling parameters from the actual outstanding container requests, not from the cancelled container.
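To illustrate Tom's point, here is a minimal Python sketch. It is not the actual Arvados implementation. It contrasts the buggy behaviour (reusing the cancelled container's scheduling_parameters) with deriving them from the still-outstanding requests. All names here (ContainerRequest, is_outstanding, retry_scheduling_parameters) are hypothetical. Two further assumptions: a request counts as "outstanding" when it is Committed with priority > 0, and the merge rule is that the retried container is preemptible only if every outstanding request allows it, so a single preemptible:false wins.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ContainerRequest:
    # Hypothetical, simplified stand-in for a container request.
    uuid: str
    state: str  # e.g. "Committed" or "Final"
    priority: int
    scheduling_parameters: Dict[str, Any] = field(default_factory=dict)


def is_outstanding(req: ContainerRequest) -> bool:
    # Assumption: a request still wants the container if it is
    # committed and has nonzero priority.
    return req.state == "Committed" and req.priority > 0


def retry_scheduling_parameters(requests: List[ContainerRequest]) -> Dict[str, Any]:
    """Scheduling parameters for a container retried after cancellation.

    The buggy behaviour described in this issue copies the cancelled
    container's parameters. This sketch ignores them and looks only at
    the outstanding requests that are motivating the retry.
    """
    active = [r for r in requests if is_outstanding(r)]
    if not active:
        # Assumption: a retry only happens when some request still wants it.
        raise ValueError("no outstanding container requests to retry for")
    # Assumed merge rule: use preemptible nodes only if every outstanding
    # request allows it; a missing key is treated as preemptible: false.
    return {
        "preemptible": all(
            r.scheduling_parameters.get("preemptible", False) for r in active
        )
    }


if __name__ == "__main__":
    reqs = [
        # Original run, cancelled and finalized, asked for preemptible nodes.
        ContainerRequest("req-original", "Final", 0, {"preemptible": True}),
        # Rerun after the user set preemptible to false.
        ContainerRequest("req-rerun", "Committed", 1, {"preemptible": False}),
    ]
    print(retry_scheduling_parameters(reqs))  # {'preemptible': False}
```

With this rule, the rerun in the scenario above would get preemptible: false even though the cancelled container was scheduled on preemptible nodes. Other scheduling parameters are left out of the sketch, since how to merge them across several requests is a separate design decision.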
Updated by Sarah Zaranek about 2 years ago
If you have access, the workflow is here: https://workbench2.2xpu4.arvadosapi.com/processes/2xpu4-xvhdp-ilx0iftqfyzsyuu
Updated by Peter Amstutz about 2 years ago
- Subject changed from Issues rerunning workflows with UsePreemptible changes from true to false to Issues rerunning workflows with UsePreemptible changes from true to false
- Assigned To set to Tom Clegg
Updated by Peter Amstutz about 2 years ago
- Target version changed from To be groomed to To be scheduled
Updated by Peter Amstutz about 2 years ago
- Target version changed from To be scheduled to 2023-02-01 sprint
Updated by Peter Amstutz about 2 years ago
- Description updated (diff)
- Assigned To set to Brett Smith