Bug #19917

open

Issues rerunning workflows when UsePreemptible changes from true to false

Added by Sarah Zaranek almost 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assigned To:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
Story points: 1.0

Description

If I run a workflow and it gets stuck on a step after a few steps have run (all using preemptible instances), and I then kill the job, set preemptible to false, and rerun, it doesn't rerun the steps that already ran, but it still tries to use preemptible nodes.

The example was in 2xpu4, so I'm not sure I can share the workflow, but this is what Tom wrote:
"I think I see the bug... the retry-after-cancel logic uses the same scheduling_parameters as the cancelled container, even if the still-active requests that are motivating the retry all say preemptible:false."

Tom: this is not a "container reuse" problem, it is a "container retry" problem. The retry should take its scheduling parameters from the actual outstanding container requests, not from the cancelled container.
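
To make the proposed behaviour concrete, here is a minimal Python sketch of choosing scheduling parameters for a retried container. It is not the actual Arvados code: the `ContainerRequest` class, the `retry_scheduling_parameters` function, the "Committed with priority > 0" test for an outstanding request, and the merge rule (use a preemptible node only if every outstanding request allows it) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerRequest:
    # Hypothetical stand-in for a container request record.
    state: str       # e.g. "Committed" or "Final"
    priority: int    # assumed: 0 means the request was cancelled
    scheduling_parameters: dict = field(default_factory=dict)


def retry_scheduling_parameters(cancelled_params, requests):
    """Choose scheduling parameters for a container retried after cancel.

    The behaviour reported in this issue is equivalent to always returning
    cancelled_params. This sketch instead derives the parameters from the
    requests that are still outstanding.
    """
    # Assumed definition of "outstanding": committed and not cancelled.
    active = [r for r in requests if r.state == "Committed" and r.priority > 0]
    if not active:
        # No request is motivating a retry; keep the old parameters.
        return dict(cancelled_params)

    # Assumed simplification: take the other keys from the first
    # outstanding request, then decide "preemptible" across all of them.
    merged = dict(active[0].scheduling_parameters)
    merged["preemptible"] = all(
        r.scheduling_parameters.get("preemptible", False) for r in active
    )
    return merged
```

With one outstanding request that says `preemptible: false` and a cancelled container that used `preemptible: true`, this sketch returns `preemptible: false`, which is the outcome the description expects from the rerun.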


Subtasks 1 (1 open, 0 closed)

Task #19945: Review (New, Tom Clegg)
