Bug #4334

closed

[Crunch] crunch-dispatch should not allocate Jobs to nodes in the idle* SLURM state

Added by Brett Smith about 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Start date: 10/28/2014
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 1.0

Description

In SLURM, a trailing asterisk on a node state ("state*") means "the node was last known to be in state, but I haven't heard from it in a while." Currently, crunch-dispatch ignores the asterisk. However, a node in the "idle*" state has usually crashed recently and is probably not usable. crunch-dispatch should not schedule work on nodes in this state.

The quick and easy implementation for this story is probably to record 'idle*' as 'down' in our database, instead of lopping off the * and treating it the same as 'idle'. There is no need to worry about other * states: we only schedule onto idle nodes, so it's OK if 'down*' gets translated to 'down'. That's the safest path.
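For illustration only, here is a minimal sketch of the mapping described above. It is written in Python rather than taken from crunch-dispatch, and the function names (`normalize_slurm_state`, `is_schedulable`) are hypothetical. It assumes the raw state arrives as a plain string such as "idle*".

```python
def normalize_slurm_state(raw_state: str) -> str:
    """Map a raw SLURM node state to the value stored in the database.

    Hypothetical helper, not the actual crunch-dispatch code.
    """
    state = raw_state.strip().lower()
    if state == "idle*":
        # The node was last seen idle but is not responding: treat it as down.
        return "down"
    # For every other state, dropping the asterisk is harmless,
    # because only nodes recorded as 'idle' are ever scheduled onto.
    return state.rstrip("*")


def is_schedulable(raw_state: str) -> bool:
    """Return True only for nodes that are idle and responding."""
    return normalize_slurm_state(raw_state) == "idle"


if __name__ == "__main__":
    for raw in ("idle", "idle*", "down*", "alloc*"):
        print(raw, "->", normalize_slurm_state(raw))
```

Special-casing only 'idle*' keeps the change small: every other starred state already maps to something that is never scheduled onto.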


Subtasks 2 (0 open, 2 closed)

Task #4450: Review 4334-idle-star-is-down (Resolved, 10/28/2014)
Task #4376: Diagnose and fix (Resolved, Peter Amstutz, 10/28/2014)

Related issues 2 (0 open, 2 closed)

Related to Arvados - Bug #4314: [Crunch] Figure out why this job was marked Failed unexpectedly (Resolved, Peter Amstutz, 10/24/2014)
Related to Arvados - Bug #4368: [Crunch] Improve node failure detection and job retry logic (Closed, 10/31/2014)
