Story #4127
[API] Nodes have a method to request and record shutdowns
Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date: 07/17/2014
Due date: 07/17/2014
% Done: 0%
Estimated time:
Story points: 3.0
Description
The current Node Manager decides to shut down cloud nodes based on a node record's SLURM state. Because checking that state and shutting the node down are not one atomic step, a Node can be shut down shortly after it is allocated work. This isn't a huge loss of compute time, but it does cause a Job failure that can look mysterious at first.
It would be better if the API server provided an atomic way to request and record Node shutdowns. This has a few components:
- Add a method to NodesController that marks a node as "being shut down" if and only if it is not currently running a Job.
- Modify the Node model so that any attempt to assign a job to it (by setting job_uuid) fails if the node is marked as "being shut down."
- Modify crunch-dispatch so that it updates node assignments on the API server, and checks for OK responses, before it begins dispatching work.
- Modify the Node Manager to request shutdowns with the API server, and only proceed after an OK response.
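The proposed protocol can be sketched in-process as below. This is a minimal illustration, not the API server's implementation: it assumes a per-node lock stands in for whatever transaction or row lock the API server would use, models "OK responses" as return values and exceptions rather than HTTP calls, and every name in it (`Node`, `request_shutdown`, `assign_job`, `release_job`, `dispatch`, `shutdown_if_allowed`, `NodeStateError`) is hypothetical. Only `job_uuid` comes from the description above. The rollback of partial claims in `dispatch` is also an assumption, not part of the story.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class NodeStateError(Exception):
    """Hypothetical error: the API server refused a node state change."""


@dataclass
class Node:
    uuid: str
    job_uuid: Optional[str] = None
    shutting_down: bool = False
    # Assumption: a per-record lock plays the role of a database transaction.
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def request_shutdown(self) -> bool:
        """Mark the node "being shut down" if and only if it has no job."""
        with self._lock:
            if self.job_uuid is not None:
                return False  # busy: refuse, leave the node untouched
            self.shutting_down = True
            return True

    def assign_job(self, job_uuid: str) -> None:
        """Set job_uuid, failing if the node is marked as being shut down."""
        with self._lock:
            if self.shutting_down:
                raise NodeStateError(f"{self.uuid} is being shut down")
            self.job_uuid = job_uuid

    def release_job(self) -> None:
        """Clear the job assignment (hypothetical helper used for rollback)."""
        with self._lock:
            self.job_uuid = None


def dispatch(nodes: List[Node], job_uuid: str) -> bool:
    """crunch-dispatch side: record every assignment before starting work."""
    claimed: List[Node] = []
    try:
        for node in nodes:
            node.assign_job(job_uuid)
            claimed.append(node)
    except NodeStateError:
        # Assumption: undo partial claims so the job can be retried later.
        for node in claimed:
            node.release_job()
        return False
    return True  # only now would the dispatcher actually start the job


def shutdown_if_allowed(node: Node, shutdown_cloud_node: Callable[[str], None]) -> bool:
    """Node Manager side: destroy the cloud node only after the API agrees."""
    if node.request_shutdown():
        shutdown_cloud_node(node.uuid)
        return True
    return False
```

Because the check and the state change happen under the same lock, a node can never be both "being shut down" and assigned a job, whichever side acts first.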