Bug #9236
[Node manager] Race between wishlist and Arvados node status update boots extra nodes
Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -
Description
I think there's a race condition in node manager when a job appears in the wishlist and is then assigned to an idle node, which marks that node as 'busy'.
If the events happen in the order poll wishlist, job is assigned to the node, poll Arvados nodes, node manager sees (wishlist 1, busy 1) instead of (wishlist 0, busy 1). It then boots an extra node because it thinks it still needs to fulfill the wishlist.
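A minimal sketch of how mixing two snapshots taken at different times leads to the extra boot. It assumes a simplified model in which wishlist entries are job UUIDs and the sizing rule is "boot one node per wishlist entry that no idle node can take". `Node`, `nodes_to_boot`, and the `job-1`/`node-1` identifiers are hypothetical and are not node manager's actual classes or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified view of an Arvados node record."""
    uuid: str
    state: str                      # "idle" or "busy" (simplified)
    job_uuid: Optional[str] = None  # job currently allocated to this node


def nodes_to_boot(wishlist: List[str], nodes: List[Node]) -> int:
    """Hypothetical sizing rule: boot one node for every wishlist entry
    that cannot be placed on an idle node."""
    idle = sum(1 for n in nodes if n.state == "idle")
    return max(0, len(wishlist) - idle)


# t0: poll the wishlist while job-1 is still queued.
wishlist_snapshot = ["job-1"]

# Between the two polls, job-1 is assigned to the idle node-1.

# t1: poll Arvados nodes; node-1 now reports busy with job-1.
nodes_snapshot = [Node("node-1", "busy", job_uuid="job-1")]

# Mixing the two snapshots gives (wishlist 1, busy 1): one spurious boot.
print(nodes_to_boot(wishlist_snapshot, nodes_snapshot))  # → 1

# A consistent view after the assignment is (wishlist 0, busy 1): no boot.
print(nodes_to_boot([], nodes_snapshot))  # → 0
```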
When a node is allocated to a job, the job's UUID is recorded in the node's job_uuid field. So node manager can check the job queue for jobs that are already allocated to nodes and leave them out of the wishlist count.
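A minimal sketch of the proposed fix: drop any wishlist entry whose job UUID already appears in some node's job_uuid before sizing. It assumes a simplified model where wishlist entries are job UUIDs; `Node`, `effective_wishlist`, and `nodes_to_boot` are hypothetical names, not node manager's real API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified view of an Arvados node record."""
    uuid: str
    state: str                      # "idle" or "busy" (simplified)
    job_uuid: Optional[str] = None  # job currently allocated to this node


def effective_wishlist(wishlist: List[str], nodes: List[Node]) -> List[str]:
    """Drop wishlist entries whose job is already recorded in a node's job_uuid."""
    allocated = {n.job_uuid for n in nodes if n.job_uuid is not None}
    return [job for job in wishlist if job not in allocated]


def nodes_to_boot(wishlist: List[str], nodes: List[Node]) -> int:
    """Hypothetical sizing rule: boot one node for every wishlist entry
    that cannot be placed on an idle node."""
    idle = sum(1 for n in nodes if n.state == "idle")
    return max(0, len(wishlist) - idle)


# Wishlist polled before job-1 was assigned; node poll taken afterwards.
wishlist_snapshot = ["job-1", "job-2"]
nodes_snapshot = [Node("node-1", "busy", job_uuid="job-1")]

pending = effective_wishlist(wishlist_snapshot, nodes_snapshot)
print(pending)                                 # → ['job-2']
print(nodes_to_boot(pending, nodes_snapshot))  # → 1 (for job-2 only)
```

The filter depends only on the node records, so a wishlist polled before the assignment no longer double-counts the job once the node poll reports it in job_uuid.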