Bug #11531: [API] clean up stale/conflicting dns data from deleted node records
Status: New
Priority: Normal
Assigned To: -
Category: API
Target version: -
Start date: 04/20/2017
Due date: -
% Done: 0%
Estimated time: -
Story points: -
Description
Problem scenario:
- node compute100 comes up with IP address 10.2.3.4
- the node record for compute100 is deleted
- node compute2 comes up with IP address 10.2.3.4
- now compute100 and compute2 both have DNS records pointing to 10.2.3.4, and SLURM is very confused
- any time a new node comes up with IP address 10.2.3.4, things will break, until eventually 100 nodes come up and the compute100 conf file finally gets updated
- in source:services/api/app/models/node.rb, run dns_server_update in the after-delete hook, too (see the first sketch after these lists)
- in the "at startup, make sure all DNS entries exist" block, check and fix other out-of-sync conditions too:
  - read the content of each existing conf file, and run dns_server_update() if it doesn't match the current IP address in the database. We only have a template for writing, not for parsing, so this can be implemented as a "skip update if existing content is identical" flag passed to dns_server_update() (see the second sketch below).
  - check for extra "#{hostname}.conf" files left over from a previous config where max_compute_nodes was larger than it is now. Start at N = max_compute_nodes; increase N until "#{hostname_for_slot(N)}.conf" does not exist; then count back down, deleting the files, until N < max_compute_nodes or a node actually exists with slot number N. (Deleting in this order avoids situations like "1..128 and 196..256 exist, but 129..195 do not exist", and thereby ensures it's possible to detect excess conf files in a finite number of steps regardless of how unusual the assign_node_hostname config is.) See the third sketch below.
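
A minimal sketch of the after-delete fix, assuming Node inherits from ArvadosModel and already wires dns_server_update into its update callbacks (Rails' delete-time callback is named after_destroy):

  class Node < ArvadosModel
    # Existing behavior: keep DNS in sync when a node's IP changes.
    after_update :dns_server_update

    # Proposed fix: also rewrite this hostname's conf file when the node
    # record is deleted, so the old IP doesn't linger in DNS.
    after_destroy :dns_server_update
  end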
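A minimal sketch of the "skip update if existing content is identical" flag; the dns_server_conf_dir config key and the render_dns_conf/reload_dns_server helpers are hypothetical stand-ins for however the real dns_server_update renders and reloads:

  # Hypothetical sketch: skip the write and the DNS reload when the
  # rendered conf content already matches what is on disk.
  def self.dns_server_update(hostname, ip_address, unless_identical: false)
    conf_dir = Rails.configuration.dns_server_conf_dir   # assumed config key
    conf_file = File.join(conf_dir, "#{hostname}.conf")
    content = render_dns_conf(hostname, ip_address)      # hypothetical helper

    # Startup sync sets unless_identical so in-sync nodes cost one file
    # read instead of a write plus a DNS server reload.
    return true if unless_identical &&
                   File.exist?(conf_file) &&
                   File.read(conf_file) == content

    File.write(conf_file, content)
    reload_dns_server                                    # hypothetical helper
  end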
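And a sketch of the count-up/count-down cleanup, assuming 0-based slots (so max_compute_nodes is the first out-of-range slot), a slot_number column on node records, and the same hypothetical config keys as above:

  # Hypothetical sketch of the startup check for leftover conf files.
  def self.cleanup_excess_dns_conf
    conf_dir = Rails.configuration.dns_server_conf_dir   # assumed config key
    max = Rails.configuration.max_compute_nodes          # assumed config key
    conf_path = ->(slot) { File.join(conf_dir, "#{hostname_for_slot(slot)}.conf") }

    # Count up from the first out-of-range slot until a conf file is missing.
    n = max
    n += 1 while File.exist?(conf_path.call(n))

    # Count back down, deleting files, until we re-enter the valid range or
    # hit a slot a live node record still occupies. Deleting from the top
    # down keeps any leftovers contiguous, so the upward scan above
    # terminates at the true high-water mark on the next run.
    (n - 1).downto(max) do |slot|
      break if Node.where(slot_number: slot).any?
      File.unlink(conf_path.call(slot))
    end
  end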
Updated by Tom Morris over 7 years ago
- Target version set to Arvados Future Sprints
Updated by Peter Amstutz over 3 years ago
- Target version deleted (Arvados Future Sprints)