Bug #6157
closed
[Documentation] Explain extra steps needed when compute hostnames are not fooN
100%
Description
background
Changing slurm config files, and keeping them synchronized across the controller and workers, is painful and can cause race conditions that are annoying to diagnose, so we try to avoid setups where the config has to change during normal operation.
"fooN", where N is decimal, lets you write foo[0-199] or foo[000-199] in your slurm config files, so one static entry covers every node you might turn up. nodes.ping makes it easy to manage a setup like this. In the API server configuration, you can set assign_node_hostname to a corresponding format string, so that nodes that ping without a hostname get one matching the schema, and max_compute_nodes to make sure the number of nodes doesn't go over your allocation.
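To make the relationship between the range expression and the assigned hostnames concrete, here is a minimal Python sketch. It is not Arvados or slurm code: `expand_hostlist` and `assign_hostname` are hypothetical helpers, the `"foo%d"` format string uses Python's `%` syntax purely for illustration (the real assign_node_hostname syntax may differ), and only a single `prefix[lo-hi]` range is handled.

```python
import re


def expand_hostlist(expr):
    """Expand a simple 'prefix[lo-hi]' expression into a list of hostnames.

    Assumption: only one bracketed decimal range is supported, and zero
    padding is inferred from the width of the lower bound ("000" -> 3 digits).
    """
    m = re.fullmatch(r"([^\[\]]+)\[(\d+)-(\d+)\]", expr)
    if m is None:
        raise ValueError("unsupported hostlist expression: %r" % expr)
    prefix, lo, hi = m.groups()
    width = len(lo)
    return [prefix + str(n).zfill(width) for n in range(int(lo), int(hi) + 1)]


def assign_hostname(slot_number, fmt="foo%d", max_nodes=200):
    """Hypothetical stand-in for hostname assignment on ping.

    Returns a hostname built from the format string, or None when the slot
    number would exceed the allocation (the role max_compute_nodes plays).
    """
    if not 0 <= slot_number < max_nodes:
        return None
    return fmt % slot_number


if __name__ == "__main__":
    configured = set(expand_hostlist("foo[0-199]"))
    # Every hostname that can be assigned is already covered by the config.
    assigned = {assign_hostname(n) for n in range(200)}
    print(assigned <= configured)
```

The point of the sketch is that, as long as the format string and the node limit agree with the range written in the slurm config, newly assigned hostnames always fall inside the configured set and the config never has to be edited.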
However, in some setups it might be inconvenient/difficult/impossible to use hostnames like "fooN".
improvement
Install docs should include a section explaining:
- Why foo[0-N] is a good idea (see above)
- What to do differently if you use a naming scheme besides string+decimal (e.g., your worker nodes' hostnames are {alice, bob, clay, ...})
We should make the simplifying assumption that the hostnames are assigned manually/OOB, and known in advance. IOW, instead of covering scenarios where slurm config has to change every time a new compute node is turned up, we should just advise against that.
AFAIK, as long as the available/powered-on nodes' hostnames are a subset of the hostnames given in slurm.conf, and no two hosts have the same name, slurm and Arvados should work without any code changes.
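The two conditions above (live hostnames are a subset of the configured ones, and no hostname is used twice) can be checked mechanically. The following is a minimal Python sketch under the assumption that you already have both lists in hand; `check_hostnames` is a hypothetical helper, not part of slurm or Arvados, and it does not parse slurm.conf itself.

```python
from collections import Counter


def check_hostnames(configured, live):
    """Return (duplicates, unknown) for a list of live node hostnames.

    configured: hostnames listed in the slurm config (assumed known in advance)
    live:       hostnames reported by powered-on nodes, one entry per node
    """
    counts = Counter(live)
    # Hostnames claimed by more than one live node.
    duplicates = sorted(h for h, c in counts.items() if c > 1)
    # Live hostnames that the slurm config does not know about.
    unknown = sorted(set(live) - set(configured))
    return duplicates, unknown


if __name__ == "__main__":
    configured = ["alice", "bob", "clay"]
    live = ["alice", "bob", "bob", "dave"]
    duplicates, unknown = check_hostnames(configured, live)
    print("duplicates:", duplicates)
    print("unknown:", unknown)
```

If both returned lists are empty, the setup satisfies the conditions described above.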
Updated by Tom Clegg almost 10 years ago
- Project changed from 35 to Arvados
- Description updated (diff)
- Category set to Documentation
Updated by Brett Smith over 9 years ago
- Target version changed from Bug Triage to 2015-07-22 sprint
Updated by Brett Smith over 9 years ago
- Subject changed from [Documentation] Explain extra steps needed when compute hostnames are not computeN to [Documentation] Explain extra steps needed when compute hostnames are not fooN
- Description updated (diff)
Updating the description to reflect our post-#6156 world. We now support any kind of "fooN" schema, not just computeN, and there's no need to work around the original bug.
Updated by Tom Clegg over 9 years ago
- Target version changed from 2015-07-22 sprint to 2015-08-05 sprint
Updated by Tom Clegg over 9 years ago
Updated by Tom Clegg over 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 50 to 100
Applied in changeset arvados|commit:e0a1fc70f919741a8ad840dc40cfcc87f2751722.