Feature #20471
closedAdmin CLI for managing dispatcher / cloud VMs
Description
arvados-server instance list
arvados-server instance kill -reason "optional reason" {instanceID|containerUUID}
Updated by Tom Clegg 3 months ago
- Has duplicate Feature #16843: [a-d-c] admin cli added
Updated by Brett Smith 3 months ago
- Target version changed from Future to Development 2026-01-06
- Assigned To set to Tom Clegg
- Category set to Crunch
Updated by Tom Clegg 3 months ago
· Edited
In progress:
20471-instances-list @ 734922787244520846f1b9ff4f6544bb9226915f -- developer-run-tests: #4999
Basic instance list functionality (even less than described on Admin CLI for managing dispatcher / cloud VMs wiki). No tests.
tordo:~# ./arvados-server-20471 instance list -header instance address state idle-behavior config-type provider-type price last-container i-03d59cfcfacf307ff 10.253.254.184 running run c5large c5.large 0.085000 tordo-dz642-r6fz90awybvywr6 i-0df614e93e4170ae7 10.253.254.157 booting run t3small t3.small 0.020800
Updated by Brett Smith 3 months ago
- Target version changed from Development 2026-01-06 to Development 2026-01-21
Updated by Tom Clegg 3 months ago
20471-instances-list @ c7456fd0f8252bc8ae965a567f1528de568a35b3 -- developer-run-tests: #5000
with instance kill, drain, hold, and run.
Updated by Tom Clegg 3 months ago
20471-instances-list @ bb2cd51a5ec20cae9db5d8f815abf95b0296a1a7 -- developer-run-tests: #5001
with doc page
Updated by Tom Clegg 2 months ago
20471-instances-list @ e0a69df653fceaf8b4bd7d8278a64099ac2bebb1 -- developer-run-tests: #5004
with tests
Updated by Tom Clegg 2 months ago
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- ✅
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- Didn't implement the feature "instance kill {container-id} kills whichever instance is running {container-id}". This doesn't seem like the greatest idea to me now, especially given Feature #14922: Run multiple containers concurrently on a single cloud VM is happening.
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- Added tests for list/hold/drain
- Tested list/hold/drain/run manually on tordo
- The tested code incorporates recent main branch changes.
- ✅
- New or changed UI/UX has gotten feedback from stakeholders.
- n/a
- Documentation has been updated.
- ✅
- Behaves appropriately at the intended scale (describe intended scale).
- ✅ no effect on scale
- Considered backwards and forwards compatibility issues between client and server.
- ✅ n/a
- Follows our coding standards and GUI style guidelines.
- ✅
Updated by Brett Smith 2 months ago
Tom Clegg wrote in #note-9:
20471-instances-list @ e0a69df653fceaf8b4bd7d8278a64099ac2bebb1 -- developer-run-tests: #5004
I will keep reading the branch but this test failure looks real and needs to be addressed.
Updated by Brett Smith 2 months ago
Implementation¶
See earlier note about the test failures.
As an admin, the "last-container" header makes me wonder: is this the last container that started? Or completed? It looks like this terminology is already baked into the API and I realize there's some value to staying consistent. At the very least I think the documentation should clarify this. I'm open to other ideas about how to make this clearer.
Documentation¶
The -reason flag should be documented.
I propose laying out the command synopses like:
# arvados-server instance drain <instance-id> [instance-id …]I think a pre block just makes for nicer presentation. I think listing a second instance-id in brackets follows man page conventions a little closer and helps clarify the command takes one or more IDs as arguments.
Fobbing people off to the API documentation for a description of each state is reader unfriendly. IMO you should just copy and paste the descriptions. I would also be fine if you at least made an anchor for each relevant section and linked to that; or made includes describing each state and used them in both places.
It would be nice if the API documentation was updated to link to the new doc page and suggest using arvados-server.
Tests¶
If the new tests fail, they're also reader unfriendly: because we're doing a bunch of not-really-related tests in a large function, and a lot of them have similar assertions, the only real way you have to figure out what failed is to open the source. At the very least, I think you should add Commentfs to each assertion to explain what specifically failed. If you want to go farther like separating out test functions, I wouldn't complain.
Updated by Tom Clegg 2 months ago
Brett Smith wrote in #note-12:
See earlier note about the test failures.
Fixed.
As an admin, the "last-container" header makes me wonder: is this the last container that started? Or completed? It looks like this terminology is already baked into the API and I realize there's some value to staying consistent. At the very least I think the documentation should clarify this. I'm open to other ideas about how to make this clearer.
Last started. Added to management API doc page, and changed "not yet run" to "not yet started" on CLI doc page.
The
-reasonflag should be documented.
Added.
I propose laying out the command synopses like: [...]I think a pre block just makes for nicer presentation. I think listing a second
instance-idin brackets follows man page conventions a little closer and helps clarify the command takes one or more IDs as arguments.
Updated formatting.
Fobbing people off to the API documentation for a description of each state is reader unfriendly. IMO you should just copy and paste the descriptions. I would also be fine if you at least made an anchor for each relevant section and linked to that; or made includes describing each state and used them in both places.
Copied descriptions.
It would be nice if the API documentation was updated to link to the new doc page and suggest using
arvados-server.
Done.
If the new tests fail, they're also reader unfriendly: because we're doing a bunch of not-really-related tests in a large function, and a lot of them have similar assertions, the only real way you have to figure out what failed is to open the source. At the very least, I think you should add Commentfs to each assertion to explain what specifically failed. If you want to go farther like separating out test functions, I wouldn't complain.
The new tests are a sequence of operations with checks at each step. But yes, they are independent of the existing management API test, so I've moved them to a new test func.
Added some inline comments, which is the gocheck-recommended way to add static comment strings.
c.Check(exitcode, check.Equals, 0) // `instance drain` should succeed
c.Check(stdout.String(), check.Equals, ``) // `instance drain` should output nothing
c.Check(stderr.String(), check.Matches, // `instance drain` should show feedback on stderr
20471-instances-list @ 013de101e3531a1947d12cc3f6ca69db77f4cd8f -- developer-run-tests: #5015
Updated by Brett Smith 2 months ago
Tom Clegg wrote in #note-13:
20471-instances-list @ 013de101e3531a1947d12cc3f6ca69db77f4cd8f -- developer-run-tests: #5015
I propose 736ef61a2afe41d76cff953df9c73bee92ad3e4d to make the CLI link in the API docs more prominent. Please take a look and merge either with or without that change. Thanks.
Updated by Tom Clegg 2 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|cc9bc22a4e8146053425a7220cfd893c64847106.