Idea #22390
openAdd arv CLI implementation to the Python SDK
Description
Why¶
This makes installation easier for clients and maintenance easier for us: we want to completely get rid of the arvados-cli gem. This means clients won't have any more need for Ruby. Since we typically use arv as part of cluster setup, this makes that easier too.
Why Python over Go¶
It's a close call for sure, but a few considerations:
- Users will definitely need Python for all the Keep client software. If we add
arvto the set of things you get with the Python SDK, it's very useful by itself, versus requiring users to have both that andarvados-clientto do a useful set of basic operations. - The Go pitch of "just download and run a binary" doesn't apply to
arvados-clientbecause it links against libfuse. For our specific userbase, it is easier topipx install arvados-python-clientor similar than to deal with distro packages. - As a practical consideration, we'd rather allocate our Go development efforts elsewhere right now.
Implementation considerations¶
Location and invocation¶
Like all scripts, the main tool should live under source:sdk/python/arvados/commands. I propose we name it arvcli.py to help clarify what it is but I'm open to bikeshedding that.
When the script is ready to replace the current arv command, we will add it to pyproject.toml like this:
[project.scripts]
arv = "arvados.commands.arvcli:FUNCNAME"
This means that, ultimately, the script must have a function that can run without arguments to start the tool. (It can accept arguments to help with testing but those arguments all have to have default values that make sense for starting the script.) When the tool is ready for interactive testing but not a full replacement, it should be possible to run some more verbose way like python3 arvados/commands/arvcli.py.
Note that our current minimum version is Python 3.10. You cannot use functionality added in Python 3.11 or later.
Taking advantage of argparse¶
The argparse module in the stdlib provides a lot of functionality to help us implement this that we should lean on.
- The subcommands in the current
arvcommand should be implemented as subparsers. - Arguments should specify
typeargument to validate strings and turn them into desired data types when appropriate. For instance, we can validate that UUIDs are well-formed, and parse complex JSON like body arguments. - Arguments should specify a
defaultargument whenever appropriate so later code can just use the argument value without needing to know how to determine the default itself. - When an API method requires a parameter we can set
required=True.
Passthrough subcommands¶
- The
copysubcommand just invokes the Pythonarv-copyand can stay that way. - The
wssubcommand invokesarv-wswhich provides a live stream of events from websockets. - The
keepsubcommand just invokes the corresponding Python toolsarv-ls,arv-get,arv-put, orarv-keepdocker
The tag subcommand invokes the arv-tag subcommand which is another Ruby tool. These are link-based tags, which are generally deprecated in favor of properties. Don't bother implementing this wrapper until very late: if the idea is to get rid of arvados-cli, this tool will go away with it, so maybe we're not wrapping it anymore. Thinking through a migration path is TODO.
Internally implemented subcommands¶
- The
createsubcommand invokes an interactive text editor where the user edits JSON that will be sent as the body in acreateAPI operation for a given resource type. This is pretty simple and could be handy but I forgot it existed and can't remember the last time I used it. - The
editsubcommand fetches the JSON of an object, invokes an interactive text editor where the user edits the returned JSON, any fields that are changed are as the body in aupdateAPI operation for a given resource type. - The
getsubcommand takes an arbitrary UUID and list of fields to select on, determines which API endpoint to use, and prints JSON to stdout. Also simple and could be handy but I also forgot it existed.
Subcommands built from the discovery document¶
The arv command generates a lot of subcommands dynamically from the resource document. Today most of this happens in source:sdk/cli/bin/arv under the if not subcommands.include? resource block.
We have Python code that similarly works with the discovery document. Parameter and Method in source:sdk/python/discovery2pydoc.py is the most relevant. This is parsing the exact same parts of the discovery document that arv will, except instead of setting up argument parsing it's setting up inspect method signatures.
See also ArvadosResources in source:contrib/arvados-bootstrap/src/arv_bootstrap/seed.py.
If you're interested in generalizing any of this functionality into the SDK, that might be a nice thing to do, but we should agree on the API before doing implementation so we're sure it's maintainable.
API retries¶
Early on arv needs to create an API client to get its discovery document to generate the subcommands. When it does this, it should instantiate the client with num_retries=0: it's more user-friendly to fail promptly than to spend minutes retrying just to generate --help.
Similarly when we actually execute requests on the user's behalf, we should do so with num_retries=3 or some similarly low number. A little retrying is fine but the user will probably get frustrated if we spend several minutes retrying.
We can consider a --num-retries option to let the user control this. That can be a later development: current arv doesn't worry about this at all.
Invoking an editor¶
When we invoke an editor for arv create or arv edit, the logic should be as follows:
- if
$VISUALis set, run that command - if
$EDITORis set, run that command - if
nanois found in$PATH, run that - run
vi
Do not implement checks besides the ones specified here. If $VISUAL is set to some nonexistent command, that's the user's problem: we should report it and fail. Don't try to outsmart the user and fall back to trying $EDITOR or similar.
Error reporting¶
If an error is caused by bad user input or environment rather than a programming bug, then it should be reported with a friendly message and no backtrace. This includes all of the following:
- Can't load the discovery document from the API server
- Any API error: UUID not found, permission errors, temporary capacity errors, etc.
- Any network error talking to the API server
- File read/write errors (e.g., can't read the user's input after editing from
arv createorarv edit)
You might consider implementing this by catching these specific errors in sys.excepthook and turning them into nicer messages, but that's not the only way to implement it.
Updated by Peter Amstutz over 1 year ago
- Related to Bug #20841: Migrate internal usage of 'arv' to arvados-client by symlinking 'arv' -> 'arvados-client' added
Updated by Brett Smith 9 months ago
- Release set to 28
- Target version deleted (
Future) - Subject changed from drop-in replacement for 'arv' command written in Go to arvados-client becomes a drop-in replacement for the arv CLI
Updated by Brett Smith 5 months ago
- Related to deleted (Bug #20841: Migrate internal usage of 'arv' to arvados-client by symlinking 'arv' -> 'arvados-client')
Updated by Brett Smith 5 months ago
- Description updated (diff)
- Subject changed from arvados-client becomes a drop-in replacement for the arv CLI to Add arv CLI implementation to the Python SDK
Updated by Brett Smith 5 months ago
- Related to Feature #23293: Start new Python arv with passthrough subcommands added
Updated by Brett Smith 4 months ago
- Related to Feature #23330: arvcli has subcommands for Arvados API calls added
Updated by Brett Smith about 2 months ago
- Related to Feature #23409: arvcli actually executes API subcommands added