Project

General

Profile

Actions

Idea #22390

open

Add arv CLI implementation to the Python SDK

Added by Peter Amstutz over 1 year ago. Updated 4 months ago.

Status:
In Progress
Priority:
Normal
Assigned To:
-
Target version:
-
Start date:
Due date:
Story points:
-
Release:
Release relationship:
Auto

Description

Why

This makes installation easier for clients and maintenance easier for us: we want to completely get rid of the arvados-cli gem. This means clients won't have any more need for Ruby. Since we typically use arv as part of cluster setup, this makes that easier too.

Why Python over Go

It's a close call for sure, but a few considerations:

  • Users will definitely need Python for all the Keep client software. If we add arv to the set of things you get with the Python SDK, it's very useful by itself, versus requiring users to have both that and arvados-client to do a useful set of basic operations.
  • The Go pitch of "just download and run a binary" doesn't apply to arvados-client because it links against libfuse. For our specific userbase, it is easier to pipx install arvados-python-client or similar than to deal with distro packages.
  • As a practical consideration, we'd rather allocate our Go development efforts elsewhere right now.

Implementation considerations

Location and invocation

Like all scripts, the main tool should live under source:sdk/python/arvados/commands. I propose we name it arvcli.py to help clarify what it is but I'm open to bikeshedding that.

When the script is ready to replace the current arv command, we will add it to pyproject.toml like this:

[project.scripts]
arv = "arvados.commands.arvcli:FUNCNAME" 

This means that, ultimately, the script must have a function that can run without arguments to start the tool. (It can accept arguments to help with testing but those arguments all have to have default values that make sense for starting the script.) When the tool is ready for interactive testing but not a full replacement, it should be possible to run some more verbose way like python3 arvados/commands/arvcli.py.

Note that our current minimum version is Python 3.10. You cannot use functionality added in Python 3.11 or later.

Taking advantage of argparse

The argparse module in the stdlib provides a lot of functionality to help us implement this that we should lean on.

  • The subcommands in the current arv command should be implemented as subparsers.
  • Arguments should specify type argument to validate strings and turn them into desired data types when appropriate. For instance, we can validate that UUIDs are well-formed, and parse complex JSON like body arguments.
  • Arguments should specify a default argument whenever appropriate so later code can just use the argument value without needing to know how to determine the default itself.
  • When an API method requires a parameter we can set required=True.

Passthrough subcommands

  • The copy subcommand just invokes the Python arv-copy and can stay that way.
  • The ws subcommand invokes arv-ws which provides a live stream of events from websockets.
  • The keep subcommand just invokes the corresponding Python tools arv-ls, arv-get, arv-put, or arv-keepdocker

The tag subcommand invokes the arv-tag subcommand which is another Ruby tool. These are link-based tags, which are generally deprecated in favor of properties. Don't bother implementing this wrapper until very late: if the idea is to get rid of arvados-cli, this tool will go away with it, so maybe we're not wrapping it anymore. Thinking through a migration path is TODO.

Internally implemented subcommands

  • The create subcommand invokes an interactive text editor where the user edits JSON that will be sent as the body in a create API operation for a given resource type. This is pretty simple and could be handy but I forgot it existed and can't remember the last time I used it.
  • The edit subcommand fetches the JSON of an object, invokes an interactive text editor where the user edits the returned JSON, any fields that are changed are as the body in a update API operation for a given resource type.
  • The get subcommand takes an arbitrary UUID and list of fields to select on, determines which API endpoint to use, and prints JSON to stdout. Also simple and could be handy but I also forgot it existed.

Subcommands built from the discovery document

The arv command generates a lot of subcommands dynamically from the resource document. Today most of this happens in source:sdk/cli/bin/arv under the if not subcommands.include? resource block.

We have Python code that similarly works with the discovery document. Parameter and Method in source:sdk/python/discovery2pydoc.py is the most relevant. This is parsing the exact same parts of the discovery document that arv will, except instead of setting up argument parsing it's setting up inspect method signatures.

See also ArvadosResources in source:contrib/arvados-bootstrap/src/arv_bootstrap/seed.py.

If you're interested in generalizing any of this functionality into the SDK, that might be a nice thing to do, but we should agree on the API before doing implementation so we're sure it's maintainable.

API retries

Early on arv needs to create an API client to get its discovery document to generate the subcommands. When it does this, it should instantiate the client with num_retries=0: it's more user-friendly to fail promptly than to spend minutes retrying just to generate --help.

Similarly when we actually execute requests on the user's behalf, we should do so with num_retries=3 or some similarly low number. A little retrying is fine but the user will probably get frustrated if we spend several minutes retrying.

We can consider a --num-retries option to let the user control this. That can be a later development: current arv doesn't worry about this at all.

Invoking an editor

When we invoke an editor for arv create or arv edit, the logic should be as follows:

  • if $VISUAL is set, run that command
  • if $EDITOR is set, run that command
  • if nano is found in $PATH, run that
  • run vi

Do not implement checks besides the ones specified here. If $VISUAL is set to some nonexistent command, that's the user's problem: we should report it and fail. Don't try to outsmart the user and fall back to trying $EDITOR or similar.

Error reporting

If an error is caused by bad user input or environment rather than a programming bug, then it should be reported with a friendly message and no backtrace. This includes all of the following:

  • Can't load the discovery document from the API server
  • Any API error: UUID not found, permission errors, temporary capacity errors, etc.
  • Any network error talking to the API server
  • File read/write errors (e.g., can't read the user's input after editing from arv create or arv edit)

You might consider implementing this by catching these specific errors in sys.excepthook and turning them into nicer messages, but that's not the only way to implement it.


Related issues 3 (1 open2 closed)

Related to Arvados - Feature #23293: Start new Python arv with passthrough subcommandsResolvedZoë MaActions
Related to Arvados - Feature #23330: arvcli has subcommands for Arvados API callsResolvedZoë Ma03/17/2026Actions
Related to Arvados - Feature #23409: arvcli actually executes API subcommandsNewZoë MaActions
Actions #1

Updated by Peter Amstutz over 1 year ago

  • Related to Bug #20841: Migrate internal usage of 'arv' to arvados-client by symlinking 'arv' -> 'arvados-client' added
Actions #2

Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz about 1 year ago

  • Target version set to Future
Actions #5

Updated by Brett Smith 9 months ago

  • Release set to 28
  • Target version deleted (Future)
  • Subject changed from drop-in replacement for 'arv' command written in Go to arvados-client becomes a drop-in replacement for the arv CLI
Actions #6

Updated by Brett Smith 5 months ago

  • Related to deleted (Bug #20841: Migrate internal usage of 'arv' to arvados-client by symlinking 'arv' -> 'arvados-client')
Actions #7

Updated by Brett Smith 5 months ago

  • Description updated (diff)
  • Subject changed from arvados-client becomes a drop-in replacement for the arv CLI to Add arv CLI implementation to the Python SDK
Actions #8

Updated by Brett Smith 5 months ago

  • Related to Feature #23293: Start new Python arv with passthrough subcommands added
Actions #9

Updated by Brett Smith 4 months ago

  • Related to Feature #23330: arvcli has subcommands for Arvados API calls added
Actions #10

Updated by Brett Smith 4 months ago

  • Status changed from New to In Progress
Actions #11

Updated by Brett Smith about 2 months ago

  • Related to Feature #23409: arvcli actually executes API subcommands added
Actions

Also available in: Atom PDF