Idea #22390: Add arv CLI implementation to the Python SDK - Arvados Epics - Arvados

Actions

Copy link

Idea #22390

open

Add arv CLI implementation to the Python SDK

Added by Peter Amstutz over 1 year ago. Updated 4 months ago.

Status:

In Progress

Priority:

Normal

Assigned To:

Target version:

Start date:

Due date:

Story points:

Release:

Epics

Release relationship:

Auto

Description

Why¶

This makes installation easier for clients and maintenance easier for us: we want to completely get rid of the arvados-cli gem. This means clients won't have any more need for Ruby. Since we typically use arv as part of cluster setup, this makes that easier too.

Why Python over Go¶

It's a close call for sure, but a few considerations:

Users will definitely need Python for all the Keep client software. If we add arv to the set of things you get with the Python SDK, it's very useful by itself, versus requiring users to have both that and arvados-client to do a useful set of basic operations.
The Go pitch of "just download and run a binary" doesn't apply to arvados-client because it links against libfuse. For our specific userbase, it is easier to pipx install arvados-python-client or similar than to deal with distro packages.
As a practical consideration, we'd rather allocate our Go development efforts elsewhere right now.

Implementation considerations¶

Location and invocation¶

Like all scripts, the main tool should live under source:sdk/python/arvados/commands. I propose we name it arvcli.py to help clarify what it is but I'm open to bikeshedding that.

When the script is ready to replace the current arv command, we will add it to pyproject.toml like this:

[project.scripts]
arv = "arvados.commands.arvcli:FUNCNAME"

This means that, ultimately, the script must have a function that can run without arguments to start the tool. (It can accept arguments to help with testing but those arguments all have to have default values that make sense for starting the script.) When the tool is ready for interactive testing but not a full replacement, it should be possible to run some more verbose way like python3 arvados/commands/arvcli.py.

Note that our current minimum version is Python 3.10. You cannot use functionality added in Python 3.11 or later.

Taking advantage of argparse¶

The argparse module in the stdlib provides a lot of functionality to help us implement this that we should lean on.

The subcommands in the current arv command should be implemented as subparsers.
Arguments should specify type argument to validate strings and turn them into desired data types when appropriate. For instance, we can validate that UUIDs are well-formed, and parse complex JSON like body arguments.
Arguments should specify a default argument whenever appropriate so later code can just use the argument value without needing to know how to determine the default itself.
When an API method requires a parameter we can set required=True.

Passthrough subcommands¶

The copy subcommand just invokes the Python arv-copy and can stay that way.
The ws subcommand invokes arv-ws which provides a live stream of events from websockets.
The keep subcommand just invokes the corresponding Python tools arv-ls, arv-get, arv-put, or arv-keepdocker

The tag subcommand invokes the arv-tag subcommand which is another Ruby tool. These are link-based tags, which are generally deprecated in favor of properties. Don't bother implementing this wrapper until very late: if the idea is to get rid of arvados-cli, this tool will go away with it, so maybe we're not wrapping it anymore. Thinking through a migration path is TODO.

Internally implemented subcommands¶

The create subcommand invokes an interactive text editor where the user edits JSON that will be sent as the body in a create API operation for a given resource type. This is pretty simple and could be handy but I forgot it existed and can't remember the last time I used it.
The edit subcommand fetches the JSON of an object, invokes an interactive text editor where the user edits the returned JSON, any fields that are changed are as the body in a update API operation for a given resource type.
The get subcommand takes an arbitrary UUID and list of fields to select on, determines which API endpoint to use, and prints JSON to stdout. Also simple and could be handy but I also forgot it existed.

Subcommands built from the discovery document¶

The arv command generates a lot of subcommands dynamically from the resource document. Today most of this happens in source:sdk/cli/bin/arv under the if not subcommands.include? resource block.

We have Python code that similarly works with the discovery document. Parameter and Method in source:sdk/python/discovery2pydoc.py is the most relevant. This is parsing the exact same parts of the discovery document that arv will, except instead of setting up argument parsing it's setting up inspect method signatures.

See also ArvadosResources in source:contrib/arvados-bootstrap/src/arv_bootstrap/seed.py.

If you're interested in generalizing any of this functionality into the SDK, that might be a nice thing to do, but we should agree on the API before doing implementation so we're sure it's maintainable.

API retries¶

Early on arv needs to create an API client to get its discovery document to generate the subcommands. When it does this, it should instantiate the client with num_retries=0: it's more user-friendly to fail promptly than to spend minutes retrying just to generate --help.

Similarly when we actually execute requests on the user's behalf, we should do so with num_retries=3 or some similarly low number. A little retrying is fine but the user will probably get frustrated if we spend several minutes retrying.

We can consider a --num-retries option to let the user control this. That can be a later development: current arv doesn't worry about this at all.

Invoking an editor¶

When we invoke an editor for arv create or arv edit, the logic should be as follows:

if $VISUAL is set, run that command
if $EDITOR is set, run that command
if nano is found in $PATH, run that
run vi

Do not implement checks besides the ones specified here. If $VISUAL is set to some nonexistent command, that's the user's problem: we should report it and fail. Don't try to outsmart the user and fall back to trying $EDITOR or similar.

Error reporting¶

If an error is caused by bad user input or environment rather than a programming bug, then it should be reported with a friendly message and no backtrace. This includes all of the following:

Can't load the discovery document from the API server
Any API error: UUID not found, permission errors, temporary capacity errors, etc.
Any network error talking to the API server
File read/write errors (e.g., can't read the user's input after editing from arv create or arv edit)

You might consider implementing this by catching these specific errors in sys.excepthook and turning them into nicer messages, but that's not the only way to implement it.

Related issues 3 (1 open — 2 closed)

Actions

Copy link

Updated by Peter Amstutz over 1 year ago

Related to Bug #20841: Migrate internal usage of 'arv' to arvados-client by symlinking 'arv' -> 'arvados-client' added

Actions

Copy link

Updated by Peter Amstutz over 1 year ago

Description updated (diff)

Actions

Copy link

Updated by Peter Amstutz over 1 year ago

Description updated (diff)

Actions

Copy link

Updated by Peter Amstutz about 1 year ago

Target version set to Future

Actions

Copy link

Updated by Brett Smith 9 months ago

Release set to 28
Target version deleted (~~Future~~)
Subject changed from drop-in replacement for 'arv' command written in Go to arvados-client becomes a drop-in replacement for the arv CLI

Actions

Copy link

Updated by Brett Smith 5 months ago

Related to deleted (Bug #20841: Migrate internal usage of 'arv' to arvados-client by symlinking 'arv' -> 'arvados-client')

Actions

Copy link

Updated by Brett Smith 5 months ago

Description updated (diff)
Subject changed from arvados-client becomes a drop-in replacement for the arv CLI to Add arv CLI implementation to the Python SDK

Actions

Copy link

Updated by Brett Smith 5 months ago

Related to Feature #23293: Start new Python arv with passthrough subcommands added

Actions

Copy link

Updated by Brett Smith 4 months ago

Related to Feature #23330: arvcli has subcommands for Arvados API calls added

Actions

Copy link

#10

Updated by Brett Smith 4 months ago

Status changed from New to In Progress

Actions

Copy link

#11

Updated by Brett Smith about 2 months ago

Related to Feature #23409: arvcli actually executes API subcommands added

Actions

Copy link

Also available in: Atom PDF

Related to Arvados - Feature #23293: Start new Python arv with passthrough subcommands	Resolved	Zoë Ma		Actions
Related to Arvados - Feature #23330: arvcli has subcommands for Arvados API calls	Resolved	Zoë Ma	03/17/2026	Actions
Related to Arvados - Feature #23409: arvcli actually executes API subcommands	New	Zoë Ma		Actions

Project

General

Profile

Arvados » Arvados Epics

Custom queries

Idea #22390

Add arv CLI implementation to the Python SDK

Why¶

Why Python over Go¶

Implementation considerations¶

Location and invocation¶

Taking advantage of argparse¶

Passthrough subcommands¶

Internally implemented subcommands¶

Subcommands built from the discovery document¶

API retries¶

Invoking an editor¶

Error reporting¶

Updated by Peter Amstutz over 1 year ago

Updated by Peter Amstutz over 1 year ago

Updated by Peter Amstutz over 1 year ago

Updated by Peter Amstutz about 1 year ago

Updated by Brett Smith 9 months ago

Updated by Brett Smith 5 months ago

Updated by Brett Smith 5 months ago

Updated by Brett Smith 5 months ago

Updated by Brett Smith 4 months ago

Updated by Brett Smith 4 months ago

Updated by Brett Smith about 2 months ago