Feature #23409: arvcli actually executes API subcommands - Arvados

Right now as in #23330, after CLI parsing, the method parameters are gathered in the flat namespace returned by parse_known_args(). That namespace contains a bunch of things in addition to the method parameters (including the method itself and "global" parameters such as dry_run, format etc.). It'll make the API method call easier to write and understand, if we can easily get just the method parameters out of it.

For this, I think there's a way to do it which I find fairly appealing. First, when we add the method parameter arguments to the "sub-subparser", we explicitly use the dest keyword argument in add_argument(), in pseudocode:

method_subsubparser.add_argument(..., dest=f"method_param.{parameter_key}", ...)

where "method_param" is just a prefix (the value can be anything) to group the method parameters, and parameter_key is the actual parameter name in the discovery doc, such as "filters", "uuid" etc. The request body parameter would be special-cased as "method_param.body".

The dot-separated prefix notation requires some extra support; fortunately it's fairly straightforward:

class NestedNamespace(argparse.Namespace):
    """Nestable namespace that support setting an attribute with dots in
    its name, so that argparse-based argument parser may store parsing
    results in nested namespaces.

    Example:

    parser = argparse.ArgumentParser()
    parser.add_argument("--foo-bar", dest="foo.bar")
    parser.parse_args(["--foo-bar", "spam"], namespace=ns)
    ns.foo.bar  # "spam" 
    """ 
    def __setattr__(self, name, value):
        attr_head, *attr_tail = name.split(".", 1)
        if not attr_head:
            raise AttributeError(f"invalid attribute: {name!r}")
        if not attr_tail or not attr_tail[0]:
            super().__setattr__(attr_head, value)
        else:
            try:
                next_ns = getattr(self, attr_head)
            except AttributeError:
                next_ns = type(self)()
            super().__setattr__(attr_head, next_ns)
            setattr(next_ns, attr_tail[0], value)

So we can write (pseudocode):

ns = NestedNamespace()
parser.parse_known_args(namespace=ns)
arv_resource = getattr(arv_client, resource_name)()
arv_object = getattr(arv_resource, ns.method)(**vars(ns.method_params)).execute()

How does this idea sound?

Actions

Copy link

Updated by Zoë Ma 25 days ago

A separate concern is the generation of YAML output. I tried PyYAML but kept running into this well-known issue with dumping multiline string: https://github.com/yaml/pyyaml/issues/240. This affects, for example, how manifest text with mandatory newlines would appear in the YAML output. I'm willing to pull in the ruamel.yaml dependency because it dumps the string with escape sequence "\n" which is closer to the Ruby arv output. With ruamel.yaml the output differs from the one generated by the Ruby arv mostly in the explicit @null@s

--- arv.yaml    2026-02-27 16:03:50.000000000 +0800
+++ ruamel.yaml.yaml    2026-02-27 21:08:53.000000000 +0800
@@ -1,14 +1,13 @@
----
 created_at: '2026-02-27T07:23:38.815722000Z'
 current_version_uuid: x000f-4zz18-7r04c9o42lu8zcf
-delete_at: 
-description: 
-etag: 3j41l7mxf0wdch4epwzm5gdfq
+delete_at: null
+description: null
+etag: 5oh0d1krsvm5d6o3eeghuujfc
 file_count: 1
 file_size_total: 0
 is_trashed: false
 kind: arvados#collection
-manifest_text: ". d41d8cd98f00b204e9800998ecf8427e+0+A7aea8fcb82a037eb128a69a0ec8cfaeccc07d640@69a2a166
+manifest_text: ". d41d8cd98f00b204e9800998ecf8427e+0+Ada435ff302b7d329d84b83ecbd4adcc4d264cf04@69a2e8e5
   0:0:empty\n" 
 modified_at: '2026-02-27T07:23:38.813294000Z'
 modified_by_user_uuid: x000f-tpzed-diyq2cc02bg3cuv
@@ -17,14 +16,14 @@
 portable_data_hash: 988c44767737c1c5d02ba76fb981e48a+47
 preserve_version: false
 properties: {}
-replication_confirmed: 
-replication_confirmed_at: 
-replication_desired: 
+replication_confirmed: 0
+replication_confirmed_at: '2026-02-27T03:04:40.337716000Z'
+replication_desired: null
 storage_classes_confirmed: []
-storage_classes_confirmed_at: 
+storage_classes_confirmed_at: '2026-02-27T03:04:40.337716000Z'
 storage_classes_desired:
 - default
-trash_at: 
+trash_at: null
 unsigned_manifest_text: ''
 uuid: x000f-4zz18-7r04c9o42lu8zcf
 version: 1

Actions

Copy link

Updated by Brett Smith 20 days ago

Target version changed from Development 2026-03-04 to Development 2026-03-18

Actions

Copy link

Updated by Brett Smith 11 days ago

Consider: The dispatch function is already a little hairy, and there's still more to add once we get to the create/edit subcommands. The implementation also has to be kept in very close sync with the argument parser: a change in one but not the other can lead to bugs.

One idea: Argparser subcommands can declare different "default" values, and those values don't have to be exposed through command line arguments. We can use this to declare where the subcommand should dispatch, as well as provide additional arguments from the subcommand. For example, consider a function like this that imports a module and calls its main method:

def call_other_main(args, remaining_args, api_client):
    main_mod = importlib.import_module(args.main_module)
    return main_mod.main(remaining_args)

Then subparsers can say they are implemented by this function like this:

ws_parser = subparsers.add_parser("ws")
ws_parser.set_defaults(
    main_func=call_other_main,
    main_module='arvados.commands.ws',
)
copy_parser = subparsers.add_parser("copy")
copy_parser.set_defaults(
    main_func=call_other_main,
    main_module='arvados.commands.arv_copy',
)

With this, the dispatch function can become much simpler:

def dispatch(arguments=None):
    api_client = arvados.api("v1")
    cmd_parser = ArvCLIArgumentParser(api_client._resourceDesc["resources"])
    args, remaining_args = cmd_parser.parse_known_args(arguments)
    sys.exit(args.main_func(args, remaining_args, api_client))

If you build this, then the API subcommands can declare a completely different main_func named something like call_api_method that dynamically builds and executes the appropriate API method from args.subcommand and args.method. And this addresses the issue at the top of the comment: now instead of having implementation split between the argument parser and the dispatch method, everything is declared in the argument parser, where it's easier to maintain.

Actions

Copy link

Updated by Brett Smith 11 days ago

Zoë Ma wrote in #note-5:

I think the main idea is to do what's described here in the docs.

Correct.

How does this idea sound?

This is a fine solution but I think there's a simpler option that requires less code. When it's time to make the API call, there will basically be two sets of values in the args namespace: a static set that define which method is being called, and then everything else which is a method parameter. If we define the first set, you can build method_args by creating a dict of all values in the second set. Something like this:

class ArvCLIArgumentParser(argparse.ArgumentParser):
    # If you want you could build this set dynamically in __init__.
    # You could even override add_argument to do it dynamically when called.
    _GLOBAL_ARGS = {
        'dry_run',
        'verbose',
        'format',
        'short',
        'subcommand',
        'method',
        'main_func',
        ...,
    }

def call_api_method(args, remaining_args, api_client):
    arv_resource = getattr(api_client, args.subcommand)()
    arv_method = getattr(arv_resource, args.method)
    method_args = {
        key: val
        for key, val in args.__dict__.items()
        if key not in ArvCLIArgumentParser._GLOBAL_ARGS
    }
    result = arv_method(**method_args).execute()
    # Output result according to args.format, with error handling

Make sense?

Actions

Copy link

#10

Updated by Brett Smith 11 days ago

Zoë Ma wrote in #note-6:

A separate concern is the generation of YAML output. I tried PyYAML but kept running into this well-known issue with dumping multiline string: https://github.com/yaml/pyyaml/issues/240. This affects, for example, how manifest text with mandatory newlines would appear in the YAML output. I'm willing to pull in the ruamel.yaml dependency because it dumps the string with escape sequence "\n" which is closer to the Ruby arv output. With ruamel.yaml the output differs from the one generated by the Ruby arv mostly in the explicit @null@s

Please use ruamel.yaml. arvados-cwl-runner already depends on this, and I would rather have a single YAML dependency on user systems than two. (The tests depend on PyYAML, but we don't have to install that on user systems, so it's less important.)

The explicit nulls are not a problem. The Python output does not have to match the style of the Ruby arv, only the semantics. It is okay for the YAML to be formatted a little differently.

Actions

Copy link

#11

Updated by Brett Smith 8 days ago

Brett Smith wrote in #note-9:

This is a fine solution but I think there's a simpler option that requires less code.

If you don't like this version, I thought of another option that's more like your original suggestion but simpler: just write an argparse Action that stores the value in a static dict.

class StoreMethodArg(argpargse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        # If the argument takes no values (in other words, it's a flag),
        # get the value to set from `const`.
        namespace.method_args[self.dest] = values or self.const

To use this:

The API method parsers, or a common parent, will need to say parser.set_default(method_args={}) to initialize the dict. I think it would be okay to do this on the top-level parser, as long as it's clear enough that non-API subcommands won't use this argument.
All API method arguments (e.g., the ones parsed from the discovery document) must use action=StoreMethodArg. Arguments that are already using a store action like store_true or store_false must pass the value they want to store as the const for that argument.

This means that argument parsing gives you a single dictionary with all the arguments to pass, as in your original suggestion, but it doesn't require customizing the behavior of Namespace which could have farther-reaching consequences.

Actions

Copy link

#12

Updated by Zoë Ma 7 days ago

Here, we have three proposals (subclassing Namespace from me, explicit enumeration of global args, or subclassing Action, both from you). I feel that explicit enumeration would be the easiest to explain, and therefore the easiest on our future selves and allies, as long as we don't re-use any of the "global" argument names for sub- (and sub-sub-!) commands (which I mean, for example, " arvcli.py --format json subcommand method --foo-arg=bar --format=pretty " or so on). If we're confident that's not going to happen in the future, it'll be a strong plus for "explicit enumeration".

In addition, we have mostly just two classes of subcommands with arguments for now: API commands and external commands. When we add more kinds of subcommands, do you think the simplicity of the approach will still hold?

Actions

Copy link

#13

Updated by Brett Smith 7 days ago

Zoë Ma wrote in #note-12:

I feel that explicit enumeration would be the easiest to explain, and therefore the easiest on our future selves and allies, as long as we don't re-use any of the "global" argument names for sub- (and sub-sub-!) commands (which I mean, for example, " arvcli.py --format json subcommand method --foo-arg=bar --format=pretty " or so on). If we're confident that's not going to happen in the future, it'll be a strong plus for "explicit enumeration".

I don't know if I can promise we'll never use these names in the API, but the thing is, if we do, we're gonna have to think about how to port them to the arv CLI anyway. Like, imagine we add a format argument: just the fact that the code now wants to have two switches named "format" will immediately become a problem, and we'll need to make some code change to deal with it. But, that's a problem to solve when we get to that point with the API. I don't think we need to solve it today. And I think all that's true no matter which implementation we choose. It's not clear to me that any of them are more "future-proof" than the others given the other changes that will be required.

In addition, we have mostly just two classes of subcommands with arguments for now: API commands and external commands. When we add more kinds of subcommands, do you think the simplicity of the approach will still hold?

I think so. Calling API methods is the only subcommand that deals with dynamic arguments. Everything else deals with a static known set. I can't imagine a way our choice here would realistically affect the implementation of the other subcommands we have today.

Actions

Copy link

Also available in: Atom PDF

	Related to Arvados Epics - Idea #22390: Add arv CLI implementation to the Python SDK	In Progress					Actions
	Follows Arvados - Feature #23330: arvcli has subcommands for Arvados API calls	Resolved	Zoë Ma	03/17/2026			Actions

Project

General

Profile

Arvados

Custom queries

Feature #23409

arvcli actually executes API subcommands

Updated by Brett Smith about 2 months ago

Updated by Brett Smith about 2 months ago

Updated by Brett Smith about 2 months ago

Updated by Brett Smith about 1 month ago

Updated by Zoë Ma 27 days ago

Updated by Zoë Ma 25 days ago

Updated by Brett Smith 20 days ago

Updated by Brett Smith 11 days ago

Updated by Brett Smith 11 days ago

Updated by Brett Smith 11 days ago

Updated by Brett Smith 8 days ago

Updated by Zoë Ma 7 days ago

Updated by Brett Smith 7 days ago