Feature #23330: arvcli has subcommands for Arvados API calls - Arvados

Actions

Copy link

Feature #23330

closed

arvcli has subcommands for Arvados API calls

Added by Brett Smith 4 months ago. Updated 7 days ago.

Status:

Resolved

Priority:

Normal

Assigned To:

Zoë Ma

Category:

SDKs

Target version:

Development 2026-03-18

Story points:

Description

arvcli needs to support dynamic subcommands of the form:

arv RESOURCE METHOD [--param=value …]

The list of valid resources and methods comes from the Arvados discovery document. After you construct a client object, the information is available starting at arv_client._resourceDesc['resources']. You can see a JSON version of _resourceDesc at source:sdk/python/arvados-v1-discovery.json.

Resources are singular. Be careful of special cases: the singular of vocabularies is vocabulary, and the singular of sys is sys. For backwards compatibility with the Ruby implementation, we need to accept sy as a valid alias of sys.

Methods are listed under each resource. Parameters and their types are listed under their method. Values should be parsed such that the argument string is taken as JSON, and then we confirm that the resulting value has the expected type documented in the parameter.

When invoked this way, arvcli simply calls the requested API method with the given parameters, and writes the result in the requested format.

You can see an example of other Python parsing the resource document this way in source:sdk/python/discovery2pydoc.py, particularly in the Parameter and Method classes. This code is turning the data into Python method signatures, where you'll be turning them into argparse arguments. But this code shows you what keys have what information.

There is also some resource-parsing code in source:contrib/arvados-bootstrap/src/arv_bootstrap/seed.py, particularly the ArvadosResources class. If any of that is useful to you, feel free to propose moving it into the SDK. It would help to have a written proposal before you implement anything. If we add such code to the SDK, it should be public and generally useful, and we should take care to design an API that other people can use.

Subtasks 1 (0 open — 1 closed)

Related issues 3 (2 open — 1 closed)

Actions

Copy link

Updated by Brett Smith 4 months ago

Related to Idea #22390: Add arv CLI implementation to the Python SDK added

Actions

Copy link

Updated by Brett Smith 4 months ago

Subtask #23336 added

Actions

Copy link

Updated by Brett Smith 3 months ago

Target version changed from Development 2025-12-10 to Development 2026-01-06

Actions

Copy link

Updated by Zoë Ma 3 months ago

Note that this branch is not marked as "wip", as in, it can run and pass tests. However, only the argument parsing logic is implemented. The actual API calls are not yet made in the handler. Before I do that, I would love to receive further feedback to see if I am in the right direction.

23330-arv-subcommands-for-api-calls @ 07299375829b6f6c5a203b8f18c0ffd6106bb49b

developer-run-tests: #4968

All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- The command-line parsing tasks pertaining to resource-subcommands and their methods (typically a verb-like term like "get", "list", "create", etc.) are implemented following a pattern similar to the existing pass-through subcommands such as "keep". Namely, the "main" parser, which knows about subcommands and the methods but not the parameters (the double-dashed CLI flags such as "--uuid=..."). The further parsing of the parameters are done in a second stage, where an argument parser is constructed from the resource-method combo's parameter schema as defined in the discover doc.
Anything not implemented (discovered or discussed during work) has a follow-up story.
- Actual API calls are not yet made.
Code is tested and passing, both automated and manual, what manual testing was done is described.
- New unittests added for small utility functions as well as a simulated invocation of the script (but as mentioned, currently the handler can at most emit a help message). Manual testing involves running the arvcli.py script and observe the behavior, and learning about how the help messages emitted are affected by coding choices (but those messages are not yet polished).
The tested code incorporates recent main branch changes.
- Yes
New or changed UI/UX has gotten feedback from stakeholders.
- N/A
Documentation has been updated.
- Currently only inline docstrings.
Behaves appropriately at the intended scale (describe intended scale).
- N/A
Considered backwards and forwards compatibility issues between client and server.
- The available resources/methods/parameters are obtained from the API client object which in turn is based on the server output. This has limitations (e.g. if server is not responsive, client will hang), but the automated nature of the client code should ensure that the client behaves in a way expected by the server.
Follows our coding standards and GUI style guidelines.
- Yes (PEP-8)

In the code, I've liberally added "NOTE:" comments where I explain some why things are done in a particular way. Those are the places where I would particularly appreciate any comments.

Actions

Copy link

Updated by Brett Smith 3 months ago

Let's make it so the goal of this branch is just to turn the discovery document into argument parsing. When an API method is called, the implementation can just be a stub like print("called API method") or similar. We'll do the actual implementation in a follow-up branch. I have ideas for how you can nicely organize the different implementations for wrapper commands vs. API methods vs. editor commands.

I understand that making API methods subparsers of the resource subparsers felt too complex. However, I believe it is necessary complexity: this is the model that makes the argument parsing closest to the model in the discovery document, and avoids problems like "option with the same name has different meanings for different methods." Please make that change.

The way we make boolean options makes for a confusing UI. For the case where default=true, passing --flag has action='store_false' and turns the option off, which I don't think is what users expect.

For the case without a default, using bool as a type does not provide results the user expects. The type gets passed the string argument that the user provided, and any nonempty string will return True. If the user uses this option with values like 0, false, or no, those will all be set True, which is not what they expect.

I think for boolean flags, we should provide two options: a --flag that sets it true and a --no-flag that sets it false. This UI is predictable and clear. A rough implementation looks like:

parser.add_argument(
    parameter_key_to_argument_name(parameter_key),
    action='store_true',
    dest=parameter_key,
    default=json.parse(parameter_schema.get('default', 'null')),
)
parser.add_argument(
    parameter_key_to_argument_name(f'no_{parameter_key}'),
    action='store_false',
    dest=parameter_key,
)

I'll leave it up to you to figure out how to structure that in your own code.

Our style guide says to follow PEP 257 for docstrings. Please do so here. In particular, the docstring should start with one physical line that briefly summarizes the function, then a blank line, then more details. So, resource_subcommand_handler_stub is good. The other helper functions that just have one paragraph should follow its example. I care less about this for test docstrings, where the docstring usually documents what is being tested rather than functionality.

For related groups of functions, consider organizing them into a class to help future readers see their relationship. Up to you whether they go into the existing ArgumentParser class or a helper class. It's okay if they are all classmethod or staticmethod; this is just an organization suggestion. I also think it's less pressing for "pure" utility functions that stand on their own like singularize_resource. This is not required to pass review, just a suggestion.

Actions

Copy link

Updated by Zoë Ma 3 months ago

Based on the feedback, I created a new branch "23330-WIP-arv-api-calls-resource-subcommands-with-method-subsubcommands" at d26918eb5e7555d3b193f3abbebe606198367e91

(A test is running developer-run-tests: #4989 -- but it will fail; see below).

In this new branch, the "two-stage parsers" approach for resource subcommands is replaced by the "sub-subparser" approach.

One unexpected new finding is that now, if help is requested for a subcommand-method combo, such as arvcli.py user list -h, the description of this method is no longer printed in the help message, even if I tried to do the following

method_parser = method_subparsers.add_parser(method, help=method_schema.get("description"))

The test for printing the method description is the one that will fail.
"
What I tried is to see if this is somehow caused by the fact that we passed an "ArgumentParser with help-printing turned off" class as the class for the first-level subparsers (for the pass-through commands)

subparsers = self.add_subparsers(                                       
    dest="subcommand",                                                  
    help="Subcommands",                                                 
    required=True,                                                      
    parser_class=functools.partial(                                     
        argparse.ArgumentParser,                                        
        add_help=False                                                  
    )
)

But changing the first-level parser_class to the plain argparse.ArgumentParser results in the same behavior.

Although the old arv command doesn't output the method description either (see e.g. arv user list -h), it feels a bit strange that this is not handled better by the Python standard argparse. But I'm willing to skip this part if it's not something we can fix easily.

For other issues from the feedback -

Boolean flags: this is still WIP, because I'd appreciate some further discussion. Currently, in the real discovery document, we only have "type": "boolean" parameters with both "required": false and "default": "false". (see e.g. jq '.resources[].methods[].parameters[] | select(.type=="boolean")' < arvados-v1-discovery.json; if we filter by select(.type=="boolean" and .default=="true") there's none).
- if there will be no "default": "true" parameters in the future, I think the "--foo/--no-foo" approach will be clear; but if there could be default-true parameter and its name starts with "no-", it can get confusing. As long as we promise it won't happen (which I think we should?) -- I think the current plan will work.
Should we keep the current way of having a subclass of ArgumentParser, or have a class that processes the discovery document and has a method returning an argument parser instance when called? I can go with the former (current approach) but I suspect the latter could be clearer for future maintenance (I may be wrong though, so please help me think this through).
PEP-257: Yes, did that.

Actions

Copy link

Updated by Brett Smith 3 months ago

Status changed from New to In Progress

Actions

Copy link

Updated by Brett Smith 3 months ago

Target version changed from Development 2026-01-06 to Development 2026-01-21

Actions

Copy link

Updated by Brett Smith 2 months ago

Target version changed from Development 2026-01-21 to Development 2026-02-04

Actions

Copy link

#10

Updated by Zoë Ma about 2 months ago

Another issue is that when the "method" parameter is missing, there's no help given for the resource subcommand. E.g. "arvcli.py user [-h]" didn't produce a helpful message, unlike how the arv does.

In 8e522819dc5ae9cfe5ce82a6d51f5bd282ffae89, I made a workaround by manually calling the print_help() method of the resource subparser. (Test: developer-run-tests: #5022 , failing as expected because of the aforementioned issue ("One unexpected new finding is that now, if help is requested for a subcommand-method combo, such as arvcli.py user list -h, the description of this method is no longer printed in the help message.")

Actions

Copy link

#11

Updated by Brett Smith about 2 months ago

On the whole I think the current state of the branch is a great step in the right direction. This is closer to what we need.

At a functional level: compatibility with the arv command's existing command line parsing is a requirement, so the tool can be a drop-in replacement for existing scripts.

Compatibility with the exact help output is not a requirement. Scripts shouldn't be looking at help. It's unfortunate that argparse doesn't display descriptions the way we want, but I don't think we need to jump through any hoops to try to address that. That can be an improvement we make after the tool ships.

Take a look at the output of arv container_request create:

Usage: arv container_request create [--parameters]

This method supports the following parameters:

  -s, --select=<s>                 An array of names of attributes to return in
                                 the response.
  -e, --ensure-unique-name         If the given name is already used by this
                                 owner, adjust the name to ensure uniqueness
                                 instead of returning an error.
  -c, --cluster-id=<s>             Cluster ID of a federated cluster where this
                                 object should be created.
  -o, --container-request=<s>    Either a string representing container_request
                                 as JSON or a filename from which to read
                                 container_request JSON (use '-' to read from
                                 stdin). This option must be specified.
  -h, --help                     Show this message

There are a couple things here that need to be implemented in arvcli for compatibility:

If the method takes a request body, we add a --resource-name option that takes JSON to pass as the request body (either string or filename or - for stdin). This needs to be added. In source:sdk/python/discovery2pydoc.py Parameter.from_request demonstrates how to parse this out.
Short option flags: for each option, the parser goes through the letters of the option name, finds the first one that has not already been used as a short flag, and assigns that as the short flag. e.g., --container-request became -o because -c was already assigned to --cluster-id.

As one implementation idea, instead of having parameters_schema_to_arguments return a dict, consider having it yield (args, kwargs), where args is a tuple of option strings and kwargs is the same parameter_kwargs as you have now. Then the caller could loop and call parser.add_argument(*args, **kwargs) to create options that have both short and long names.

if there could be default-true parameter and its name starts with "no-", it can get confusing. As long as we promise it won't happen (which I think we should?) -- I think the current plan will work.

I agree we can commit to this. Rather than having parameters that start with no, it would be better API design for us to not use that prefix and then give the parameter the default value we want.

Should we keep the current way of having a subclass of ArgumentParser, or have a class that processes the discovery document and has a method returning an argument parser instance when called?

That's really up to you. In our earlier discussions, I described how I like using classes to keep related functionality together and help future readers understand when different functions are related to each other. I think having the subclass gives us a nice place to keep doing that. e.g., parameters_schema_to_arguments could easily be made a private method of the subclass. I think we'll have more like that as we continue adding subcommands like arv create and arv edit.

But that's only my opinion. It is not required by our style guide or to pass review. If you find it easier to follow to just have a method and eschew the class structure, you can do that.

Actions

Copy link

#12

Updated by Zoë Ma about 2 months ago

As of 8d8d5474117add680b3031997a3adf142258bcf9 (Tests being run: developer-run-tests: #5023 ), the following changes have been implemented:

Resource-name option for methods with a "request" body
Short option forms
Converted the method-parameters-to-CLI generator, closely related to the main arg parser class, into a private static method.
Some relatively minor detail fixes, such as adding "This option must be specified." to the help message of all required=True options.

If these look okay, shall the next step be about implementing the actual API functions?

Actions

Copy link

#13

Updated by Zoë Ma about 2 months ago

(11e5341c4c5c0b0419ff5ae8b0811590c63be24d on branch 23330-WIP2-arv-api-calls-resource-subcommands-with-method-subsubcommands is now the rebased, up-to-date changes; test runs at run-tests-doc-pysdk-api-fuse: #1515 )

Actions

Copy link

#14

Updated by Brett Smith about 2 months ago

Precedes Feature #23409: arvcli actually executes API subcommands added

Actions

Copy link

#15

Updated by Brett Smith about 2 months ago

Target version changed from Development 2026-02-04 to Development 2026-02-18

Actions

Copy link

#16

Updated by Brett Smith about 2 months ago

Zoë Ma wrote in #note-13:

(11e5341c4c5c0b0419ff5ae8b0811590c63be24d on branch 23330-WIP2-arv-api-calls-resource-subcommands-with-method-subsubcommands is now the rebased, up-to-date changes; test runs at run-tests-doc-pysdk-api-fuse: #1515 )

When you handle the request body parameter, it would be good if the value in args was the JSON-parsed value. That way, if there is a problem with the user's JSON, argparse will report it right away as an argument error.
Fortunately, we already have arvados.commands._util.JSONArgument for this. I think the easiest way to use that is: rather than add this parameter to parameters_schema, move the request parameter handling after the parameters_schema loop. You can yield hardcoded kwargs with 'type': JSONArgument, an appropriate metavar, and other values taken from request_schema as needed.

I think we should make sure we limit argument_short_key to only be alphabetics. Consider this version instead:

for argument_short_key in argument_key:
    if argument_short_key not in argument_key_abbrevs and argument_short_key.isalpha():
        ...

In docstrings, please follow our established style for argument lists. See examples in `arvados/api.py` or `arvados/util.py`.
- Have a blank line between each list item.
- Do not indent list items more than the docstring itself (so they don't get parsed as preformatted code blocks).
- Do not mark up the name with backticks.
- Include a type annotation, followed by ---.
Please do not use a list to document a single return value. Just document the value in the paragraph. If the value is a tuple, a list that documents each element of the tuple may be appropriate.
In the tests, the comment "Bogus entries" is a little confusing because the word "bogus" makes me think it's an invalid input that is going to raise an exception or other error, but that's not the case. I think what you mean is just that these fields don't exist in the real discovery document. Consider replacing "bogus" with "fictional," "test-only," or something like that to help clarify.

Thanks.

Actions

Copy link

#17

Updated by Zoë Ma about 1 month ago

When you handle the request body parameter, it would be good if the value in args was the JSON-parsed value.

Should this also apply to ordinary parameters with "type": "array" or "type": "object"?

Actions

Copy link

#18

Updated by Brett Smith about 1 month ago

Zoë Ma wrote in #note-17:

When you handle the request body parameter, it would be good if the value in args was the JSON-parsed value.

Should this also apply to ordinary parameters with "type": "array" or "type": "object"?

Good catch, yes.

Actions

Copy link

#19

Updated by Zoë Ma about 1 month ago

(In this post the referenced commits are in chronological order.)

Brett Smith wrote in #note-16:

I think we should make sure we limit argument_short_key to only be alphabetics.

This is done in 0c123b821ca11d4e4240da28fec59977195a2c3d

In docstrings, please follow our established style for argument lists. See examples in `arvados/api.py` or `arvados/util.py`.

These are done in b973983e433424b2426a19992e24f7eb976a2270

In the tests, the comment "Bogus entries" is a little confusing because the word "bogus" makes me think it's an invalid input that is going to raise an exception or other error, but that's not the case. I think what you mean is just that these fields don't exist in the real discovery document. Consider replacing "bogus" with "fictional," "test-only," or something like that to help clarify.

This is done in 7304fda35589e70b4c3d3cf99a639a5d84980f76

In addition,

In ea8b79921dfd4ef92e57f46008d94afdbc5e14af, I put discover-doc-processing functions into an otherwise empty, namespace class as static methods, because I feel this is a better way to organize.
In 5a37c8e94c41774b19f59114a030d04c60c20e00 and 0a323ceea58b747c78803976621f0bcb28849495, I implemented validation and conversion for the "type": "array"/"object" JSON argument values (for string values to non-request parameters only, without considering file path)
- Metavars are "JSON_ARRAY" and "JSON_OBJECT" respectively
- Manual testing with CLI args like arvcli.py api_client_authorization create_system_auth --scopes bad_value shows that the resulting error message appears informative and friendly; this is achieved by a __repr__ hook to the type-converter/validator callable object (which is a bit hacky).

There's one thing that's not done yet:

When you handle the request body parameter, it would be good if the value in args was the JSON-parsed value. That way, if there is a problem with the user's JSON, argparse will report it right away as an argument error.
Fortunately, we already have arvados.commands._util.JSONArgument for this. I think the easiest way to use that is: rather than add this parameter to parameters_schema, move the request parameter handling after the parameters_schema loop. You can yield hardcoded kwargs with 'type': JSONArgument, an appropriate metavar, and other values taken from request_schema as needed.

I looked at the arvados.commands._util.JSONArgument source but I don't think it will do what we want. It doesn't cover the special usage of "-" for stdin. Also, in the current Ruby arv command, there's more logic about the ambiguous case where a file path may be valid JSON (arv will print a message and abort, asking the user to consider renaming the file). I think it'll be better to write an arv-compatible custom validator here.

There's also the corner case of having a file named "-", which is kinda annoying, and not handled by arv. In general, I don't particularly like the idea of having a CLI option that can take either a string or "dereference" a file's content for its string value. But this is what arv does and I think I'll focus on compatibility first.

So the immediate plan is to do that, and ignore the corner case of a file named "-" for now. Sounds okay? Thanks.

Actions

Copy link

#20

Updated by Brett Smith about 1 month ago

In the new test:

def test_cli_can_intercept_invalid_json_subtype(invalid_value):
    # --scope takes JSON array
    cli = "api_client_authorization create_system_auth --scope".split()
    cli.extend(invalid_value)
    with pytest.raises(SystemExit) as exit_status:
        arvcli.dispatch(cli)
    assert exit_status.value.code == 2

You should use cli.append instead of cli.extend. cli.extend appends each character of the invalid_value string to the list. e.g., in the first case where invalid_string == "foo", cli ultimately becomes ['api_client_authorization', 'create_system_auth', '--scope', 'f', 'o', 'o']. This is not what you mean to test and the extra arguments mean you might get a false positive.

(IMO this is an argument for just writing cli as a list literal rather than building it dynamically: yes it's more verbose, but your meaning is clearer and doesn't require following code. But that's purely a personal opinion, not a review comment.)

I looked at the arvados.commands._util.JSONArgument source but I don't think it will do what we want. It doesn't cover the special usage of "-" for stdin.

If this changes your thinking at all, I think it would be fine to add that to the existing class.

Also, in the current Ruby arv command, there's more logic about the ambiguous case where a file path may be valid JSON (arv will print a message and abort, asking the user to consider renaming the file).

So when we think about writing a backwards compatible replacement for arv, what we mean more specifically is that any arv commands that work with the current Ruby implementation must have the same semantics in the new Python implementation. Like, imagine arv commands in our documentation, or in scripts that admins have written: whatever that does in Ruby, it also needs to do in Python.

But that's all it means. We do not need byte-for-byte identical output, whether for help messages, error messages, or even just regular output. (e.g., if a command outputs JSON, it still needs to be valid JSON, but we don't need the exact same indentation, quoting style, etc.) If there are cases that don't work in the Ruby arv command, we can decide to make them do something meaningful in the Python verison, if that makes our lives easier for whatever reason like using an existing library.

There's also the corner case of having a file named "-", which is kinda annoying, and not handled by arv.

Lots of tools have this problem. The fact that - conventionally refers to stdin on the command line but is a valid filename is basically a design limitation of Unix. Fixing it is not in our control and we don't need to worry about it.

So the immediate plan is to do that, and ignore the corner case of a file named "-" for now. Sounds okay?

Yes please, thank you.

Actions

Copy link

#21

Updated by Zoë Ma about 1 month ago

Thanks, the JSON-or-file argument has been added as of d5994ede26efd1c912bd3d90fcfd85678b7155ec (which also fixes the bad CLI in the test).

At the moment, I'm rolling a custom type (_ArgTypes.JSONStringArg and friends), as I fear being distracted by tinkering further with _util.JSONArgument that is also used by arvados_fuse as well as arv_bootstrap in contrib. Inside the custom type, I also find it helpful to raise argparse.ArgumentTypeError(msg) which gives me explicit control of the error message generated upon invalid input, as opposed to the default custom type repr (by this I mean messages like "error: invalid <repr of type> value: ..." which can look out of place). I feel that this is better than overriding __repr__ of the custom type, which is a bit hacky.

Actions

Copy link

#22

Updated by Zoë Ma about 1 month ago

I also added a few tests for the custom type converters/validators, as well as how they integrate with the CLI parser, in sdk/python/tests/test_arvcli.py

Actions

Copy link

#23

Updated by Brett Smith about 1 month ago

So, I know it's frustrating that this ticket has lingered on. I apologize for that.

I know you want to get on with implementation, and that's fair, I appreciate that. I would encourage you to consider this is implementation: no matter what, you would need to parse the user's JSON in order to make the API request. It was admittedly only during review that I realized doing it in argument parsing would provide the best user experience.

In retrospect we maybe could've let this be a separate ticket: the "plain" argument parsing first, then the JSON parsing as a dedicated follow-up with more detailed planning. I'm sorry I didn't. Hindsight is 20/20. All that said, I think we're close.

At the moment, I'm rolling a custom type (_ArgTypes.JSONStringArg and friends), as I fear being distracted by tinkering further with _util.JSONArgument that is also used by arvados_fuse as well as arv_bootstrap in contrib. Inside the custom type, I also find it helpful to raise argparse.ArgumentTypeError(msg) which gives me explicit control of the error message generated upon invalid input

As the person who wrote that code, I think all the changes you're making are good changes. And I think it would be great for us to have all those improvements made to JSONArgument directly rather than limited to tool-specific subclasses.

The only valid argument you're changing the meaning of is -, which as we said, is inherently ambiguous in Unix. I think having it read from stdin is probably what most users expect over having it read a file named -.

On the error reporting, we already have a bug #21506 about how the generated error messages aren't helpful. It looks like you've figured out how to address that, and I would love to roll that into JSONArgument to give all the tools a better UX.

If you're still really not comfortable, I won't insist. But I think you've written all the code we need to improve multiple tools already, not just the new one; and I'm confident the changes don't raise any undue backwards compatibility concerns. It would be awesome to see the changes made in the right place for all that to happen. Let me know what you think.

Actions

Copy link

#24

Updated by Brett Smith about 1 month ago

Target version changed from Development 2026-02-18 to Development 2026-03-04

Actions

Copy link

#25

Updated by Zoë Ma about 1 month ago

Thanks, I'll see how to make the change to the current arvados.command._util.JSONArgument that works for the rest of the codebase with minimal touch.

Actions

Copy link

#26

Updated by Zoë Ma 29 days ago

23330-arv-api-calls-resource-subcommands-with-method-subsubcommands @ 55b7992a8cddc1a32cf93beb308e59b29f534f08

developer-run-tests: #5037

All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- This is a massive branch; the main difference from the WIP state at #note-23 is that the (re-)implementation of JSONArgument has been put in _util.py and not in the command script arvcli.py itself.
  As explained in the commit message of a9cd993b828cbcf089754a47ffc28aa93f0e75f3, in _utils.py we now have
  
  1. A JSONStringArgument class, with a __call__ method, that performs basic JSON-string conversion (useful on its own). Optional loading and post-loading-validation callbacks can be set when calling __init__.
  2. JSONArgument is implemented as a partial object with the callback for loading fixed to a specialized custom function json_or_file_loader, which implements the JSON-or-file loading logic and error messaging.
  3. For compatibility with the old JSONArgument class, there's still the freedom of choosing a post-loading validation function callback. This can be used in the usual manner, for example, to validate filters.
  
  As for the validator callback function, the expectation remains the same: we should use ValueError/TypeError to signal validation failure, and a normal return value for validation success. The message argument to ValueError/TypeError will be automatically added to the argparse-generated error message in the CLI. The callback writer can always choose to roll their own fine-grained messaging by raising argparse.ArgumentTypeError(msg) directly.
  
  This should also fix #21506, because we can now control the argparse error message explicitly.

Anything not implemented (discovered or discussed during work) has a follow-up story.
- Actual API requests will be the task of #23409, TBD.

Code is tested and passing, both automated and manual, what manual testing was done is described.
- Automated testing covers _util.py in addition to arvcli.py itself. The FUSE CLI tool also uses JSONArgument, and the services/fuse tests pass unmodified. Manual testing involves running the arvcli.py script with valid/invalid inputs and inspecting the output (so far for manual testing purposes only; no API requests are actually sent).

The tested code incorporates recent main branch changes.
- Yes

New or changed UI/UX has gotten feedback from stakeholders.
- The expected CLI behavior has been discussed throughout this thread.

Documentation has been updated.
- Docstring for JSONArgument and friends are revised to reflect the changes. The arvcli.py script is not yet well-documented, and this shall be done in the follow up task #23409

Behaves appropriately at the intended scale (describe intended scale).
- N/A?

Considered backwards and forwards compatibility issues between client and server.
- The client code is based on discovery doc from the API server; it actually embeds assumptions about the discovery doc will "look like" the current dev version's. we discussed some aspects of API design in #note-11

Follows our coding standards and GUI style guidelines.
- Yes

Actions

Copy link

#27

Updated by Zoë Ma 26 days ago

Pushed to 23330-arv-api-calls-resource-subcommands-with-method-subsubcommands @ e24473e0e3c31fbb54222f3bc64faf822ea414e7

Fix the argument type converter/validator for the request body parameter. (developer-run-tests: #5038 )

Actions

Copy link

#28

Updated by Brett Smith 22 days ago

Since there's been several pushes since the last comment, what would you like me to review and provide feedback on? I'm okay calling it wherever you want, I'm just not sure not what's ready for review vs. what's WIP. Please let me know.

Actions

Copy link

#29

Updated by Zoë Ma 22 days ago

I'd really appreciate more eyes on the arvados.cmd._util.JSONArgument new implementation, as well as how it impacts services/fuse/arvados_fuse/command.py, as well as the contrib package arv_bootstrap. Thank you!

Actions

Copy link

#30

Updated by Brett Smith 20 days ago

Target version changed from Development 2026-03-04 to Development 2026-03-18

Actions

Copy link

#31

Updated by Brett Smith 15 days ago

Zoë Ma wrote in #note-26:

23330-arv-api-calls-resource-subcommands-with-method-subsubcommands @ 55b7992a8cddc1a32cf93beb308e59b29f534f08

Thank you for modernizing the JSONArgument tests, that's a really nice improvement.

In json_or_file_loader, I think it would be better if the flow was:

First handle it as a path.
- If this raises OSError other than FileNotFoundError, report just that error immediately.
Then handle it as a JSON string.
Then proceed with the rest of the error handling.

The main reason for this is the OSError handling: if such an error occurs, it's almost certainly the problem the user wants to fix, and it would be good to report it more prominently. And then this simplifies the later error handling code.

I would like to propose two small patches:

diff --git c/sdk/python/arvados/commands/_util.py i/sdk/python/arvados/commands/_util.py
index 178cd2608a..0d1130ff7e 100644
--- c/sdk/python/arvados/commands/_util.py
+++ i/sdk/python/arvados/commands/_util.py
@@ -151,7 +151,7 @@ class JSONStringArgument:
     def __init__(
         self,
         validator: t.Optional[t.Callable[[t.Any], t.Any]] = None,
-        loader: t.Optional[t.Callable[[str], t.Any]] = None,
+        loader: t.Callable[[str], t.Any] = json.loads,
         pretty_name: str = "JSON" 
     ):
         """Keyword arguments:
@@ -178,8 +178,8 @@ class JSONStringArgument:
           human-readable name for the kind of value that the argument takes.
           Default: "JSON".
         """ 
-        self.loader = loader if callable(loader) else json.loads
-        self.post_validator = validator if callable(validator) else None
+        self.loader = loader
+        self.post_validator = validator
         self.pretty_name = pretty_name or "JSON" 

     def __call__(self, value: str):

The main reason for this is that the if callable() test has the potential to mask bugs. If I'm a developer and I pass the wrong type of argument into this function, that should crash to let me know there's a problem. The current version will silently ignore my bad value, which is harder to debug.

diff --git i/sdk/python/arvados/commands/_util.py w/sdk/python/arvados/commands/_util.py
index 0d1130ff7e..ba6ef67f0b 100644
--- i/sdk/python/arvados/commands/_util.py
+++ w/sdk/python/arvados/commands/_util.py
@@ -183,24 +183,19 @@ class JSONStringArgument:
         self.pretty_name = pretty_name or "JSON" 

     def __call__(self, value: str):
-        is_ok = True
-        callback_exc_msg = "" 
+        failure = None
         try:
             retval = self.loader(value)
         except ValueError as err:  # This covers json.JSONDecodeError too.
-            is_ok = False
-            calback_exc_msg = str(err)
+            failure = err
         else:
             if self.post_validator is not None:
                 try:
                     retval = self.post_validator(retval)
                 except (ValueError, TypeError) as err:
-                    is_ok = False
-                    callback_exc_msg = str(err)
-        if not is_ok:
-            msg = f"{value!r} is not valid {self.pretty_name}." 
-            if callback_exc_msg:
-                msg += f" Further info: {callback_exc_msg}" 
+                    failure = err
+        if failure is not None:
+            msg = f"{value!r} is not valid {self.pretty_name}: {failure}" 
             raise argparse.ArgumentTypeError(msg) from None
         return retval

In general it's good practice to avoid having separate variables that have to be maintained in sync, because it's a common source of bugs when they accidentally get out of sync. Just tracking the exception we care about is cleaner than tracking a flag+error message string.

IMO the "Further info:" string does not give the user any additional information, and just adding those details after a colon better follows CLI idioms. If you're okay with it, this change will require updating some test assertions.

There are places where json_or_file_loader could be improved similarly, but that's not critical.

Thanks.

Actions

Copy link

#32

Updated by Zoë Ma 14 days ago

Thanks for your detailed review!

In json_or_file_loader, I think it would be better if the flow was:

First handle it as a path.

If this raises OSError other than FileNotFoundError, report just that error immediately.

Then handle it as a JSON string.

Then proceed with the rest of the error handling.

The main reason for this is the OSError handling: if such an error occurs, it's almost certainly the problem the user wants to fix, and it would be good to report it more prominently. And then this simplifies the later error handling code.

I have a question about this sequence of attempts. What if there's a file name that is valid JSON (most likely numbers, and less likely special keywords like null or string literals like "foo"), but the file, though exists, is inaccessible (causing OSError)? This is of course a corner case, but in that case, I guess it would be futile to guess the user's intent. With your proposal, the result would be early failure due to OSError. Is there a case for it to proceed in spite of the OSError, and use the input value as JSON (as my implementation currently does)?

(This is currently the test case in the test-routine method test_path_resembles_json_but_is_not_readable_file() in tests/test_cmd_util.py. I left a note that read 'TODO: consider emitting a warning?' in that function, hence this question.)

I would like to propose two small patches: [...]

These are now incorporated in 5d95f5195f7cac7f4814f95af550b7b116a5bebc (developer-run-tests: #5043 ), with a small difference as follows:

In the (admittedly uncommon) situation where failure is not None (i.e. when the validation fails, raising TypeError/ValueError) but str(failure) is empty (i.e. somehow there's no actual further details available), the part in the message including the trailing colon, ": {failure}" will not be printed at all.

Actions

Copy link

#33

Updated by Brett Smith 14 days ago

Zoë Ma wrote in #note-32:

I have a question about this sequence of attempts. What if there's a file name that is valid JSON (most likely numbers, and less likely special keywords like null or string literals like "foo"), but the file, though exists, is inaccessible (causing OSError)? This is of course a corner case, but in that case, I guess it would be futile to guess the user's intent. With your proposal, the result would be early failure due to OSError. Is there a case for it to proceed in spite of the OSError, and use the input value as JSON (as my implementation currently does)?

I don't think so. I think that violates the principle of least surprise. I feel similarly about this as I did about the is callable check earlier: If I have a file that contains JSON, and I pass that filename as an argument, I want the tool to read that file. If my intentions are thwarted because of a permissions error or whatever other I/O problem, I don't want the tool to suddenly change logic and decide to parse my filename as a JSON literal, or whatever. It should just report the problem and stop. Doing otherwise has the potential to cause unintended behavior and mask problems.

In general, I think failing is better than doing something the user didn't expect.

Actions

Copy link

#34

Updated by Zoë Ma 12 days ago

9a0e201e4fd07aa49994dc047bd0e227bd6022c7 (developer-run-tests: #5044 )

Thanks, I implemented most of your suggestions to json_or_file_loader().

For user-facing error messaging, the majority of them follow the "Message: details" format, except in one case. Namely, when encountering a FileNotFoundError during @open()@ing and the string input itself fails to parse as JSON, there's some inherent unwieldiness in telling what goes wrong. It could be moved/mistyped path, or it could be malformed JSON. So I'm opting for the following format:

'{input}' is not a readable file or valid JSON [JSON decoding error: {details}]

The rationale is roughly as follows

When the intention is to give it a file path, the message shows up like
```
'/path/to/foo' is not a readable file or valid JSON [JSON decoding error: Expecting value: line 1 column 1 (char 0)]
```
and at least the first part before the word "or" is a self-contained message that is likely helpful. The cryptic JSON-decoding error details, not relevant to a file-path issue, are there, but in brackets. I feel this is better than putting it after a colon, which may inadvertently divert more attention to it than is appropriate.
When the intention is to give it a string as JSON, the message shows up like
```
'["foo", "bar",]' is not a readable file or valid JSON [JSON decoding error: Expecting value: line 1 column 15 (char 14)]
```
In this case the user can still get the debug message they may need to fix the error, albeit there's a little distraction about it being not a file in the beginning of the message.

Actions

Copy link

#35

Updated by Brett Smith 11 days ago

Zoë Ma wrote in #note-34:

9a0e201e4fd07aa49994dc047bd0e227bd6022c7 (developer-run-tests: #5044 )

Checked and confirmed the failures are #23403, not related.

Thanks, I implemented most of your suggestions to json_or_file_loader().

This looks pretty good, thanks. I would like it if, when you catch OSError, we convert that into the right type of argparse exception. This version crashes with a traceback, and for sure we have a lot of that in our CLI tools already, but I really appreciate you found a way to improve this. So it would be good to improve this too.

That's my only comment, if you're cool making that change (or have a reason why you'd rather not) I expect to merge. Thanks!

Actions

Copy link

#36

Updated by Zoë Ma 7 days ago

Thanks, these are done in 9607ca26e235180d085edc056be57433fb05e848 (developer-run-tests: #5051 )

Actions

Copy link

#37

Updated by Brett Smith 7 days ago

Status changed from In Progress to Resolved

Applied in changeset arvados|d9671c808bd0c97d3eb41caed3cc91b0eff64a4c.

Actions

Copy link

#38

Updated by Brett Smith 6 days ago

Related to Idea #21506: Improve error message when arv-mount --filters file is bad JSON added

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Arvados

Custom queries

Feature #23330

arvcli has subcommands for Arvados API calls

Updated by Brett Smith 4 months ago

Updated by Brett Smith 4 months ago

Updated by Brett Smith 3 months ago

Updated by Zoë Ma 3 months ago

Updated by Brett Smith 3 months ago

Updated by Zoë Ma 3 months ago

Updated by Brett Smith 3 months ago

Updated by Brett Smith 3 months ago

Updated by Brett Smith 2 months ago

Updated by Zoë Ma about 2 months ago

Updated by Brett Smith about 2 months ago

Updated by Zoë Ma about 2 months ago

Updated by Zoë Ma about 2 months ago

Updated by Brett Smith about 2 months ago

Updated by Brett Smith about 2 months ago

Updated by Brett Smith about 2 months ago

Updated by Zoë Ma about 1 month ago

Updated by Brett Smith about 1 month ago

Updated by Zoë Ma about 1 month ago

Updated by Brett Smith about 1 month ago

Updated by Zoë Ma about 1 month ago

Updated by Zoë Ma about 1 month ago

Updated by Brett Smith about 1 month ago

Updated by Brett Smith about 1 month ago

Updated by Zoë Ma about 1 month ago

Updated by Zoë Ma 29 days ago

Updated by Zoë Ma 26 days ago

Updated by Brett Smith 22 days ago

Updated by Zoë Ma 22 days ago

Updated by Brett Smith 20 days ago

Updated by Brett Smith 15 days ago

Updated by Zoë Ma 14 days ago

Updated by Brett Smith 14 days ago

Updated by Zoë Ma 12 days ago

Updated by Brett Smith 11 days ago

Updated by Zoë Ma 7 days ago

Updated by Brett Smith 7 days ago

Updated by Brett Smith 6 days ago