Project

General

Profile

Actions

Feature #23293

closed

Start new Python arv with passthrough subcommands

Added by Brett Smith 5 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Story points:
-
Release relationship:
Auto

Description

This is part of #22390.

Add a script under sdk/python/arvados/commands that implements an argument parser with at least one of the subcommand wrappers copy, ws, or keep.

  • The subcommand should consume all remaining arguments with nargs=argparse.REMAINDER.
  • For each subcommand, the code should set sys.argv = [SCRIPTNAME, *arguments] and then call the same main function that the script under sdk/python/bin does:
    • copy calls arvados.commands.arv_copy.main()
    • arv keep get calls arvados.commands.get.main(sys.argv[1:])
    • arv keep docker calls arvados.commands.keepdocker.main()
    • arv keep ls calls arvados.commands.ls.main(sys.argv[1:], sys.stdout, sys.stderr)
    • arv keep put calls arvados.commands.put.main()
    • ws calls arvados.commands.ws.main()
  • To help provide a good tool startup time, this new script should not import any of the passthrough subcommand modules until it is actually necessary to run one.

When you start this script, be mindful of this requirement from the parent ticket:

ultimately, the script must have a function that can run without arguments to start the tool. (It can accept arguments to help with testing but those arguments all have to have default values that make sense for starting the script.) When the tool is ready for interactive testing but not a full replacement, it should be possible to run some more verbose way like python3 arvados/commands/arvcli.py.

You do not need to implement an entry point function as part of this ticket, but be mindful of the fact that it will be required later. e.g., do not write any significant functionality at the module level outside of any function or class.

This is not required, but consider writing your argument parser as a new class ArgumentParser(argparse.ArgumentParser). Because the arv command will eventually do a lot of dynamic generation of subcommands and arguments, starting a dedicated class now gives you a natural place to add that required functionality in future tickets. This class can automatically build the parser as part of its __init__ method, which can call helper functions for repeated functionality.


Subtasks 1 (0 open1 closed)

Task #23300: Review 23293-new-arv-in-python-with-passthrough-subcommandsResolvedBrett Smith11/24/2025Actions

Related issues 1 (1 open0 closed)

Related to Arvados Epics - Idea #22390: Add arv CLI implementation to the Python SDKIn ProgressActions
Actions #1

Updated by Brett Smith 5 months ago

  • Related to Idea #22390: Add arv CLI implementation to the Python SDK added
Actions #2

Updated by Brett Smith 5 months ago

  • Description updated (diff)
Actions #3

Updated by Brett Smith 4 months ago

  • Category set to SDKs
Actions #4

Updated by Brett Smith 4 months ago

  • Status changed from New to In Progress
Actions #5

Updated by Brett Smith 4 months ago

  • Target version changed from Development 2025-11-12 to Development 2025-11-26
Actions #6

Updated by Brett Smith 4 months ago

  • Subtask #23300 added
Actions #7

Updated by Zoë Ma 4 months ago

In branch 23293-wip-new-arv-in-python-with-passthrough-subcommands @ d60382ec6d37aa8142dcf71ff2a6d84bdaea8436, I added (undocumented, untested) "arvcli.py" script with the skeleton of an argument parser subclass that as the initial shape of utilizing subcommands and their subparsers.

The implementation differs somewhat from the description in the main body of this issue because the suggested mechanism of passing the "rest of the arguments" (argparse.REMAINDER) is no longer recommended and has been removed from the docs. Instead I'm getting the rest of the arguments from the return value of ArgumentParser.parse_known_args(). I hope this will be easier to maintain in the future.

I also touched keepdocker.py and ls.py just to harmonize the call signatures of their main() entry points, so that I don't have to call them in different ways.

I have a few questions as follows, on top of any that you will have --

  1. Is it correct to say that the argument parser I am implementing will ultimately cannibalize the roles of the various individual arg parser instances currently in the various scripts under sdk/python/bin ((put.py, ls.py, etc.), so that those scripts will no longer have any argparsing in them? Or should it be the other way around -- the individual executable scripts import the arg parser subclass from arvcli.py (instead of directly from the stdlib base class), then instantiate it with their own additions (especially help messages), and remain executable scripts?
  1. For now, what shall the tests for this arg parser class look like? Honestly I feel that there isn't much to test.
Actions #8

Updated by Brett Smith 4 months ago

Zoë Ma wrote in #note-7:

The implementation differs somewhat from the description in the main body of this issue because the suggested mechanism of passing the "rest of the arguments" (argparse.REMAINDER) is no longer recommended and has been removed from the docs. Instead I'm getting the rest of the arguments from the return value of ArgumentParser.parse_known_args(). I hope this will be easier to maintain in the future.

Good catch, I wasn't aware of that (I wish the docs were a little more explicit that it's kinda deprecated). This approach is fine, but note that later on when we add our own subcommands like arv edit and the API wrappers, having any unrecognized arguments should raise an argument parsing error.

I also touched keepdocker.py and ls.py just to harmonize the call signatures of their main() entry points, so that I don't have to call them in different ways.

I had this thought too, I didn't want to require it for the ticket but it's a great change. Thank you.

  1. Is it correct to say that the argument parser I am implementing will ultimately cannibalize the roles of the various individual arg parser instances currently in the various scripts under sdk/python/bin ((put.py, ls.py, etc.), so that those scripts will no longer have any argparsing in them? Or should it be the other way around -- the individual executable scripts import the arg parser subclass from arvcli.py (instead of directly from the stdlib base class), then instantiate it with their own additions (especially help messages), and remain executable scripts?

Neither? I don't believe any change should be necessary for the existing argument parsers, and I since we're using parse_known_arguments and just passing things through, I don't see anything we would need to DRY up to avoid repetition in the codebase. Can you say more about what you're concerned about?

  1. For now, what shall the tests for this arg parser class look like? Honestly I feel that there isn't much to test.

We should at least test that dispatch passes through arbitrary options correctly. Testing that for each of these subcommands would be ideal. A single parametrized test method that mocks the corresponding tool's main method should suffice.

Actions #9

Updated by Zoë Ma 4 months ago

Thanks for the feedback Brett.

I gave the issue as a whole more thought and I realized that this issue itself is more about making the transition from mixed-Ruby-Python to just-Python, and less about making changes to existing Python client scripts (the put.py, ls.py etc.). So for this

I don't believe any change should be necessary for the existing argument parsers, and I since we're using parse_known_arguments and just passing things through, I don't see anything we would need to DRY up to avoid repetition in the codebase.

my understanding is that the main parser, for now, in addition to being flexible enough for further extension benefiting the subcommands yet to be converted, just needs to know enough to do the dispatching. Right?

There's a complication about help messages. If, say, we keep the current help info for subcommands inside their own scripts, an invocation like python3 arvcli.py keep docker -h will never cause the detailed help messages (which is implemented inside keepdocker.py) to be printed. Rather, the parser in arvcli.py will emit the help message it has assembled, and exit before calling the main() function of the underlying script.

Am I right in saying that it's not desirable to move the help messages into arvcli.py? I think they should stay close to where their code implementations lie. So, let me see if there's an easy workaround for this from the main parser's side (i.e. not exiting if "-h" is given on the cmdline for the pass-through subcommands)

Actions #10

Updated by Brett Smith 4 months ago

Zoë Ma wrote in #note-9:

my understanding is that the main parser, for now, in addition to being flexible enough for further extension benefiting the subcommands yet to be converted, just needs to know enough to do the dispatching. Right?

That's right.

Am I right in saying that it's not desirable to move the help messages into arvcli.py? I think they should stay close to where their code implementations lie. So, let me see if there's an easy workaround for this from the main parser's side (i.e. not exiting if "-h" is given on the cmdline for the pass-through subcommands)

To start, ArgumentParser takes an add_help option that you can set False to prevent adding the option at all. I think that's all that needs to be done for this initial ticket. Later on we will need to implement --help for arv generally, but that can be a follow-up ticket to add our own option to do so.

Actions #11

Updated by Zoë Ma 4 months ago

23293-new-arv-in-python-with-passthrough-subcommands @ f14f6b54d88bf3878bafcdefa25c5fd8c77ba27e
developer-run-tests: #4945

  • All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
    • The changes include
      1. Dispatching (with CLI arguments) to the subcommands keep ls|get|put|docker, ws, and copy.
      2. Entry point function dispatch()
      3. class ArvCLIArgumentParser(argparse.ArgumentParser) for command-line parsing
    • Differences from the main post are discussed in #note-7 (details about how to pass arbitrary CLI arguments to the pass-through tools).
    • Testing main and subcommand arguments with the following requirements
      1. arvcli.py -h produces the help message for main command
      2. arvcli subcommand -h produces the help message for the subcommand.
      3. Any additional arguments are passed to the subcommand's main function as they are.
    • Currently, for the pass-through subcommands, a subcommand's help is handled by the corresponding scripts' own argument parsers. For future expansion, the 'native' subcommands will need to print help and exit explicitly, too.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • The main command's help message (including heading for subcommands) hasn't been polished. The aesthetic/readability aspects of the largely auto-generated help messages are a bit complex and I intend to follow up by replicating the existing arv command's behavior as much as possible wherever the existing behavior is reasonable.
  • Code is tested and passing, both automated and manual, what manual testing was done is described.
    • Tests (see #note-8) are automated, and manual tests are performed using the command [path/to/]python3 arvcli.py [...] to check that the script arvcli.py indeed invokes the underlying subcommand scripts (and, when called with [...] subcommand -h, produces the detailed subcommand help).
  • The tested code incorporates recent main branch changes.
    • Yes
  • New or changed UI/UX has gotten feedback from stakeholders.
    • N/A
  • Documentation has been updated.
    • Currently, only old behaviors are replicated, and the addition is an incomplete starting point, and documentation changes will be done gradually as arvcli.py become a full replacement of arv
  • Behaves appropriately at the intended scale (describe intended scale).
    • N/A
  • Considered backwards and forwards compatibility issues between client and server.
    • (This is purely a client change)
  • Follows our coding standards and GUI style guidelines.
    • The new Python code (mostly) follows PEP-8 (checked by the 'pycodestyle' tool)
Actions #12

Updated by Brett Smith 4 months ago

Zoë Ma wrote in #note-11:

23293-new-arv-in-python-with-passthrough-subcommands @ f14f6b54d88bf3878bafcdefa25c5fd8c77ba27e
developer-run-tests: #4945

This is a very solid start. I really appreciate all the different cases covered by the tests; thank you for putting that effort in. Here are things that I think would be good to address here:

  • At the end of dispatch, rather than catching NameError, I think it would be easier to follow and be a more targeted check to add a default case to the match block:
    case _:
        cmd_parser.error(f"unknown subcommand {args.subcommand!r}")
    
  • In general, it is good to avoid mocking things as much as possible: it naturally leads to better API design, and it makes your tests less fragile. In this particular case, I don't think you don't need to patch sys.argv: instead you can pass an explicit argument list to parse_known_args(). Your dispatch() function can take a new argument args=None that it passes through to parse_known_args() so it becomes similarly testable without mocking.
  • Our docstrings are standardized on Markdown, not reST. See Coding_Standards. Please reformat your docstrings accordingly.
  • There's an accidental typo "equilvalent" in your help text, should be "equivalent"

Here are suggestions for stylistic improvement that I would make, but these are more subjective and you do not need to take them if you disagree with my rationale:

  • When I make a function call with more than one line of arguments, I like to start the first argument on a new line. At least by default, Emacs only indents this one level over, meaning things don't get squeezed over to the right so much. I think that's more readable overall and it leads to nicer diffs as you develop the code. Compare:
           format_args.add_argument("-f", "--format",
                                     choices=["json", "yaml", "uuid"],
                                     default="json",
                                     help="Set output format")
    vs.
           format_args.add_argument(
                "-f", "--format",
                choices=["json", "yaml", "uuid"],
                default="json",
                help="Set output format",
            )
  • When composing a sequence from multiple sources, consider using the splat operator. ["arvcli.py"] + list(args) becomes ["arvcli.py", *args]. patch_argv(*(subcommand.split() + ["--foo", "bar"])) becomes patch_argv(*[*subcommand.split(), "--foo", "bar"]).
  • You don't need to define _HelplessArgumentParser as a whole subclass, you can just use functools.partial to get the behavior you want:
           subparsers = self.add_subparsers(
                dest="subcommand",
                help="Subcommands",
                parser_class=functools.partial(argparse.ArgumentParser, add_help=False),
            )
    

Thanks.

Actions #13

Updated by Zoë Ma 4 months ago

Thank you for the detailed analysis. Most (almost all) the changes are now implemented as of 22177279e120623321dde43676d055e97479ce54. A test is running at run-tests-remainder: #5683

The only comment from me is about the case _: for the subcommand-based dispatching. As it stands, the code put in there will not be reached, because the arg parser as currently implemented will exit early on unknown subcommand. But, I expect that this may change as more features are added.

Actions #14

Updated by Brett Smith 4 months ago

Zoë Ma wrote in #note-13:

Thank you for the detailed analysis. Most (almost all) the changes are now implemented as of 22177279e120623321dde43676d055e97479ce54. A test is running at run-tests-remainder: #5683

Thanks. With the big stuff taken care of, two small things from here:

  • When we test that command line arguments are invalid, we can test that the exit code is exactly 2. This is a semi-standard exit code (e.g., systemd recognizes it) that argparse uses automatically, and we should make sure we're consistent with it in any custom handling code.
  • I'm open to discussing this but IMO we should not make -f and -s a mutually exclusive group. Instead we can just let these follow the default rule of "last one specified wins." This makes it easier to compose options in scripts etc. If you agree, test_global_conflict_options can just be removed.

The only comment from me is about the case _: for the subcommand-based dispatching. As it stands, the code put in there will not be reached, because the arg parser as currently implemented will exit early on unknown subcommand. But, I expect that this may change as more features are added.

Yes, we're on the same page about this. I believe all of this was also true about the except NameError block you wrote originally. And I appreciate that you were thinking ahead about future development and put the check in here now, before it became a problem.

Thanks.

Actions #15

Updated by Zoë Ma 4 months ago

As of f8c04086f23cce8c56cbd9358d6dc30930e7b02a these changes in #note-14 are implemented.
run-tests-remainder: #5685

I'm open to discussing this but IMO we should not make -f and -s a mutually exclusive group. Instead we can just let these follow the default rule of "last one specified wins." This makes it easier to compose options in scripts etc. If you agree, test_global_conflict_options can just be removed.

Originally I followed the current Ruby arv command that treats these two as exclusive. I agree though that "last one wins" would be nicer and easier to understand. Therefore, I also changed the test (now called test_global_format_options) to check that the last one takes effect.

Also, as for the subparsers for subcommands, I added required=True for the subparser group, to make the subcommand truly mandatory (and case _: truly unreachable).

Actions #16

Updated by Brett Smith 4 months ago

All the comments look addressed, thanks. I just have one question and one process note.

Zoë Ma wrote in #note-15:

As of f8c04086f23cce8c56cbd9358d6dc30930e7b02a these changes in #note-14 are implemented.
run-tests-remainder: #5685

In the future please run all of developer-run-tests. Running individual test suites is fine for retrying if you get bit by an intermittent test failure or something, but the starting point should be to run all tests. Especially since run-tests-remainder doesn't even run the Python SDK tests, so it's not even running your new tests. developer-run-tests: #4955

My only question about the code is, why are we adding conflict_handler="resolve"? It doesn't seem necessary (all the tests still pass if I take it out) and it seems like it could surprise people in the future.

Originally I followed the current Ruby arv command that treats these two as exclusive. I agree though that "last one wins" would be nicer and easier to understand. Therefore, I also changed the test (now called test_global_format_options) to check that the last one takes effect.

This is fine, I don't have any problem with it being here, but I'll note it does feel a little bit like we're testing the behavior of argparse, which is out of scope for us. So I'm just saying, as we continue development of this tool, I don't think we need to spend much time on tests like this which basically checks that an external library does what we think.

Actions #17

Updated by Zoë Ma 4 months ago

Thanks, these points are all addressed in 489fa2dcccbf0a7ea666c32a3aa93aaed978690d, where I

  • removed the conflict_handler parameter because it doesn't do what I thought it does (and is not necessary; 'last wins' is the default behavior with common dest), and
  • removed the tests that mostly exercises the standard library rather than our own code.

(For now I did left in place a test that basically tests arvcli.py -h foo results in normal exit because only the help is requested, because the handling of "-h" with subcommands is slightly non-trivial, and I just wanted to have it there to catch potential human errors as we further add to the new code.)

A test is running - developer-run-tests: #4956

Actions #18

Updated by Brett Smith 4 months ago

Zoë Ma wrote in #note-17:

Thanks, these points are all addressed in 489fa2dcccbf0a7ea666c32a3aa93aaed978690d, where I

This LGTM, thanks. I've merged it.

(For now I did left in place a test that basically tests arvcli.py -h foo results in normal exit because only the help is requested, because the handling of "-h" with subcommands is slightly non-trivial, and I just wanted to have it there to catch potential human errors as we further add to the new code.)

Yes, I agree this tests how we set up our own argument parser, not just argparse, and so it's worth testing to prevent regressions as we add functionality to the tool. Thanks.

Actions #19

Updated by Brett Smith 4 months ago

  • Status changed from In Progress to Resolved
Actions #20

Updated by Brett Smith about 2 months ago

  • Release set to 84
Actions

Also available in: Atom PDF