Support #18799

open

Strategy to generate Python SDK docstrings based on API docs

Added by Peter Amstutz almost 3 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assigned To:
Category: SDKs
Target version:
Start date:
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
Story points: -

Description

Write a script that:

  • takes the discovery document
  • produces Python stubs with docstrings, type annotations, etc. corresponding to the Google API client
  • adds the stub files to the Python SDK
  • runs pydoc

The goal is for the methods/objects found under arvados.api() (generated on the fly by the Google API client) to be browsable in pydoc.


Files

GroupsIndexDoc.png (124 KB), Brett Smith, 01/16/2023 08:39 PM
GroupsIndexReturns.png (213 KB), Brett Smith, 01/16/2023 08:39 PM
discovery-pydoc-prototype.py (1.71 KB), Brett Smith, 01/16/2023 08:39 PM

Subtasks 1 (1 open, 0 closed)

Task #19724: group review (New)

Related issues 3 (2 open, 1 closed)

Related to Arvados - Support #18263: Plan to document the Python SDK (Resolved, Peter Amstutz)
Related to Arvados Epics - Story #18800: Update Python SDK documentation (New, 11/01/2022 to 04/30/2023)
Related to Arvados - Bug #19929: Improve documentation in the discovery document (New)

Actions #1

Updated by Peter Amstutz almost 3 years ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz almost 3 years ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz almost 3 years ago

Actions #4

Updated by Peter Amstutz almost 3 years ago

  • Related to Story #18800: Update Python SDK documentation added
Actions #5

Updated by Peter Amstutz about 2 years ago

  • Target version set to 2022-11-23 sprint
Actions #6

Updated by Peter Amstutz about 2 years ago

  • Assigned To set to Brett Smith
Actions #7

Updated by Brett Smith about 2 years ago

One possible implementation: google-api-python-client already generates docstrings for API methods, based on information in the discovery document. For example:

>>> arvc = arvados.api('v1')
>>> print(arvc.users().create.__doc__)
Create a new User.

Args:
  body: object, The request body. (required)
  select: array, Attributes of the new object to return in the response.
  ensure_unique_name: boolean, Adjust name to ensure uniqueness instead of returning an error on (owner_uuid, name) collision.
  cluster_id: string, Create object on a remote federated cluster instead of the current one.

Returns:
  An object of the form:

    { # User
    "uuid": "A String",
    "etag": "A String", # Object version.
    "owner_uuid": "A String",
    "created_at":   Unknown type! datetime
…

Probably the cheapest implementation is to instantiate an API client as normal, then introspect the generated methods to write the stubs. One major downside of this approach is that the docstring generation seems to be very static. I don't think we could customize it (e.g., to follow our own docstring style) without serious monkeypatching. See every mention of docs starting from https://github.com/googleapis/google-api-python-client/blob/3bbefc1352bcb2e302f7736643c9363799d5f5df/googleapiclient/discovery.py#L1193

If we want more control over the formatting, we'll probably end up basically rewriting all this ourselves. At which point, yeah, we can just work from the discovery document directly instead of the generated Python objects. (We can still use discovery document deserialization from apiclient.schema.)

Question: Where should the stubs go? In real code all these methods will be attached to the return value of arvados.api. Maybe call that result arvados.api.Client or arvados.api.Resources, and write the stubs under there?
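
As a rough illustration of the introspection approach described above, the sketch below renders a stub class from the public callables of any object, copying each docstring as it finds it. It is not the attached prototype. The names render_stub_class and quote_docstring are hypothetical, and FakeUsers is a stand-in for a real resource object such as the one returned by arvados.api('v1').users(). The sketch also assumes every generated method can be stubbed with a plain **kwargs signature.

```python
import inspect
import textwrap


def quote_docstring(doc):
    """Return doc as a triple-quoted Python string literal."""
    # Escape backslashes first, then every double quote, so no run of
    # quotes in the text can end the literal early.
    escaped = doc.replace('\\', '\\\\').replace('"', '\\"')
    return f'"""{escaped}"""'


def render_stub_class(class_name, obj):
    """Return source for a stub class mirroring obj's public callables."""
    lines = [f'class {class_name}:']
    for name in sorted(dir(obj)):
        if name.startswith('_'):
            continue
        attr = getattr(obj, name)
        if not callable(attr):
            continue
        doc = inspect.getdoc(attr) or 'Undocumented.'
        # Assumption: keyword-only stubs are good enough for documentation.
        lines.append(f'    def {name}(self, **kwargs):')
        lines.append(textwrap.indent(quote_docstring(doc), ' ' * 8))
        lines.append('        ...')
        lines.append('')
    if len(lines) == 1:
        lines.append('    pass')
    return '\n'.join(lines)


class FakeUsers:
    """Hypothetical stand-in for a generated API resource object."""

    def create(self, **kwargs):
        """Create a new User.

        Args:
          body: object, The "request" body. (required)
        """


if __name__ == '__main__':
    print(render_stub_class('Users', FakeUsers()))
```

Because the docstrings are copied verbatim, this inherits whatever formatting the client library produced, which is the main limitation noted above.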

Actions #8

Updated by Peter Amstutz about 2 years ago

  • Target version changed from 2022-11-23 sprint to 2022-12-21 Sprint
Actions #9

Updated by Peter Amstutz about 2 years ago

  • Target version changed from 2022-12-21 Sprint to 2023-01-18 sprint
Actions #10

Updated by Brett Smith about 2 years ago

  • Subject changed from Strategy to tie the Python SDK to the API docs to Strategy to generate Python SDK docstrings based on API docs
Actions #11

Updated by Peter Amstutz about 2 years ago

  • Target version changed from 2023-01-18 sprint to 2023-02-01 sprint
Actions #12

Updated by Peter Amstutz about 2 years ago

  • Tracker changed from Story to Support
Actions #13

Updated by Peter Amstutz about 2 years ago

  • Target version changed from 2023-02-01 sprint to To be groomed
Actions #14

Updated by Brett Smith about 2 years ago

Brett Smith wrote in #note-7:

Probably the cheapest implementation is to instantiate an API client as normal, then introspect the generated methods to write the stubs.

I prototyped this. See the attached script (it's just one page!). Call it like this with an Arvados API configuration in place:

python3 discovery-pydoc-prototype.py >arvados/sdk/python/arvados/api_resources.py

Then generate documentation as normal. The documentation will include this api_resources stub with information about all the API resources and methods.

The formatting is pretty rough. The docstrings only seem to care about plaintext presentation, so pdoc3 makes relatively big formatting decisions based on small whitespace inconsistencies. See attached for a couple of examples of how it looks.

If we need to do the cheapest thing that could possibly work, this is probably it. But there are definitely noticeable presentation improvements to be found by walking the discovery document ourselves and writing our own docstrings instead of using the ones generated by apiclient.

Actions #15

Updated by Brett Smith about 2 years ago

Doing it ourselves is a matter of iterating over the method definitions that match:

arv_client._resourceDesc['resources'][resource_name]['methods'][method_name]

For each method, look at description, parameters, and response. For each parameter, look at description, type, required, default, enum, and enumDescriptions. Not every parameter defines every key, but all of them should be checked. Consider special-casing parameters that have only a single enum possibility.
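
The parameter handling described above might look something like the following sketch. It only reads the keys named in this comment. The function names, the output layout, and the sample_method dictionary are all made up for illustration and are not taken from a real discovery document.

```python
def format_parameter(name, spec):
    """Return docstring lines describing one discovery-document parameter."""
    header = f"{name}: {spec.get('type', 'any')}"
    if spec.get('required'):
        header += ' (required)'
    lines = [header]
    description = spec.get('description', '').strip()
    if description:
        lines.append(f'    {description}')
    if 'default' in spec:
        lines.append(f"    Default: {spec['default']!r}")
    choices = spec.get('enum') or []
    notes = spec.get('enumDescriptions') or []
    if len(choices) == 1:
        # Special case: a single-choice enum is effectively a fixed value.
        lines.append(f'    The only accepted value is {choices[0]!r}.')
    elif choices:
        lines.append('    Accepted values:')
        for index, choice in enumerate(choices):
            entry = f'      {choice!r}'
            if index < len(notes) and notes[index]:
                entry += f': {notes[index]}'
            lines.append(entry)
    return lines


def format_method_docstring(method_spec):
    """Build a plain-text docstring from one method definition."""
    lines = [method_spec.get('description', 'Undocumented.').strip()]
    parameters = method_spec.get('parameters', {})
    if parameters:
        lines += ['', 'Arguments:']
        # List required parameters first, then the rest alphabetically.
        ordered = sorted(
            parameters.items(),
            key=lambda item: (not item[1].get('required', False), item[0]),
        )
        for name, spec in ordered:
            lines += ['', *format_parameter(name, spec)]
    return '\n'.join(lines)


# Hypothetical method definition, shaped like a discovery-document entry.
sample_method = {
    'description': 'List groups.',
    'parameters': {
        'limit': {'type': 'integer', 'default': '100',
                  'description': 'Maximum number of items to return.'},
        'uuid': {'type': 'string', 'required': True,
                 'description': 'UUID of the owner.'},
        'count': {'type': 'string', 'enum': ['exact', 'none'],
                  'enumDescriptions': ['Count all matches.', 'Skip counting.']},
        'alt': {'type': 'string', 'enum': ['json']},
    },
}

if __name__ == '__main__':
    print(format_method_docstring(sample_method))
```

Using .get() throughout keeps the formatter tolerant of parameters that omit some keys, which the comment says is common.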

Cross-reference response against arv_client._resourceDesc['schemas'][response_type].
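
A minimal sketch of that cross-reference follows. It assumes the method's response names its type through a '$ref' key and that each schema lists its fields under 'properties', as in the Google discovery document format. The sample data is invented for illustration. In practice the schemas argument would be arv_client._resourceDesc['schemas'].

```python
def describe_response(method_spec, schemas):
    """Return docstring lines describing a method's response type."""
    # Assumption: the response type is given as {'$ref': 'TypeName'}.
    response_type = method_spec.get('response', {}).get('$ref')
    if response_type is None:
        return ['Returns: unspecified']
    schema = schemas.get(response_type)
    if schema is None:
        return [f'Returns: {response_type} (no matching schema)']
    lines = [f'Returns: {response_type}']
    description = schema.get('description', '').strip()
    if description:
        lines.append(f'    {description}')
    for name, prop in sorted(schema.get('properties', {}).items()):
        # A property is either a plain type or a reference to another schema.
        prop_type = prop.get('type') or prop.get('$ref', 'any')
        entry = f'    {name}: {prop_type}'
        note = prop.get('description', '').strip()
        if note:
            entry += f', {note}'
        lines.append(entry)
    return lines


# Hypothetical data shaped like discovery-document entries.
sample_create = {'description': 'Create a new User.',
                 'response': {'$ref': 'User'}}
sample_schemas = {
    'User': {
        'id': 'User',
        'type': 'object',
        'properties': {
            'uuid': {'type': 'string'},
            'etag': {'type': 'string', 'description': 'Object version.'},
        },
    },
}

if __name__ == '__main__':
    print('\n'.join(describe_response(sample_create, sample_schemas)))
```

Returning a list of lines rather than a finished string leaves indentation and placement to whatever assembles the full docstring.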

Actions #16

Updated by Brett Smith about 2 years ago

  • Related to Bug #19929: Improve documentation in the discovery document added