Idea #23057

open

Registered workflows become collections throughout Arvados

Added by Brett Smith 8 months ago. Updated 8 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date:
Due date:
Story points: -
Release:
Release relationship: Auto

Description

Scope

This is a sub-epic. #19132 is the parent epic. The first ticket it describes is #21074, API changes to link workflows to collections in a backwards-compatible way. This sub-epic is the bridge between those API changes and the new user interfaces that are described in the epic: these are all the changes that need to happen in existing client functionality in order for us to have richer registered workflows that we can use to build the new interfaces. Every top-level bullet point in this description could be its own ticket, and it might be possible to divide even finer than that.

API Documentation

There are some things in the documentation I can't reconcile with the code. After reading both plus #21074, I get the feeling that during design we imagined workflow.collection_uuid represented a one-to-one relationship, then during implementation we realized it was many-to-one, but some of the writing still reflects the one-to-one plan.

source:doc/api/methods/workflows.html.textile.liquid says:

Trashing the linked collection will cause the workflow record to become trashed and eventually deleted as well.

I don't see anything in the code like this, and it doesn't even make sense as written, because workflow records don't support being trashed.

In general, I am left wondering what the best practices are for creating new registered workflows. I gather that clients should prefer working directly with the collection as much as possible. Should clients that create new registered workflows this way also create a workflow record? I guess yes, for backwards compatibility? But if that's the case, should we consider having the API server create it automatically when a collection is created or updated with type=workflow?

Escape hatch

I am concerned that releasing the API changes without the corresponding client changes opens the door to confusion and bugs in the system as a whole.

If we're concerned about that, we could simply add a validation on workflows requiring collection_uuid=null. This would leave all the API architecture we've built in place while preventing users from creating new-style workflows that could confuse clients. New-style workflow tests could be marked skip/xfail, and the documentation could be stashed away until we're ready.
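As a rough illustration of that rule, here is a minimal Python sketch. The real check would live in the API server's workflow model rather than in Python; `validate_workflow_attrs`, `WorkflowValidationError`, and the `linking_enabled` switch are hypothetical names used only to show the shape of the idea.

```python
from typing import Any, Mapping


class WorkflowValidationError(ValueError):
    """Hypothetical error type for this sketch."""


def validate_workflow_attrs(attrs: Mapping[str, Any],
                            linking_enabled: bool = False) -> None:
    """Reject a workflow record that links to a collection.

    `linking_enabled` is an assumed switch that would be flipped once
    clients know how to handle new-style workflows.
    """
    if linking_enabled:
        return
    if attrs.get("collection_uuid") is not None:
        raise WorkflowValidationError(
            "collection_uuid must be null until workflow-collection "
            "linking is enabled"
        )
```

With the switch off, old-style records pass unchanged and any record that sets collection_uuid is refused, so nothing else in the API needs to be torn out.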

arvados-cwl-runner

  • It would probably help to add a utility class to the Python SDK that wraps either a workflow without collection_uuid or a collection with type=workflow and provides a common set of operations on it. This is a lot more unit-testable than having branches throughout a-c-r.
  • a-c-r should use this class to load, create, and update registered workflows.
    • When creating a new workflow, should it also create a workflow record, or only create a collection?
  • a-c-r needs to accept collection UUIDs in places where it currently accepts a workflow UUID.
    • I don't think any of this stuff needs to be able to accept a portable data hash, because if we load a collection by portable data hash, we won't have the associated properties that are required for a valid workflow.
  • From #21074: "for template_uuid container requests, we continue to link to the workflow record by uuid, but add a new property arv:workflow_pdh so we know precisely which version of the code was run."
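To make the utility class idea above a little more concrete, here is a minimal sketch. Nothing like this exists in the Python SDK today: `RegisteredWorkflow` and `from_record` are hypothetical names, and the sketch assumes that API records carry a `kind` field and that the workflow marker lives at `properties["type"]` on the collection.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class RegisteredWorkflow:
    """Hypothetical wrapper over either kind of registered workflow."""
    uuid: str
    name: str
    collection_uuid: Optional[str]

    @property
    def has_collection(self) -> bool:
        # Old-style workflow records have no linked collection.
        return self.collection_uuid is not None

    @classmethod
    def from_record(cls, record: Mapping[str, Any]) -> "RegisteredWorkflow":
        kind = record.get("kind")
        if kind == "arvados#workflow":
            # Workflow record, with or without collection_uuid.
            return cls(
                uuid=record["uuid"],
                name=record.get("name") or "",
                collection_uuid=record.get("collection_uuid"),
            )
        if kind == "arvados#collection":
            # Assumption: the type=workflow marker is a collection property.
            props = record.get("properties") or {}
            if props.get("type") != "workflow":
                raise ValueError("collection is not a registered workflow")
            return cls(
                uuid=record["uuid"],
                name=record.get("name") or "",
                collection_uuid=record["uuid"],
            )
        raise ValueError(f"unsupported record kind: {kind!r}")
```

Callers would branch once, in `from_record`, and then ask `has_collection` wherever behavior differs, which is the part that should be easy to unit test without an API server.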

Workbench

  • The workflow view needs to become an extension of the collection view with additional "Runs," "Outputs," and "Inputs" tabs.
    • In general it expects to work with a collection with type=workflow and the associated required properties.
    • If loaded on a workflow with an associated collection_uuid it transparently uses that collection.
    • When viewing an old-school workflow definition without an associated collection, the bits of the interface that require a collection display "Not Available" or similar following our UI conventions. Action buttons and menu items that require a collection are grayed out.
  • When listing projects and search results:
    • Collections with type=workflow need to be displayed as workflows are now, including getting the right icon and being sorted where workflows are now.
    • Toolbar actions and context menu should include all the actions that are relevant to collections, plus "Run Workflow" and any other relevant workflow-specific actions (but at a glance I don't think there are any).
    • Listings should not be redundant. If results contain both a workflow record and the associated collection, the workflow should only be listed once in the results. There are two ways to make API queries that avoid overlap:
      1. Query workflows with include=["collection_uuid"] to get the associated collections. Then when querying collections, add the filter type!="workflow".
      2. Query all collections as normal, then query workflows with the filter collection_uuid=null.
  • An analogous change needs to be made when building the "Run a workflow" launcher listing.
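The second query strategy in the list above could be sketched roughly as follows. This is an illustration rather than Workbench code: `list_registered_workflows` and the two lister callables are hypothetical stand-ins for whatever the client uses to call the collections and workflows list endpoints, and the filter is written in the usual [attribute, operator, operand] form.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
# A lister takes a list of filters and returns the matching records.
Lister = Callable[[List[list]], List[Record]]

# Only workflow records with no linked collection (None becomes JSON null).
LEGACY_WORKFLOW_FILTERS: List[list] = [["collection_uuid", "=", None]]


def list_registered_workflows(
    list_collections: Lister,
    list_workflows: Lister,
    collection_filters: Optional[List[list]] = None,
) -> List[Record]:
    """Return collections plus unlinked workflow records, with no overlap."""
    # Step 1: query collections as normal; type=workflow ones come along.
    collections = list_collections(list(collection_filters or []))
    # Step 2: add only the old-style workflow records. Linked records are
    # excluded because their collection was already returned in step 1.
    legacy = list_workflows(LEGACY_WORKFLOW_FILTERS)
    return list(collections) + list(legacy)
```

Because the workflows query excludes every record that has a collection, each registered workflow shows up exactly once without any client-side de-duplication.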

Related issues 5 (3 open, 2 closed)

Related to Arvados Epics - Idea #19132: Improve UX for registering, browsing, and launching workflows (In Progress, 09/01/2023 - 08/31/2025)
Related to Arvados - Feature #21074: "workflow" records link to a collection with the actual workflow (Resolved, Peter Amstutz)
Related to Arvados - Feature #23055: Workflow records should be trashable (New)
Related to Arvados - Feature #23062: Disable workflow→collection linking until we're ready to make clients ready (Resolved, Tom Clegg)
Related to Arvados - Feature #22761: arvados-cwl-runner supports creating/updating workflows linked to collections (New)
Actions #1

Updated by Brett Smith 8 months ago

  • Related to Idea #19132: Improve UX for registering, browsing, and launching workflows added
Actions #2

Updated by Brett Smith 8 months ago

  • Related to Feature #21074: "workflow" records link to a collection with the actual workflow added
Actions #3

Updated by Brett Smith 8 months ago

  • Related to Feature #23055: Workflow records should be trashable added
Actions #4

Updated by Brett Smith 8 months ago

  • Description updated (diff)
Actions #5

Updated by Brett Smith 8 months ago

  • Description updated (diff)
Actions #6

Updated by Brett Smith 8 months ago

  • Description updated (diff)
Actions #7

Updated by Brett Smith 8 months ago

  • Description updated (diff)
Actions #8

Updated by Brett Smith 8 months ago

  • Related to Feature #23062: Disable workflow→collection linking until we're ready to make clients ready added
Actions #9

Updated by Brett Smith 8 months ago

  • Related to Feature #22761: arvados-cwl-runner supports creating/updating workflows linked to collections added
Actions #10

Updated by Lisa Knox 8 months ago

I would like to suggest that we find another term that includes both collections and workflows that is not just "collections." Overloading the term will lead us to confusing situations where "collections" will include "collections" and also "not-collections", i.e. workflows. Is this item a collection? No, but it could still be a collection. It's both, actually: Schroedinger's collection.

We want to combine these two things, but they are not identical; they just have sufficient overlap. To me it feels like a parent (abstract) class containing all of the overlap and two sub-classes that extend it. All three of these things should have unique names.

Actions #11

Updated by Brett Smith 8 months ago

Lisa Knox wrote in #note-10:

I would like to suggest that we find another term that includes both collections and workflows that is not just "collections." Overloading the term will lead us to confusing situations where "collections" will include "collections" and also "not-collections", i.e. workflows. Is this item a collection? No, but it could still be a collection. It's both, actually: Schroedinger's collection.

Out of curiosity, how do you feel about the other "special" collection types? Docker images, container logs, intermediate outputs, container outputs? Some of these already get a little special treatment (like default filtering) and we could imagine more in the future (like dedicated icons and metadata display for Docker images). Do you feel like there's something extra unique about workflow collections?

We want to combine these two things, but they are not identical; they just have sufficient overlap. To me it feels like a parent (abstract) class containing all of the overlap and two sub-classes that extend it. All three of these things should have unique names.

I agree the subclass analogy is the right one. Partly because, again, I think it works for all our special collection types. You have Collection, and then DockerImage(Collection), ContainerLog(Collection), Workflow(Collection), etc.
