Story #10344
open
[Workbench] Import CWL workflow
Added by Tom Morris about 8 years ago. Updated about 5 years ago.
0%
Updated by Peter Amstutz about 8 years ago
Possible approach:
- Upload CWL files to collection via web interface.
- User clicks on "register workflow" and gets a file picker
- Workbench fetches collection into a temp directory and runs arvados-cwl-runner on the backend to register workflow.
- Workflow record & container request creation remains the same.
- Unclear how (or whether) to update the workflow record when the collection is updated. This creates a synchronization problem that is likely to confuse users.
Alternate solution:
- Upload CWL files to collection via web interface.
- User clicks on "register workflow" and gets a file picker
- Workbench adds a link object pointing to the collection to indicate it stores a workflow.
- current "workflows" table is redundant and can be eliminated.
- The "Run a workflow" picker queries for links with link_class "workflow"
- Workbench fetches collection to generate input editing UI
- No synchronization problem (user just updates collection)
- Feature can easily be extended to support workflows stored in git repositories in the future
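The link-based approach above can be sketched as follows. This is only an illustration: the link_class value "workflow" comes from the proposal, while the helper names and the use of head_uuid to point at the collection are assumptions about how the record might be laid out.

```python
# Sketch only: LINK_CLASS comes from the proposal above; the helper names
# and the choice of fields are hypothetical.
LINK_CLASS = "workflow"


def workflow_link_body(collection_uuid, name):
    """Build the body of a link record marking a collection as a workflow."""
    return {
        "link_class": LINK_CLASS,
        "name": name,
        "head_uuid": collection_uuid,  # assumed: the link points at the collection
    }


def workflow_link_filters():
    """Filters the "Run a workflow" picker could pass to a links list call."""
    return [["link_class", "=", LINK_CLASS]]


def workflow_collections(links):
    """Given link records, return the collection UUIDs that hold workflows."""
    return [
        link["head_uuid"]
        for link in links
        if link.get("link_class") == LINK_CLASS
    ]
```

With the Arvados Python SDK, the body and filters would presumably be passed to the links create and list calls. Updating the workflow then means updating the collection, and the link itself never changes.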
Updated by Tom Clegg about 8 years ago
Another possibility:
- Click "create workflow" button (on workflows#index or ...)
- Choose files from local filesystem
- Build workflow object (in client-side JS)
- Do workflows#create call directly to API server
- Refresh page to make new workflow show up
(How much of arvados-cwl-runner needs to be ported to JS in order to make this happen?)
Updated by Peter Amstutz about 8 years ago
Tom Clegg wrote:
Another possibility:
- Click "create workflow" button (on workflows#index or ...)
- Choose files from local filesystem
- Build workflow object (in client-side JS)
- Do workflows#create call directly to API server
- Refresh page to make new workflow show up
(How much of arvados-cwl-runner needs to be ported to JS in order to make this happen?)
arvworkflow.upload_workflow:
- Finds referenced Docker images and uploads them (impossible from browser)
- Traverses document dependencies and packs them into a single document (would need to port dependency scanning and document packing)
- Alternately, if we store the files in a collection, we don't have to do the packing, just the scanning
- The document can have non-CWL dependencies (e.g. Python scripts used by the workflow); these also have to be uploaded to a collection, and the references in the CWL file updated
- Alternately, if we store the files in a collection, we just have to ensure that relative references are maintained.
So, this strategy is more viable under the "alternate solution" case where we store the workflow files as-is in a collection instead of storing compound documents in the 'workflows' table. This would be a better UX than requiring the user to select each file separately. However, we also need to examine browser security policy around accessing the file system.
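To give a rough idea of what "just the scanning" involves, here is a minimal sketch that walks an already-parsed CWL document and collects the references it finds. It is not the logic used by arvados-cwl-runner; the set of keys treated as references is a simplifying assumption, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Keys whose string values are treated as references in this sketch
# (an assumption; real CWL reference resolution is more involved).
REFERENCE_KEYS = ("run", "$import", "$include", "location", "path")


def find_references(node):
    """Recursively collect string values stored under REFERENCE_KEYS."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in REFERENCE_KEYS and isinstance(value, str):
                found.append(value)
            else:
                # Inline tools (a dict under "run") are scanned as well.
                found.extend(find_references(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_references(item))
    return found


def is_relative(ref):
    """True for references with no URL scheme, leading slash, or fragment."""
    return not urlsplit(ref).scheme and not ref.startswith(("#", "/"))
```

Only the relative references matter for the collection-based approach: as long as those files sit at the same relative paths inside the collection, the CWL documents do not need to be rewritten.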
Updated by Peter Amstutz about 8 years ago
Here's another idea. CWL files are placed in git and discovered automatically.
- Gitolite post-update hook
- Scan updated branch for Dockerfile
- docker build
- Scan repo for CWLFile, Dockstore.cwl
- Create or update(?) workflow records for each one with arvados-cwl-runner --create-workflow
- use link to connect repo+branch/tag with workflow record
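A minimal sketch of the hook side of this idea, assuming the hook is written in Python. Git passes the names of the updated refs to a post-update hook as arguments; the helper names here are hypothetical, and the sketch only builds the registration command named above without running it.

```python
HEADS_PREFIX = "refs/heads/"


def updated_branches(ref_names):
    """Return branch names for the updated refs, ignoring tags and other refs."""
    return [
        ref[len(HEADS_PREFIX):]
        for ref in ref_names
        if ref.startswith(HEADS_PREFIX)
    ]


def registration_command(cwl_path):
    """Command line to register one CWL file with the flag mentioned above."""
    return ["arvados-cwl-runner", "--create-workflow", cwl_path]
```

In a real hook, `updated_branches(sys.argv[1:])` would give the branches to scan. Whether the command is then run in a subprocess of the hook or handed to a separate service is the open question listed under the considerations.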
Benefits:
- Provide version tracking, provenance for CWL, Docker files -> Workflow
- Best user experience (work locally, push to git, workflow automatically updates)
- Can already view git repositories in workbench
- Does not require any workbench changes
- Can use repository layout/conventions that are compatible with Dockstore, make it easier for users to publish their dockerfiles/workflows
Considerations:
- Where does the registration service run (is it a subprocess forked from gitolite, or a separate service)
- How to return messages/errors to user
- Assumes user ability to use git
- Must be documented (but shouldn't be very hard, could add explicit links to documentation from workbench)
Updated by Tom Clegg about 8 years ago
Peter Amstutz wrote:
examine browser security policy around accessing the file system.
FileReader API lets us do this, provided the files have been selected by the user with an <input type=file> widget.
Updated by Peter Amstutz about 8 years ago
Tom Clegg wrote:
Peter Amstutz wrote:
examine browser security policy around accessing the file system.
FileReader API lets us do this, provided the files have been selected by the user with an <input type=file> widget.
Right, so the user has to explicitly select each file via input widget or drop target, so dependency scanning doesn't really work.
Updated by Peter Amstutz about 8 years ago
Proof of concept branch for auto build/import of Docker image and workflow @ 10344-import-workflow-from-git
Updated by Peter Amstutz about 8 years ago
Behavior in 10344-import-workflow-from-git:
This is based on the behavior of Dockstore (dockstore.org)
- Clone repository
- For each branch in the repository:
- Search for Dockerfiles
- Build Dockerfiles and name them based on repository name + location in repository
- Search for CWL files named Dockstore.cwl or CWLFile
- Register them as workflows
- Create a link record to associate the repository + branch with the workflow record, so that the workflow can be updated instead of creating a new one each time.
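The search steps above can be sketched as a single walk over a checked-out branch. This is an illustration rather than the code in the branch: the function name is hypothetical, and it only finds the files, leaving the build and registration steps out.

```python
import os

# File names registered as "primary" workflows, as described above.
WORKFLOW_NAMES = {"Dockstore.cwl", "CWLFile"}


def scan_checkout(root):
    """Return (dockerfiles, workflows) as sorted paths relative to root."""
    dockerfiles, workflows = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if name == "Dockerfile":
                dockerfiles.append(rel)
            elif name in WORKFLOW_NAMES:
                workflows.append(rel)
    return sorted(dockerfiles), sorted(workflows)
```

Because the paths are relative to the repository root, they can double as the "location in repository" part of an image name, and several Dockstore.cwl files in different directories stay distinct.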
Usage
$ ./workflowimporter.py briandoconnor/dockstore-tool-bamstats develop
Cloning into '/tmp/tmpURhlYi'...
done.
Already on 'develop'
Your branch is up-to-date with 'origin/develop'.
Sending build context to Docker daemon 202.4 MB
Step 1 : FROM ubuntu:14.04
 ---> f6e25e99cf98
Step 2 : MAINTAINER Brian OConnor <briandoconnor@gmail.com>
 ---> Using cache
 ---> 30d6edff33a7
Step 3 : USER root
 ---> Using cache
 ---> 0f90323c0162
Step 4 : RUN apt-get -m update && apt-get install -y wget unzip openjdk-7-jre zip
 ---> Using cache
 ---> 2e013e76386c
Step 5 : RUN wget -q http://downloads.sourceforge.net/project/bamstats/BAMStats-1.25.zip
 ---> Using cache
 ---> d23414a4c725
Step 6 : RUN unzip BAMStats-1.25.zip && rm BAMStats-1.25.zip && mv BAMStats-1.25 /opt/
 ---> Using cache
 ---> 7c8ca1ebd48c
Step 7 : COPY bin/bamstats /usr/local/bin/
 ---> Using cache
 ---> f11d8fffbeac
Step 8 : RUN chmod a+x /usr/local/bin/bamstats
 ---> Using cache
 ---> 02cf4f6b9c5a
Step 9 : RUN groupadd -r -g 1000 ubuntu && useradd -r -g ubuntu -u 1000 -m ubuntu
 ---> Using cache
 ---> 4290f3727457
Step 10 : USER ubuntu
 ---> Using cache
 ---> 9b10c8810afc
Step 11 : CMD /bin/bash
 ---> Using cache
 ---> 393fb89a2ac7
Successfully built 393fb89a2ac7
962eh-4zz18-lu552la35bicizx
Updated workflow 962eh-7fd4e-gkbzl62qqtfig37
This makes the user experience pretty easy:
- Write Dockerfile
- Write Dockstore.cwl
- add to git & push
- Docker images + workflow appear in workbench automatically (when implemented as git hook or backend service)
Updated by Bryan Cosca about 8 years ago
sounds pretty cool! a few questions:
what's Dockstore.cwl and why would I need to write that?
Can this work without Dockerfiles? What if the image is already in keep? Will the Dockerfiles overwrite that image?
Updated by Peter Amstutz about 8 years ago
Bryan Cosca wrote:
sounds pretty cool! a few questions:
what's Dockstore.cwl and why would I need to write that?
The idea is to just register "primary" CWL files under a specific name. Otherwise it would register every single tool in the repository.
You can have more than one "Dockstore.cwl" in a single repo, they would just need to go into separate directories.
The reason for naming it "Dockstore.cwl" is to be compatible with Dockstore:
https://dockstore.org/containers/quay.io/briandoconnor/dockstore-tool-bamstats
https://github.com/briandoconnor/dockstore-tool-bamstats
However we could also try to persuade the Dockstore developers to support a more generic name, like "CWLFile".
Can this work without Dockerfiles?
Yes, however in that case the image would need to already be in Keep, or be pulled from somewhere else such as Docker Hub.
What if the image is already in keep? Will the Dockerfiles overwrite that image?
The idea is that if you provide a Dockerfile, every time you push your branch, it will run docker build. If nothing has changed, you will get cached layers and the same image, so nothing is updated. If the image has changed, it will update it. The goal is to make your work as a bioinformatician easier by automating the currently somewhat manual steps of managing the Docker image and Workflow record.
I'd like for this to become a new service that Arvados provides, but the script is written such that you could start using it right now.
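One way to detect the "nothing has changed" case is to compare the image ID reported at the end of the build with the one recorded from the previous push. The sketch below assumes the build output ends with a "Successfully built <id>" line, as in the usage log earlier in this thread (other Docker versions may print something different); the function names are hypothetical.

```python
import re

# Matches the final line of the build log shown earlier in this thread.
BUILT_RE = re.compile(r"^Successfully built ([0-9a-f]+)\s*$", re.MULTILINE)


def built_image_id(build_output):
    """Return the image ID from a 'Successfully built <id>' line, or None."""
    matches = BUILT_RE.findall(build_output)
    return matches[-1] if matches else None


def image_changed(previous_id, build_output):
    """True if the build succeeded and produced a different image ID."""
    new_id = built_image_id(build_output)
    return new_id is not None and new_id != previous_id
```

If `image_changed` returns False, the cached layers produced the same image and there is nothing to upload.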
Updated by Tom Morris over 7 years ago
- Target version set to Arvados Future Sprints
Updated by Tom Morris almost 7 years ago
- Related to Story #13080: Create/upload workflows through Workbench added
Updated by Peter Amstutz about 5 years ago
- Target version deleted (Arvados Future Sprints)
- Release set to 20