Story #8488 (closed): [Microsoft] Democratize running bcbio CWL on qr1hi
% Done: 100%
Description
As part of the Microsoft work, we need to demonstrate running variant calling and validation using bcbio and CWL (#8382 and #8381). We'll plan to do this on qr1hi using Peter's work with cwl-runner (#8176).
From the bcbio side, the latest test CWL is available from:
https://s3.amazonaws.com/bcbio/cwl/test_bcbio_cwl.tar.gz
with more documentation here:
https://github.com/chapmanb/bcbio-nextgen/tree/master/cwl
We need to document how to run on qr1hi and train Brad on it, so he can test and iterate on new versions.
Updated by Brad Chapman almost 9 years ago
Peter;
Thanks for the tour of installing and testing this on Friday. I've got cwl-runner set up from your branch and was able to start an initial test run. Awesome.
I'm running into an issue where I think the bcbio/bcbio image on qr1hi (https://cloud.curoverse.com/collections/qr1hi-4zz18-doidmcskcmhn2bm) is out of date. How do we refresh it to the latest?
The run I got started is here:
https://cloud.curoverse.com/pipeline_instances/qr1hi-d1hrv-nybexwq0vehhuu4
and was failing with this error:
    2016-02-27 18:42:32 arvados.cwl-runner[5027] ERROR: Got exception while collecting job outputs:
    Traceback (most recent call last):
      File "build/bdist.linux-x86_64/egg/arvados_cwl/__init__.py", line 197, in done
        outputs = self.collect_outputs(self.builder.outdir)
      File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 235, in collect_output_ports
        ret[fragment] = self.collect_output(port, builder, outdir)
      File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 316, in collect_output
        adjustFileObjs(r, revmap)
      File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 154, in adjustFileObjs
        adjustFileObjs(d, op)
      File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 154, in adjustFileObjs
        adjustFileObjs(d, op)
      File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 149, in adjustFileObjs
        op(rec)
      File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 78, in revmap_file
        raise WorkflowException("Output file path %s must be within designated output directory (%s) or an input file pass through." % (f["path"], builder.outdir))
    WorkflowException: Output file path align_prep/7_100326_FC6107FAAXX-1.fq.gz must be within designated output directory (keep:d586abc216dd7011f2e57eecc674f804+469) or an input file pass through.
which I believe is due to having relative paths in the output JSON. This was fixed in bcbio a couple of weeks back, along with the corresponding fix to cwltool (https://github.com/common-workflow-language/cwltool/pull/40), so I hope refreshing the container will fix it. It would also be great if I could update the container on demand, since the latest version also contains a lot of new functionality for the Microsoft work (variant calling, validation, SNAP support) that will probably need a few more iterations.
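The check behind this error can be illustrated with a small Python sketch. It is not cwltool's `revmap_file` code; `is_within_outdir` is a hypothetical helper that assumes paths are compared as POSIX-style strings against the designated output directory, which may be a local path or a `keep:` reference. A bare relative path such as `align_prep/7_100326_FC6107FAAXX-1.fq.gz` never matches the output directory prefix, which is consistent with the error above.

```python
import posixpath


def is_within_outdir(path: str, outdir: str) -> bool:
    """Return True if `path` lies inside `outdir`.

    Hypothetical helper for illustration only (not cwltool's API).
    Both arguments are treated as POSIX-style strings. A relative `path`
    is deliberately NOT resolved against `outdir`, so it can never match.
    """
    outdir = posixpath.normpath(outdir)
    path = posixpath.normpath(path)
    # Compare against outdir plus a trailing slash so that a sibling like
    # "/tmp/outdir_other" is not mistaken for a child of "/tmp/outdir".
    return path == outdir or path.startswith(outdir.rstrip("/") + "/")


if __name__ == "__main__":
    keep_outdir = "keep:d586abc216dd7011f2e57eecc674f804+469"
    rel = "align_prep/7_100326_FC6107FAAXX-1.fq.gz"
    print(is_within_outdir(rel, keep_outdir))                     # relative path: rejected
    print(is_within_outdir(keep_outdir + "/" + rel, keep_outdir))  # rooted path: accepted
```

Comparing against the directory plus a trailing slash, rather than a bare string prefix, keeps sibling directories with a shared name prefix from passing the check.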
For reference, the up to date CWL I'm running is here:
https://s3.amazonaws.com/bcbio/cwl/test_bcbio_cwl.tar.gz
I've also written up skeleton documentation on running this and will push that out once I've got a working run. Thanks again for this, I'm excited to have this so close to running.
Updated by Peter Amstutz almost 9 years ago
To update bcbio in Arvados, try "arv-keepdocker bcbio/bcbio"
Updated by Brad Chapman almost 9 years ago
Peter;
Thanks for the tip, and for updating the bcbio Docker image (with the right local version of Docker). It looks like I got the latest Docker image but am still running into the same issue:
    2016-02-29 14:49:06 arvados.cwl-runner[19734] INFO: Job prep_align_inputs (qr1hi-8i9sb-zmvw2u2jdpertue) is Complete
    2016-02-29 14:49:07 arvados.cwl-runner[19734] ERROR: Got exception while collecting job outputs:
    Traceback (most recent call last):
      File "build/bdist.linux-x86_64/egg/arvados_cwl/__init__.py", line 197, in done
        outputs = self.collect_outputs(self.builder.outdir)
      File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 235, in collect_output_ports
        ret[fragment] = self.collect_output(port, builder, outdir)
      File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 316, in collect_output
        adjustFileObjs(r, revmap)
      File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 154, in adjustFileObjs
        adjustFileObjs(d, op)
      File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 154, in adjustFileObjs
        adjustFileObjs(d, op)
      File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 149, in adjustFileObjs
        op(rec)
      File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 78, in revmap_file
        raise WorkflowException("Output file path %s must be within designated output directory (%s) or an input file pass through." % (f["path"], builder.outdir))
    WorkflowException: Output file path /tmp/crunch-job-task-work/compute2.1/outdir/align_prep/7_100326_FC6107FAAXX-1.fq.gz must be within designated output directory (keep:ed5abc7f4ed6c6771b68c208a8d10680+442) or an input file pass through.
It's now specifying the full output path instead of a relative path (/tmp/crunch-job-task-work/compute2.1/outdir/align_prep/7_100326_FC6107FAAXX-1.fq.gz), but that isn't getting translated back into a keep: reference, so it fails. Any ideas about how to proceed? Thanks again.
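The translation being asked for amounts to stripping the task's local output directory prefix and re-rooting the remainder under the output collection's `keep:` reference. A minimal Python sketch, assuming both the local directory (`/tmp/crunch-job-task-work/compute2.1/outdir` in the log) and the collection reference are known; `local_to_keep_ref` is a hypothetical helper, not part of arvados-cwl-runner or cwltool, and not necessarily how this issue was resolved.

```python
import posixpath
from typing import Optional


def local_to_keep_ref(path: str, local_outdir: str, keep_outdir: str) -> Optional[str]:
    """Map a path under the task's local output directory to a keep: reference.

    Hypothetical helper for illustration only. `local_outdir` is the
    directory the task wrote into; `keep_outdir` is the collection it was
    saved as (keep:<hash>+<size>). Returns None if `path` is not under
    `local_outdir`.
    """
    local_outdir = posixpath.normpath(local_outdir)
    path = posixpath.normpath(path)
    if path == local_outdir:
        return keep_outdir
    prefix = local_outdir.rstrip("/") + "/"
    if not path.startswith(prefix):
        return None  # outside the output directory; cannot be mapped
    # Keep the path relative to the output directory and re-root it in Keep.
    relpath = path[len(prefix):]
    return keep_outdir.rstrip("/") + "/" + relpath


if __name__ == "__main__":
    print(local_to_keep_ref(
        "/tmp/crunch-job-task-work/compute2.1/outdir/align_prep/7_100326_FC6107FAAXX-1.fq.gz",
        "/tmp/crunch-job-task-work/compute2.1/outdir",
        "keep:ed5abc7f4ed6c6771b68c208a8d10680+442",
    ))
```

With the values from the log, this would yield `keep:ed5abc7f4ed6c6771b68c208a8d10680+442/align_prep/7_100326_FC6107FAAXX-1.fq.gz`, a path that a containment check against the `keep:` output directory would accept.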
Updated by Peter Amstutz almost 9 years ago
- Status changed from New to In Progress
Updated by Brett Smith almost 9 years ago
- Target version changed from 2016-03-02 sprint to 2016-03-16 sprint
- Story points changed from 1.0 to 0.5
Updated by Peter Amstutz almost 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 50 to 100
Applied in changeset arvados|commit:9e5b98e8f5f4727856b53447191f9c06e3da2ba6.