This continues to be a problem as of:
ii python-arvados-cwl-runner 1.0.20180216164101-3 all The Arvados CWL runner
A workflow submitted with this input file works fine:
$ cat 15x-interval-147.library-cram-to-gvcfs.noimport.001.yaml
cwl:tool: ../workflows/gatk-4.0.0.0-haplotypecaller-genotypegvcfs-libraries.cwl
library_cram:
  class: File
  location: keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram
  secondaryFiles:
    - class: File
      location: keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram.crai
chunks: 200
intersect_file:
  class: File
  location: keep:0209730ab274aa4adce0557580fa6c64+90/wgs_calling_regions.hg38.interval_list
ref_fasta_files:
  - $import: sanger_human_references.yaml
However, the same workflow with this input file fails:
$ cat 15x-interval-147.library-cram-to-gvcfs.001.yaml
cwl:tool: ../workflows/gatk-4.0.0.0-haplotypecaller-genotypegvcfs-libraries.cwl
library_cram:
  $import: 15x-interval-147.library_cram.001.yaml
chunks: 200
intersect_file:
  class: File
  location: keep:0209730ab274aa4adce0557580fa6c64+90/wgs_calling_regions.hg38.interval_list
ref_fasta_files:
  - $import: sanger_human_references.yaml
$ cat 15x-interval-147.library_cram.001.yaml
class: File
location: keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram
secondaryFiles:
  - class: File
    location: keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram.crai
Observe the difference in the cwl.input.json submitted in the container request for the failing workflow:
$ arv get ncucu-xvhdp-to7wppbrlrt8x5e | jq '.mounts["/var/lib/cwl/cwl.input.json"].content.library_cram'
{
  "class": "File",
  "location": "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram",
  "secondaryFiles": [
    {
      "class": "File",
      "location": "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram.crai",
      "basename": "15399492.CCXX.paired310.0f619520c3.cram.crai"
    }
  ],
  "id": "file:///home/mercury/checkouts/arvados-pipelines/cwl/15x-interval-147.library_cram.001.yaml",
  "basename": "15399492.CCXX.paired310.0f619520c3.cram"
}
As opposed to the working workflow:
$ arv get ncucu-xvhdp-kuk3ouys8d4q3lg | jq '.mounts["/var/lib/cwl/cwl.input.json"].content.library_cram'
{
  "class": "File",
  "location": "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram",
  "secondaryFiles": [
    {
      "class": "File",
      "location": "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram.crai",
      "basename": "15399492.CCXX.paired310.0f619520c3.cram.crai"
    }
  ],
  "basename": "15399492.CCXX.paired310.0f619520c3.cram"
}
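To double-check that the `id` key is the only difference between the two objects, here is a quick comparison sketched in Python. The two dictionaries are copied from the `jq` outputs above; the helper name `key_diff` is made up for this illustration.

```python
CRAM = "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram"

# library_cram as submitted for the failing workflow (ncucu-xvhdp-to7wppbrlrt8x5e)
failing = {
    "class": "File",
    "location": CRAM,
    "secondaryFiles": [
        {
            "class": "File",
            "location": CRAM + ".crai",
            "basename": "15399492.CCXX.paired310.0f619520c3.cram.crai",
        }
    ],
    "id": "file:///home/mercury/checkouts/arvados-pipelines/cwl/15x-interval-147.library_cram.001.yaml",
    "basename": "15399492.CCXX.paired310.0f619520c3.cram",
}

# library_cram as submitted for the working workflow (ncucu-xvhdp-kuk3ouys8d4q3lg)
working = {
    "class": "File",
    "location": CRAM,
    "secondaryFiles": [
        {
            "class": "File",
            "location": CRAM + ".crai",
            "basename": "15399492.CCXX.paired310.0f619520c3.cram.crai",
        }
    ],
    "basename": "15399492.CCXX.paired310.0f619520c3.cram",
}


def key_diff(a, b):
    """Return (keys only in a, keys only in b, shared keys whose values differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed


print(key_diff(failing, working))
```

This prints `(['id'], [], [])`: the failing object carries one extra key, `id`, and every shared key has the same value.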
For some reason, the `$import` directive causes an `id` property (set to the URI of the imported yaml file) to be added to the `File` object. The submitted runner then attempts to dereference that `id`, and the workflow fails.
Note that there is no problem with `ref_fasta_files` in this workflow. There is likewise no problem with `$import` if you replace the `File` input with a `File[]` input and put a list of files in the imported yaml.
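As a possible stopgap, the stray `id` could be stripped from the input object before submission. The following is only a sketch of that idea, not the runner's code or a proposed patch: `strip_file_ids` is a hypothetical helper, and the assumption that dropping `id` is safe rests solely on the working submission above not having one.

```python
def strip_file_ids(value):
    """Return a copy of value with 'id' removed from File/Directory objects.

    Hypothetical helper: assumes the 'id' added by $import is not needed,
    which is only inferred from the working cwl.input.json having no 'id'.
    """
    if isinstance(value, dict):
        cleaned = {k: strip_file_ids(v) for k, v in value.items()}
        if cleaned.get("class") in ("File", "Directory"):
            cleaned.pop("id", None)
        return cleaned
    if isinstance(value, list):
        return [strip_file_ids(v) for v in value]
    return value


# Trimmed-down version of the failing input object, for illustration only.
job = {
    "library_cram": {
        "class": "File",
        "location": "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram",
        "id": "file:///home/mercury/checkouts/arvados-pipelines/cwl/15x-interval-147.library_cram.001.yaml",
    },
    "chunks": 200,
}

cleaned_job = strip_file_ids(job)
print(sorted(cleaned_job["library_cram"]))
```

The helper returns a new structure and leaves the original untouched, so the two can be compared before anything is submitted.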