Support #14395
closed
What is the proper way to handle an input that can be an array of Files or an array of null?
Description
I have a CWL workflow with optional gvcf output files that gets scattered in one step; here's the output:
gvcfs:
  type: ["null", File]
In the next step - the gather step - I'm retrieving it as an array of Files in the inputs:
gvcfs: File[]?
If I create the gvcf files it works great. But if I don't generate gvcf files, so that "null" is returned, cwl-runner throws this when it reaches this step in the workflow:
2018-10-24T13:44:56.471062973Z stderr the `gvcfs` field is not valid because
2018-10-24T13:44:56.471062973Z stderr tried array of <File> but
2018-10-24T13:44:56.471062973Z stderr item is invalid because
2018-10-24T13:44:56.471062973Z stderr is not a dict
I ran in debug mode and I see:
https://collections.e51c5.arvadosapi.com/c=63bd82e94daa392771584ffd429fa0c1-244/_/stderr.txt?disposition=inline
2018-10-24T13:32:09.872146633Z "file:///var/lib/cwl/workflow.json#main/make_examples/gvcfs": [ 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 
2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null, 2018-10-24T13:32:09.872146633Z null 2018-10-24T13:32:09.872146633Z ],
So, I realize the problem probably has to do with my CWL input at the gather stage: it accepts either an array of Files or null, but it isn't able to handle whatever this data structure is (an array of nulls, I guess?).
I would like to know how to write my gvcfs input in CWL so that this works properly, but I have been unable to figure this out.
I tried:
gvcfs:
  type:
    type: array
    items: ["null", File]
But I got this error:
2018-10-24T14:40:36.902084433Z cwltool ERROR: Unexpected exception
2018-10-24T14:40:36.902084433Z Traceback (most recent call last):
2018-10-24T14:40:36.902084433Z   File "/usr/lib/python2.7/dist-packages/cwltool/workflow.py", line 755, in job
2018-10-24T14:40:36.902084433Z     runtimeContext):
2018-10-24T14:40:36.902084433Z   File "/usr/lib/python2.7/dist-packages/cwltool/command_line_tool.py", line 428, in job
2018-10-24T14:40:36.902084433Z     if "entry" in t:
2018-10-24T14:40:36.902084433Z TypeError: argument of type 'NoneType' is not iterable
2018-10-24T14:40:36.902390033Z cwltool ERROR: [step postprocess_variants] Cannot make job: argument of type 'NoneType' is not iterable
Updated by Stephen McLaughlin over 7 years ago
I have determined that the correct way to do what I was trying to do in CWL is in fact:
gvcfs:
  type:
    type: array
    items: ["null", File]
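To illustrate why the two declarations behave differently, here is a rough sketch in plain JavaScript. The helper names (isFile, matchesOptionalFileArray, matchesArrayOfOptionalFile) are hypothetical and are not part of cwltool; they only model the idea that `File[]?` allows the whole value to be null, while `items: ["null", File]` allows each element to be null.

```javascript
// Hypothetical helpers for illustration only; cwltool's real validation is more involved.

// A CWL File value is an object whose "class" field is "File".
function isFile(v) {
  return v !== null && typeof v === "object" && v["class"] === "File";
}

// Models `File[]?`: the value itself may be null, but every element must be a File.
function matchesOptionalFileArray(v) {
  if (v === null || v === undefined) { return true; }
  return Array.isArray(v) && v.every(isFile);
}

// Models `type: array, items: ["null", File]`: must be an array,
// and each element may be either null or a File.
function matchesArrayOfOptionalFile(v) {
  return Array.isArray(v) && v.every(function (item) {
    return item === null || isFile(item);
  });
}

// What the scattered step produces when no gvcf files are generated.
var scattered = [null, null, null];
console.log(matchesOptionalFileArray(scattered));   // false
console.log(matchesArrayOfOptionalFile(scattered)); // true
```

Under this simplified model, an array of nulls fails the first declaration and passes the second, which matches the behaviour described in this ticket.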
The error I reported was coming from another part of my workflow where there is an Expression:
InitialWorkDirRequirement:
  listing: |
    ${
      var r = [];
      if (inputs.gvcfs != null) {
        for (var i = 0; i < inputs.gvcfs.length; i++) {
          r.push(inputs.gvcfs[i]);
        }
      }
      return r;
    }
I was attempting to test whether inputs.gvcfs is null and, if it's not, iterate over the files and stage them to the working directory. But inputs.gvcfs is not null; it is [null, null, null, ...], so it passes the test and the loop attempts to stage nulls as if they were files. This seems to cause the error.
What I was attempting to do is to stage the files to the working directory but only if they are optionally provided as input. This now seems impossible and/or too clunky to me so I am going to refactor the CWL.
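For reference, one possible way to skip the null entries is sketched below. It assumes the nulls in the listing are what triggers the error, as described above. The function name stageableFiles is hypothetical, and this has not been run against cwltool. Inside the CWL expression, the same loop would operate on inputs.gvcfs directly.

```javascript
// Hypothetical helper: collect only the non-null entries of an input that may be
// null, an array of Files, or an array containing nulls.
function stageableFiles(gvcfs) {
  var r = [];
  if (gvcfs !== null && gvcfs !== undefined) {
    for (var i = 0; i < gvcfs.length; i++) {
      // Skip the per-element nulls produced by scatter steps that made no gvcf.
      if (gvcfs[i] !== null && gvcfs[i] !== undefined) {
        r.push(gvcfs[i]);
      }
    }
  }
  return r;
}

// Example: only the real File object survives.
var staged = stageableFiles([null, { "class": "File", "location": "sample.g.vcf" }, null]);
console.log(staged.length); // 1
```

The only change from the original expression is the extra per-element check, so an all-null input yields an empty listing instead of a list of nulls.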
Updated by Tom Morris over 7 years ago
- Project changed from Arvados Workbench 2 to Arvados
- Status changed from New to Closed
I've closed this, but you should be able to close issues yourself if you're the one who created the ticket.