Bug #9865
[CWL] Fix undefined behavior after ignoring an unhandled exception
Status: Resolved
Priority: Normal
Assigned To: Eric Biagiotti
Category: SDKs
Target version:
Start date: 02/25/2019
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release:
Release relationship: Auto
Description
Example from the wild
2016-08-26 13:28:37 arvados.cwl-runner[20416] ERROR: While getting final output object: global name 'adjustFiles' is not defined
2016-08-26 13:28:37 arvados.cwl-runner[20416] INFO: Overall process status is success
In the section of code linked below, any exception is reported but then ignored, which leaves "outputs" in a half-baked state.
https://github.com/curoverse/arvados/blob/master/sdk/cwl/arvados_cwl/runner.py#L131
try:
    outc = arvados.collection.Collection(record["output"])
    with outc.open("cwl.output.json") as f:
        outputs = json.load(f)
    def keepify(fileobj):
        path = fileobj["location"]
        if not path.startswith("keep:"):
            fileobj["location"] = "keep:%s/%s" % (record["output"], path)
    adjustFileObjs(outputs, keepify)
    adjustDirObjs(outputs, keepify)
except Exception as e:
    logger.error("While getting final output object: %s", e)
This code should either:
- Reset outputs to None in the "except" block; or
- Make the "try" scope smaller, so once "outputs" isn't None, unexpected exceptions get propagated up.
It should also log the full backtrace for the caught exception.
The "try" block appears to have started including too much code, probably by accident, in commit c8d9a898cde654b53200bda0b0ef8b406dd71739.
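The second option above (a smaller "try" scope) together with full backtrace logging could look roughly like the sketch below. It is not the actual fix: read_final_output, load_json and rewrite are hypothetical names standing in for the runner.py code that reads cwl.output.json and calls adjustFileObjs/adjustDirObjs, and the logger name is made up.

```python
import logging

logger = logging.getLogger("example.cwl-runner")  # hypothetical logger name


def read_final_output(load_json, rewrite):
    """Return the rewritten output object, or None if it could not be loaded.

    load_json and rewrite are hypothetical stand-ins for reading
    cwl.output.json from the output collection and for the
    adjustFileObjs/adjustDirObjs calls.
    """
    try:
        # Only the step that is expected to fail sits inside the "try".
        outputs = load_json()
    except Exception:
        # logger.exception logs at ERROR level and appends the full traceback.
        logger.exception("While getting final output object")
        return None
    # Outside the "try": an unexpected error here (for example a NameError)
    # propagates to the caller instead of being reported and then ignored.
    rewrite(outputs)
    return outputs
```

With this shape, a failure to load the output yields None plus a traceback in the log, while a programming error in the rewrite step can no longer be followed by "Overall process status is success". The first option (keeping the wide "try" and setting outputs to None in the "except" block) would also avoid the half-baked state, but it would still hide unexpected errors from the caller.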
Another example from the wild
2019-01-30T04:31:12.159992495Z Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <bound method _fileobject.__del__ of <socket._fileobject object at 0x7f1093013f50>> ignored
2019-01-30T04:31:12.161843551Z arvados.cwl-runner WARNING: Error checking states on API server: maximum recursion depth exceeded while calling a Python object
After this, arvados-cwl-runner stopped producing logs every 3 seconds, and appeared to be deadlocked.
In another instance the following day, arvados-cwl-runner reported a "maximum recursion depth" error but kept logging "cwltool DEBUG: [workflow workflow.json#main] job step [...] not ready".