Bug #9865

closed

[CWL] Fix undefined behavior after ignoring an unhandled exception

Added by Tom Clegg over 8 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assigned To: Eric Biagiotti
Category: SDKs
Target version:
Start date: 02/25/2019
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

Example from the wild

2016-08-26 13:28:37 arvados.cwl-runner[20416] ERROR: While getting final output object: global name 'adjustFiles' is not defined
2016-08-26 13:28:37 arvados.cwl-runner[20416] INFO: Overall process status is success

There's a section here where any exception gets logged, but "outputs" is then left in some half-baked state.

https://github.com/curoverse/arvados/blob/master/sdk/cwl/arvados_cwl/runner.py#L131

            try:
                outc = arvados.collection.Collection(record["output"])
                with outc.open("cwl.output.json") as f:
                    outputs = json.load(f)
                def keepify(fileobj):
                    path = fileobj["location"]
                    if not path.startswith("keep:"):
                        fileobj["location"] = "keep:%s/%s" % (record["output"], path)
                adjustFileObjs(outputs, keepify)
                adjustDirObjs(outputs, keepify)
            except Exception as e:
                logger.error("While getting final output object: %s", e)
This code should either:
  • Reset outputs to None in the "except" block; or
  • Make the "try" scope smaller, so once "outputs" isn't None, unexpected exceptions get propagated up.

It should also log the full backtrace for the caught exception.
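
A rough sketch of the first option combined with traceback logging, written as a standalone function so it can run without Arvados. The names load_final_output, open_output and adjust are hypothetical stand-ins (for the Collection read and the adjustFileObjs/adjustDirObjs calls) and are not part of runner.py; this only illustrates the shape of the fix, not the actual patch.

```python
import json
import logging

logger = logging.getLogger("arvados.cwl-runner")


def load_final_output(open_output, output_id, adjust):
    """Return the final output object, or None if it could not be built.

    open_output and adjust are hypothetical stand-ins for reading
    cwl.output.json from the output collection and for the
    adjustFileObjs/adjustDirObjs calls in runner.py.
    """
    outputs = None
    try:
        with open_output(output_id) as f:
            outputs = json.load(f)

        def keepify(fileobj):
            path = fileobj["location"]
            if not path.startswith("keep:"):
                fileobj["location"] = "keep:%s/%s" % (output_id, path)

        adjust(outputs, keepify)
    except Exception:
        # logger.exception logs at ERROR level and appends the full traceback.
        logger.exception("While getting final output object")
        # Never hand back a partially processed object.
        outputs = None
    return outputs
```

With this shape the caller can treat a None result as a failure instead of reporting "Overall process status is success".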

The "try" block appears to have accidentally started including too much code in commit c8d9a898cde654b53200bda0b0ef8b406dd71739

Another example from the wild

2019-01-30T04:31:12.159992495Z Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <bound method _fileobject.__del__ of <socket._fileobject object at 0x7f1093013f50>> ignored
2019-01-30T04:31:12.161843551Z arvados.cwl-runner WARNING: Error checking states on API server: maximum recursion depth exceeded while calling a Python object

After this, arvados-cwl-runner stopped producing logs every 3 seconds, and appeared to be deadlocked.

Another example the following day reported a "maximum recursion depth" error, but kept logging "cwltool DEBUG: [workflow workflow.json#main] job step [...] not ready".


Subtasks 1 (0 open, 1 closed)

Task #14840: Review 9865-cwl-fix-ignored-exceptions | Resolved | Peter Amstutz | 02/25/2019
