Bug #8404

closed

Interrupted system call / warning: squeue exit status 256 ()

Added by Bryan Cosca about 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -

Description

A few jobs failed:

2016-02-09_14:43:28 wx7k5-8i9sb-jiy58uzpgsjmy2v 7805 0 stderr run-command: caught exception
2016-02-09_14:43:28 wx7k5-8i9sb-jiy58uzpgsjmy2v 7805 0 stderr Traceback (most recent call last):
2016-02-09_14:43:28 wx7k5-8i9sb-jiy58uzpgsjmy2v 7805 0 stderr   File "/tmp/crunch-job/src/crunch_scripts/run-command", line 393, in <module>
2016-02-09_14:43:28 wx7k5-8i9sb-jiy58uzpgsjmy2v 7805 0 stderr     (pid, status) = os.wait()
2016-02-09_14:43:28 wx7k5-8i9sb-jiy58uzpgsjmy2v 7805 0 stderr OSError: [Errno 4] Interrupted system call

One example: wx7k5-8i9sb-bnmltdlp4ucwlbe, a GATK queue parent job, failed. One job, wx7k5-8i9sb-qhd7u9uhpugccn3, failed due to what looks like a manual cancellation at 2016-02-09 14:23:41 UTC by me (wx7k5-tpzed-j0r27xny18tkq45).

But I was not in the office at that time; I was still commuting, so I don't know how this could have happened.

Another example: wx7k5-8i9sb-jiy58uzpgsjmy2v failed with this error at 2016-02-09_14:43:28, but I can't find any child queue job that failed with it.

Actions #1

Updated by Nico César about 10 years ago

Just a random thought: in https://hg.python.org/cpython/rev/c3193b7156bb/ they just catch the EINTR and continue ... shall we do the same?
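
For illustration, a minimal sketch of the "catch EINTR and continue" idea. It matters on Python 2 and on Python 3 before 3.5; from 3.5 on, PEP 475 makes the interpreter retry interrupted system calls itself. The names retry_on_eintr and FlakyWait are hypothetical and not taken from run-command; FlakyWait only stands in for os.wait() so the behavior can be shown without real child processes.

```python
import errno


def retry_on_eintr(func, *args):
    """Call func(*args), retrying for as long as it fails with EINTR.

    Hypothetical helper for illustration; not part of run-command.
    """
    while True:
        try:
            return func(*args)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise  # any other OS error is a real failure
            # EINTR: a signal interrupted the call, so just try again


class FlakyWait(object):
    """Stand-in for os.wait() that is 'interrupted' a few times first."""

    def __init__(self, interruptions, result):
        self.interruptions = interruptions
        self.result = result
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.interruptions:
            raise OSError(errno.EINTR, "Interrupted system call")
        return self.result


# Two simulated interruptions, then a normal (pid, status) result.
fake_wait = FlakyWait(interruptions=2, result=(1234, 256))
pid, status = retry_on_eintr(fake_wait)
```

The retry only swallows EINTR; every other OSError still propagates, so a genuine failure such as ECHILD is not hidden.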

Actions #2

Updated by Tom Clegg about 10 years ago

8404-catch-interrupted-syscall

Style nit: Perhaps better to use try: except: else: here, so the only thing in try: is the os.wait() call. That makes it easy to see where we expect the exception to occur.

        try:
            (pid, status) = os.wait()
        except OSError as e:
            if e.errno == errno.EINTR:
                pass
            else:
                raise
        else:
            pids.discard(pid)
            if not taskp.get("task.ignore_rcode"):
                rcode[pid] = (status >> 8)
            else:
                rcode[pid] = 0
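
A side note on the `status >> 8` in the snippet above: on POSIX, os.wait() returns a 16-bit status whose high byte is the exit code and whose low byte carries the terminating signal, if any. A small sketch of that decoding, assuming a POSIX system; describe_status is a hypothetical name, and reading the 256 in this bug's title as a raw wait status is an assumption.

```python
import os


def describe_status(status):
    """Split a raw os.wait() status into (exit_code, signal_number).

    Hypothetical helper for illustration; assumes a POSIX system.
    """
    if os.WIFSIGNALED(status):
        # Killed by a signal: there is no meaningful exit code.
        return (None, os.WTERMSIG(status))
    # Normal exit: the exit code lives in the high byte.
    return (status >> 8, 0)


exit_code, signum = describe_status(256)    # normal exit with code 1
killed_code, killed_sig = describe_status(9)  # terminated by signal 9
```

Under that assumption, a raw status of 256 corresponds to a plain exit code of 1 rather than a signal.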

Actions #3

Updated by Peter Amstutz about 10 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:1fc6d7713baabfe85b49191e156b6c093d22b69f.
