Bug #9018

closed

[Node manager] exception handler should not kill parent process

Added by Tom Clegg almost 10 years ago. Updated almost 10 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
-
Category:
Dispatchers
Target version:
Story points:
-

Description

A race condition in test_fatal_error (tests.test_failure.ActorUnhandledExceptionTest) causes os.killpg() to be called after it has been unstubbed. This kills the test suite and run-tests.sh.

There are two problems here:
  • The test should not have a race condition
  • The exception handler should only kill node manager itself, not other processes.

Proposed fix for overkill

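Use os._exit() or os.kill(os.getpid(), signal.SIGKILL) instead of os.killpg(). Note that os.kill(0, 9) would not narrow the target: pid 0 signals every process in the caller's process group, which is the same overkill as os.killpg().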
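
A minimal sketch of the narrower kill, assuming a POSIX system. The function name kill_self and its killfunc parameter are hypothetical and only illustrate the idea; this is not the code from the eventual changeset. Making the kill function injectable lets a test pass a stub instead of sending a real signal.

```python
import os
import signal


def kill_self(killfunc=os.kill):
    """Terminate only the calling process (hypothetical helper).

    killfunc is injectable so a test can substitute a stub and no
    real signal is ever sent.
    """
    # os.getpid() targets this process alone. By contrast, pid 0 and
    # os.killpg() address the whole process group, which can include
    # the test runner and the script that launched it.
    killfunc(os.getpid(), signal.SIGKILL)


# Usage with a stub that records the call instead of killing anything.
calls = []
kill_self(killfunc=lambda pid, sig: calls.append((pid, sig)))
```

With the stub in place, calls holds a single (pid, signal) pair for the current process, so a test can assert on the target without risking the test suite.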

Proposed fix for test race

TBD?

Actions #1

Updated by Tom Clegg almost 10 years ago

  • Description updated (diff)
  • Category set to Dispatchers
Actions #2

Updated by Brett Smith almost 10 years ago

  • Target version set to Arvados Future Sprints
Actions #3

Updated by Peter Amstutz almost 10 years ago

  • Target version changed from Arvados Future Sprints to 2016-05-25 sprint
Actions #4

Updated by Peter Amstutz almost 10 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:aea5300167770beb3cca6ad90e5ebb04da961416.

Actions #5

Updated by Tom Clegg almost 10 years ago

The test race might still exist. However, it hasn't been seen recently, so maybe some other changes have fixed it by accident.

(11:07:12) tetron_: I haven't seen the race condition happen 
(11:07:59) tetron_: and I haven't been able to work out a sequence that would cause it to happen
(11:10:51) tetron_: I believe the race only happens if the test also fails for some other reason and it's unable to wait for the actor to stop
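
One way to guard against the scenario described in the chat above, sketched under assumptions: if the test always waits for the actor before restoring the stub, even when the test body fails for an unrelated reason, the real kill function cannot be reached. Killer, run_with_stubbed_kill and the worker thread are hypothetical stand-ins, not the node manager's actual test code.

```python
import threading


class Killer:
    """Hypothetical holder for the kill function the handler calls."""

    def __init__(self):
        self.real_calls = []

    def kill(self, reason):
        # Stands in for the real os.kill / os.killpg call.
        self.real_calls.append(reason)


def run_with_stubbed_kill(killer, stub_calls, body):
    """Run body() while killer.kill is stubbed out.

    The stub is restored only after the worker has finished, so a
    failure in body() cannot unstub while the worker is still running.
    """
    original = killer.kill
    killer.kill = stub_calls.append  # install the stub
    # The worker looks up killer.kill when it runs, like an actor's
    # failure handler calling whatever kill function is installed.
    worker = threading.Thread(target=lambda: killer.kill("fatal"))
    worker.start()
    try:
        body()  # test assertions; may raise
    finally:
        worker.join()           # wait for the "actor" to stop first
        killer.kill = original  # only then restore the real function
```

The ordering inside the finally block is the point of the sketch: joining before restoring means the worker can only ever see the stub.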