Bug #9018
closed
[Node manager] exception handler should not kill parent process
Added by Tom Clegg almost 9 years ago.
Updated almost 9 years ago.
Description
A race condition in test_fatal_error (tests.test_failure.ActorUnhandledExceptionTest) causes os.killpg() to be called after it has been unstubbed. This kills the test suite and run-tests.sh.
There are two problems here:
- The test should not have a race condition.
- The exception handler should only kill node manager itself, not other processes.
Proposed fix for overkill
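Use os._exit() or os.kill(os.getpid(), signal.SIGKILL) instead of os.killpg(). Note that os.kill(0, 9) would not help: on POSIX, pid 0 addresses every process in the caller's process group, which is the same overkill as os.killpg().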
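A minimal sketch of a handler that ends only its own process, assuming a POSIX system. The names die_alone and run_child are hypothetical and are not taken from the node manager code. The handler runs in a child interpreter so that the demonstration cannot take down the caller.

```python
import os
import signal
import subprocess
import sys

# Hypothetical handler body, run in a child interpreter so the demo is safe.
CHILD_SOURCE = """
import os, signal, sys

def die_alone(mode):
    # Both options affect only the calling process, unlike os.killpg().
    if mode == "exit":
        os._exit(1)                        # immediate exit, no cleanup handlers run
    os.kill(os.getpid(), signal.SIGKILL)   # signal this PID only, not the group

die_alone(sys.argv[1])
"""


def run_child(mode):
    """Run the handler in a child process and return its exit status."""
    result = subprocess.run([sys.executable, "-c", CHILD_SOURCE, mode])
    return result.returncode
```

Calling run_child("exit") or run_child("kill") ends the child only. The calling process keeps running and can inspect the child's return code, which is negative when the child was ended by a signal.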
Proposed fix for test race
TBD?
- Description updated (diff)
- Category set to Node Manager
- Target version set to Arvados Future Sprints
- Target version changed from Arvados Future Sprints to 2016-05-25 sprint
- Status changed from New to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:aea5300167770beb3cca6ad90e5ebb04da961416.
The test race might still exist. However, it hasn't been seen recently, so maybe some other changes have fixed it by accident.
(11:07:12) tetron_: I haven't seen the race condition happen
(11:07:59) tetron_: and I haven't been able to work out a sequence that would cause it to happen
(11:10:51) tetron_: I believe the race only happens if the test also fails for some other reason and it's unable to wait for the actor to stop
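One way to rule out the scenario described in the last comment is to wait for the worker inside the scope of the stub, in a finally clause, so the stub is still installed even when the test body fails. This is a sketch under assumptions, not the actual test: fatal_handler and run_check are hypothetical names, a plain thread stands in for the actor, and os.kill is stubbed here rather than os.killpg.

```python
import os
import signal
import threading
from unittest import mock


def fatal_handler(release):
    """Hypothetical stand-in for an actor whose exception handler fires late."""
    release.wait()
    # os.kill is looked up at call time, so it must still be stubbed here.
    os.kill(os.getpid(), signal.SIGKILL)


def run_check():
    """Trigger the handler and join the worker before the stub is removed."""
    release = threading.Event()
    with mock.patch("os.kill") as kill_stub:
        worker = threading.Thread(target=fatal_handler, args=(release,))
        worker.start()
        try:
            release.set()              # test body; assertions could fail here
        finally:
            worker.join(timeout=5)     # always wait while the stub is active
            still_running = worker.is_alive()
        called_with = kill_stub.call_args
    return still_running, called_with
```

Because the join happens before the with block exits, the handler cannot reach the real os.kill unless the worker outlives the timeout, which the still_running flag makes visible.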