Story #8437

closed

[Node Manager] Actors define on_failure to terminate the process on exceptions that are difficult to recover

Added by Brett Smith almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Node Manager
Target version:
Start date: 02/16/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 1.0

Description

on_failure is called when an actor method raises an unhandled exception. All our actors should use it to detect exceptions that mean the whole process is unlikely to recover, and then kill the process (probably by sending SIGKILL to the process group):

  • threading.ThreadError (can't create a thread because we're out of RAM)
  • OSError reporting that we can't allocate RAM (hopefully it has errno ENOMEM? Basically, we want this to replace the strategy implemented in #6321.)
  • MemoryError

Define a new class that just defines this on_failure method; then have all of the other Node Manager actors use it as their superclass.
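
A minimal sketch of what such a base class might look like, assuming Pykka-style actors whose on_failure hook receives (exception_type, exception_value, traceback). The names is_fatal_error, kill_process_group, and FatalErrorMixin are hypothetical and not taken from the Node Manager code. One caveat: on Python 3, threading.ThreadError is an alias for RuntimeError, so that check is much broader there than the bullet above intends.

```python
import errno
import os
import signal
import threading

# Assumption: ENOMEM is the only errno treated as unrecoverable here.
FATAL_ERRNOS = (errno.ENOMEM,)


def is_fatal_error(exc):
    """Return True if exc suggests the whole process is unlikely to recover."""
    # Note: on Python 3, threading.ThreadError is RuntimeError, so this
    # branch also matches every other RuntimeError.
    if isinstance(exc, (MemoryError, threading.ThreadError)):
        return True
    return isinstance(exc, OSError) and exc.errno in FATAL_ERRNOS


def kill_process_group():
    """Send SIGKILL to the current process group (hypothetical helper)."""
    os.killpg(os.getpgid(0), signal.SIGKILL)


class FatalErrorMixin(object):
    """Hypothetical superclass that defines only on_failure.

    Intended to be combined with an actor class, for example
    class SomeActor(FatalErrorMixin, pykka.ThreadingActor).
    """

    # Kept as a class attribute so tests can swap in a harmless stand-in.
    _kill = staticmethod(kill_process_group)

    def on_failure(self, exception_type, exception_value, traceback):
        # Called by the actor framework after an unhandled exception.
        if is_fatal_error(exception_value):
            self._kill()
```

Keeping the classification in a plain function makes it possible to check which exceptions count as fatal without actually killing anything.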


Subtasks: 2 (0 open, 2 closed)

Task #8466: Review 8437-nodemanager-on-failure (Resolved, Peter Amstutz, 02/16/2016)
Task #8524: Add on_failure handler (Resolved, Peter Amstutz, 02/16/2016)
