Bug #7654
Updated by Peter Amstutz about 9 years ago
The EventClient() in the Python SDK has a race condition in its shutdown. When @EventClient.close()@ is called, one of three things happens:

* It shuts down successfully
* The event thread crashes and prints a stack trace (because this is in a daemon thread, it doesn't interfere with program shutdown)

<pre>
Exception in thread WebSocketClient:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build/bdist.linux-x86_64/egg/ws4py/websocket.py", line 427, in run
    if not self.once():
  File "build/bdist.linux-x86_64/egg/ws4py/websocket.py", line 300, in once
    b = self.sock.recv(self.reading_buffer_size)
AttributeError: 'NoneType' object has no attribute 'recv'
</pre>

* It hangs forever

The problem is:

# The parent class WebSocketClient.close() method is designed for orderly shutdown. It sends a close message to the server and prevents any more messages from being sent, but doesn't actually close the socket.
# The server is expected to respond with its own "closed" message.
# If the server is uncooperative or stuck and doesn't respond with a "closed" message of its own, the client won't close the connection on its own. The server may even continue sending events, but if the application assumes that it won't receive any more events after returning from close(), it won't be prepared to handle them (because it is shutting down).
# To head this off, the current code calls close_connection(), which explicitly closes the underlying socket. This also sets "WebSocketClient.sock" to "None".
Unfortunately, as it turns out, the "threadedclient.WebSocketClient" is not threadsafe on this function. So this sometimes results in the above crash when sock is set to None at a bad time.

Proposed fix to EventClient.close():

* Call close() to start orderly shutdown
* Set a flag indicating that received_message() shouldn't forward any more messages to on_event(). Put a mutex in received_message() so that close() doesn't return until any message handlers are completed.
* Return from close(). At this point we don't care what happens because either the orderly shutdown will complete, or the thread will be quietly killed and the socket will get closed during process termination.
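
The proposed fix could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: @FakeWebSocketClient@ is a hypothetical stand-in for the ws4py parent class so the example runs on its own, and the names @_closing@ and @_closing_lock@ are made up for this sketch. A reentrant lock is used on the assumption that a handler might itself call close() from the event thread.

```python
import threading


class FakeWebSocketClient(object):
    """Hypothetical stand-in for the ws4py threaded WebSocketClient."""

    def __init__(self):
        self.close_requested = False

    def close(self, code=1000, reason=""):
        # The real parent method sends a close message to the server;
        # here we only record that orderly shutdown was requested.
        self.close_requested = True


class EventClient(FakeWebSocketClient):
    def __init__(self, on_event):
        super(EventClient, self).__init__()
        self.on_event = on_event
        self._closing = False                      # assumed flag name
        self._closing_lock = threading.RLock()     # assumed lock name

    def received_message(self, message):
        # Hold the lock for the whole handler call so that close()
        # cannot return while a handler is still running.
        with self._closing_lock:
            if not self._closing:
                self.on_event(message)

    def close(self, code=1000, reason=""):
        # Start orderly shutdown without touching the socket directly.
        super(EventClient, self).close(code, reason)
        # Wait for any in-flight handler, then stop forwarding messages.
        with self._closing_lock:
            self._closing = True
```

With this shape, a message that arrives after close() has returned is silently dropped instead of reaching on_event(), and close_connection() is never called from the application thread.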