Bug #16217
closed
[arvados-ws] Websocket server stops processing events, but stays connected
100%
Description
Sometimes, after successfully processing hundreds or thousands of events, arvados-ws goes into a state where clients don't receive any events. The EventsIn number at /status.json is static, which indicates arvados-ws isn't receiving events from PostgreSQL.
Clients can still connect and stay connected, and the once-per-minute empty "ping" message still works.
Cause is unknown.
Updated by Peter Amstutz almost 5 years ago
- Target version set to 2020-03-25 Sprint
Updated by Tom Clegg almost 5 years ago
- Status changed from New to In Progress
- Assigned To set to Tom Clegg
Not sure whether this is related to the observed failures but it seems worth fixing either way. Arvados-ws does a periodic listener ping, but hasn't been checking the returned error. With this change, if the ping fails, arvados-ws will log the error and exit/restart.
16217-ws-ping @ 9ebf73b1a1229bba507057ed2fb6a39635ce7e24 -- developer-run-tests: #1765
Updated by Peter Amstutz over 4 years ago
- Target version changed from 2020-03-25 Sprint to 2020-04-08 Sprint
Updated by Tom Clegg over 4 years ago
Replaces the old status/debug.json stuff with Prometheus metrics. Also refactors services/ws to share service-startup code and to be distributed inside arvados-server, like controller, boot, install, dispatchcloud, etc.
16217-ws-metrics @ 8d7a94c6799f20028725c1cc00614f1f7ae01209 -- developer-run-tests: #1797
16217-ws-metrics @ 8d7a94c6799f20028725c1cc00614f1f7ae01209 -- developer-run-tests: #1798
16217-ws-metrics @ 8d7a94c6799f20028725c1cc00614f1f7ae01209 -- developer-run-tests: #1800
Updated by Tom Clegg over 4 years ago
- Status changed from In Progress to Resolved