Bug #8277
Closed
puma stops responding: "Connection reset by peer"
Description
After some time under heavy load, the websockets in our system stop working.
The puma log shows nothing, and the nginx log shows an error of the following form for every request:
2016/01/21 11:05:35 [error] 8741#0: *812590 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 172.17.180.19, server: ws.arvados.sanger.ac.uk, request: "GET /websocket?api_token=27fk7czgtrthf9th2e7dnrieny7npr0mk5gqmrqgj0mp4ded34 HTTP/1.1", upstream: "http://127.0.0.1:8100/websocket?api_token=27fk7czgtrthf9th2e7dnrieny7npr0mk5gqmrqgj0mp4ded34", host: "ws.arvados.sanger.ac.uk"
Attempting to connect directly to the puma server with curl also gives a similar message:
$ curl http://localhost:8100/websocket?api_token=27fk7czgtrthf9th2e7dnrieny7npr0mk5gqmrqgj0mp4ded34
curl: (56) Recv failure: Connection reset by peer
When I strace the process, all I see is a futex call that seems to never return:
# ps auxwww | grep 'www-data.*puma' | grep -v grep
www-data 9247 96.6 0.2 10139760 576192 ? Sl 01:26 560:43 puma 2.8.2 (tcp://127.0.0.1:8100)
# strace -p 9247
Process 9247 attached - interrupt to quit
futex(0x7f4381dd1744, FUTEX_WAIT_PRIVATE, 1, NULL
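Note that ps reports the process as multi-threaded (Sl) and at 96.6% CPU, while strace -p without -f attaches to a single thread only, so the futex wait above may describe just the main thread. A minimal sketch for listing the state of every thread, assuming Linux with /proc mounted (the helper name thread_states is hypothetical, not part of any tool):

```shell
# Print "TID STATE" for each thread of a process by reading /proc.
# thread_states is a hypothetical helper name used for illustration.
thread_states() {
    pid="$1"
    for status in /proc/"$pid"/task/*/status; do
        # Skip if the glob did not match or the thread already exited
        [ -r "$status" ] || continue
        tid="${status#/proc/$pid/task/}"
        tid="${tid%/status}"
        # The "State:" line looks like "State:  S (sleeping)"; keep the letter
        state="$(awk '/^State:/ {print $2}' "$status")"
        printf '%s %s\n' "$tid" "$state"
    done
}

# Example: thread_states 9247
# To trace all threads rather than one: strace -f -p 9247
```

A thread in state R here would be a candidate for whatever is consuming the CPU.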
I killed the puma process, which caused runsv to restart it. Everything seems to be fine again for now, but I suspect the problem will recur at some point.
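As a stopgap, the manual restart could be automated. A rough sketch, assuming the runit service is named puma (hypothetical; the service name is not given here) and that curl exit code 56 (failure receiving network data, as in the curl output above) is a reliable sign of the hang:

```shell
# Hypothetical watchdog: probe the port and restart the service on curl exit 56.
needs_restart() {
    # $1 = curl exit status; 56 means curl failed while receiving data
    [ "$1" -eq 56 ]
}

probe_and_restart() {
    url="$1"
    service="$2"   # runit service name (assumed, adjust to the local setup)
    curl --silent --output /dev/null --max-time 10 "$url"
    rc=$?
    if needs_restart "$rc"; then
        sv restart "$service"
    fi
}

# Example (e.g. from cron):
#   probe_and_restart http://localhost:8100/websocket puma
```

This only masks the symptom; other curl exit codes (connection refused, timeout) are deliberately ignored in this sketch.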
Our puma version reports itself as: Version 2.8.2 (ruby 2.1.7-p400), codename: Sir Edmund Percival Hillary
Updated by Brett Smith almost 9 years ago
- Status changed from New to Duplicate
#8323 documents the server-side issue in more detail.