Bug #8277

closed

puma stops responding: "Connection reset by peer"

Added by Joshua Randall almost 9 years ago. Updated almost 9 years ago.

Status: Duplicate
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date: 01/21/2016
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

After some time under heavy load, the websockets in our system stop working.

The puma log shows nothing, while the nginx log records an error of the following form for every request:

2016/01/21 11:05:35 [error] 8741#0: *812590 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 172.17.180.19, server: ws.arvados.sanger.ac.uk, request: "GET /websocket?api_token=27fk7czgtrthf9th2e7dnrieny7npr0mk5gqmrqgj0mp4ded34 HTTP/1.1", upstream: "http://127.0.0.1:8100/websocket?api_token=27fk7czgtrthf9th2e7dnrieny7npr0mk5gqmrqgj0mp4ded34", host: "ws.arvados.sanger.ac.uk"

Connecting directly to the puma server with curl gives a similar error:

$ curl http://localhost:8100/websocket?api_token=27fk7czgtrthf9th2e7dnrieny7npr0mk5gqmrqgj0mp4ded34
curl: (56) Recv failure: Connection reset by peer

When I strace the process, all I see is a futex call that never seems to return:

# ps auxwww | grep 'www-data.*puma' | grep -v grep
www-data 9247 96.6 0.2 10139760 576192 ? Sl 01:26 560:43 puma 2.8.2 (tcp://127.0.0.1:8100)
# strace -p 9247
Process 9247 attached - interrupt to quit
futex(0x7f4381dd1744, FUTEX_WAIT_PRIVATE, 1, NULL
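
One caveat about the trace above: without -f, strace -p on a multi-threaded process attaches only to the thread whose ID equals the PID, and a main thread parked in futex may simply be waiting on other threads. The ps line reports roughly 96% CPU, so a different thread may be the busy one. The sketch below is not part of the original report. It assumes a Linux /proc filesystem, and thread_states is a hypothetical helper name. It lists each thread of a process with its scheduler state so that a running (R) thread can be picked out for a closer look.

```shell
# Hypothetical helper: print "<tid> <state>" for every thread of a process.
# Assumes Linux, where each thread appears under /proc/<pid>/task/<tid>.
thread_states() {
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        # A thread can exit between the glob and the read; skip it if so.
        [ -r "$task/status" ] || continue
        tid="${task##*/}"
        # The "State:" line looks like "State:  S (sleeping)"; keep the letter.
        state="$(awk '/^State:/ { print $2; exit }' "$task/status")"
        printf '%s %s\n' "$tid" "$state"
    done
}

# Example: inspect the current shell (replace $$ with the puma PID, e.g. 9247).
thread_states "$$"
```

To trace every thread rather than only the main one, strace -f -p 9247 attaches to all threads of the process.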

I killed the puma process, causing runsv to restart it, and everything now seems to work again, but I suspect the hang will recur at some point.
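
Until the underlying hang is fixed, the manual check and restart described above could be automated. The following is only a sketch: puma_alive is a hypothetical function name, the five-second default timeout is arbitrary, and the runit service name "puma" in the comment is an assumption about the local setup. It treats any HTTP response as a sign of life and any curl failure (such as the exit code 56 seen above) as a hang.

```shell
# Hypothetical probe: succeeds if the server returns any HTTP response in time.
# $1 is the URL to probe, $2 an optional timeout in seconds (default 5).
puma_alive() {
    curl --silent --output /dev/null --max-time "${2:-5}" "$1"
}

# Example use from cron or a loop; the service name "puma" is an assumption.
# puma_alive "http://127.0.0.1:8100/" || sv restart puma
```

Because curl is run without --fail, an HTTP error status still counts as a response; only transport-level failures and timeouts make the probe fail.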

Our puma version reports itself as: Version 2.8.2 (ruby 2.1.7-p400), codename: Sir Edmund Percival Hillary


Related issues: 1 (0 open, 1 closed)

Is duplicate of Arvados - Bug #8323: [API] Puma hangs forever on a futex, requiring restart (Resolved, 01/29/2016)

#1

Updated by Brett Smith almost 9 years ago

  • Status changed from New to Duplicate

#8323 documents the server-side issue in more detail.
