Bug #11901

closed

[arvados-ws] Fix leaking postgres connections and subsequent stall

Added by Tom Clegg over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Start date: 06/26/2017
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -

Description

Occasionally arvados-ws reaches its database connection pool limit and stops responding.
  1. Fix leaking connections
  2. Report something helpful in debug.json, like how many connections are in use and what for (expect 1 for listener, ≤1 per server queue slot, and 1 per client connection doing "sendOldEvents")
  3. Add a health check that fails when we're at connection pool limit
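
Items 2 and 3 above could be sketched roughly as follows. This is a minimal illustration, not the arvados-ws implementation: `poolStatus`, `snapshot`, `poolFull`, and `healthErr` are hypothetical names, the JSON field names are assumptions, and the check uses the `InUse` counter of `sql.DBStats` (available in Go 1.11 and later) on the assumption that idle connections should not count against the limit.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"errors"
	"fmt"
)

// poolFull reports whether the number of in-use connections has reached
// limit. A limit of 0 or less means "unlimited", as with SetMaxOpenConns.
func poolFull(inUse, limit int) bool {
	return limit > 0 && inUse >= limit
}

// poolStatus is a hypothetical payload for a debug endpoint.
type poolStatus struct {
	Open  int `json:"open"`
	InUse int `json:"in_use"`
	Limit int `json:"limit"`
}

// snapshot copies the relevant counters out of a sql.DBStats value.
// limit is whatever was passed to SetMaxOpenConns.
func snapshot(stats sql.DBStats, limit int) poolStatus {
	return poolStatus{Open: stats.OpenConnections, InUse: stats.InUse, Limit: limit}
}

// healthErr returns a non-nil error when the pool is exhausted,
// which a health handler could translate into a failing response.
func healthErr(s poolStatus) error {
	if poolFull(s.InUse, s.Limit) {
		return errors.New("database connection pool is at its limit")
	}
	return nil
}

func main() {
	// In a real service the stats would come from db.Stats().
	s := snapshot(sql.DBStats{OpenConnections: 4, InUse: 4}, 4)
	buf, _ := json.Marshal(s)
	fmt.Println(string(buf), healthErr(s))
}
```

Keeping the decision in a pure function of the counters makes it easy to unit-test without a live database.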

Subtasks 1 (0 open, 1 closed)

Task #11904: Review 11901-ws-db-conns (Resolved, Radhika Chippada, 06/26/2017)

Related issues 1 (0 open, 1 closed)

Related to Arvados - Feature #11906: Basic authenticated http health check ("ping") for each system service (Resolved, Radhika Chippada, 07/17/2017)

Actions #1

Updated by Tom Clegg over 7 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Tom Clegg over 7 years ago

11901-ws-db-conns @ c5a8ad7751e13560a6cde34395ea76f380c8a80d
  • fix an unclosed "rows" object
  • add authenticated /_health/ping and /_health/db handlers
  • add # open db connections to /debug.json
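
For readers unfamiliar with how an unclosed "rows" object leaks a connection, here is a small self-contained sketch. It is not the code from the commit: the fake driver, the `firstID` and `inUseAfter` helpers, and the query text are all hypothetical and exist only so the example runs without a real PostgreSQL server. It relies on `database/sql` holding a connection until the `*sql.Rows` is closed or fully iterated, so returning early without `Close()` keeps that connection checked out.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"io"
)

// Minimal fake driver (hypothetical, for demonstration only).
type fakeDriver struct{}

func (fakeDriver) Open(name string) (driver.Conn, error) { return fakeConn{}, nil }

type fakeConn struct{}

func (fakeConn) Prepare(query string) (driver.Stmt, error) { return fakeStmt{}, nil }
func (fakeConn) Close() error                              { return nil }
func (fakeConn) Begin() (driver.Tx, error) {
	return nil, errors.New("transactions not supported")
}

type fakeStmt struct{}

func (fakeStmt) Close() error  { return nil }
func (fakeStmt) NumInput() int { return 0 }
func (fakeStmt) Exec(args []driver.Value) (driver.Result, error) {
	return nil, errors.New("exec not supported")
}
func (fakeStmt) Query(args []driver.Value) (driver.Rows, error) {
	return &fakeRows{left: 3}, nil
}

// fakeRows yields three single-column rows, then io.EOF.
type fakeRows struct{ left int }

func (r *fakeRows) Columns() []string { return []string{"id"} }
func (r *fakeRows) Close() error      { return nil }
func (r *fakeRows) Next(dest []driver.Value) error {
	if r.left == 0 {
		return io.EOF
	}
	dest[0] = int64(r.left)
	r.left--
	return nil
}

func init() { sql.Register("fake", fakeDriver{}) }

// firstID reads only the first row of a multi-row result. When closeRows is
// false it returns without closing rows, so the connection stays checked out.
func firstID(db *sql.DB, closeRows bool) (int64, error) {
	rows, err := db.Query("SELECT id FROM logs")
	if err != nil {
		return 0, err
	}
	if closeRows {
		defer rows.Close() // returns the connection to the pool
	}
	if !rows.Next() {
		if err := rows.Err(); err != nil {
			return 0, err
		}
		return 0, sql.ErrNoRows
	}
	var id int64
	err = rows.Scan(&id)
	return id, err
}

// inUseAfter calls firstID n times and reports how many connections
// are still checked out of the pool afterwards.
func inUseAfter(closeRows bool, n int) int {
	db, err := sql.Open("fake", "")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	for i := 0; i < n; i++ {
		if _, err := firstID(db, closeRows); err != nil {
			panic(err)
		}
	}
	return db.Stats().InUse
}

func main() {
	fmt.Println("without Close:", inUseAfter(false, 3))
	fmt.Println("with Close:   ", inUseAfter(true, 3))
}
```

With a cap set via `SetMaxOpenConns`, the leaking variant would eventually block in `db.Query` waiting for a free connection, which matches the stall described in this issue.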
Actions #3

Updated by Tom Clegg over 7 years ago

The health-check specs here (authentication, URLs, responses) are the ones Nico and I developed last week based on existing conventions and ease of integration with consul, nagios, etc. I've since written them up on #11906.

Actions #4

Updated by Radhika Chippada over 7 years ago

  • Should "/_health/ping" and "/_health/db" also check if the ManagementToken is configured and bearer token matches? (I could not tell if this was already the case ...)
  • Would it make sense to test when management token is not configured as well (disabled)?
      if rtr.Config.ManagementToken == "" {
          http.Error(w, "disabled", http.StatusNotFound)
      }
Actions #5

Updated by Tom Clegg over 7 years ago

Radhika Chippada wrote:

  • Should "/_health/ping" and "/_health/db" also check if the ManagementToken is configured and bearer token matches? (I could not tell if this was already the case ...)

Yes, mgmtAuth() covers the http.ServeMux handling /_health/ so the individual handlers don't need to re-check.

  • Would it make sense to test when management token is not configured as well (disabled)?

Yes, added this test.

11901-ws-db-conns @ 5c860fdbf28128e7d11a9dff8b5c30777c2cbfeb
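
The arrangement discussed in this comment, where one auth wrapper covers the whole /_health/ subtree, could look roughly like the sketch below. It is a simplified stand-in, not the branch's code. Only the "disabled" / 404 response for an unconfigured token comes from the snippet quoted above. The 401 and 403 responses, the ping body, and the `newHandler` and `statusFor` helpers are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// mgmtAuth wraps h so that every request it serves must carry the
// management token as a bearer token. An empty token disables the
// endpoints entirely.
func mgmtAuth(token string, h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if token == "" {
			http.Error(w, "disabled", http.StatusNotFound)
		} else if ah := r.Header.Get("Authorization"); ah == "" {
			// Assumed response for a missing header.
			http.Error(w, "authorization required", http.StatusUnauthorized)
		} else if ah != "Bearer "+token {
			// Assumed response for a wrong token.
			http.Error(w, "authorization error", http.StatusForbidden)
		} else {
			h.ServeHTTP(w, r)
		}
	})
}

// newHandler mounts an inner mux under /_health/ behind mgmtAuth, so the
// individual health handlers never need to check the token themselves.
func newHandler(token string) http.Handler {
	health := http.NewServeMux()
	health.HandleFunc("/_health/ping", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, `{"health":"OK"}`) // assumed response body
	})
	top := http.NewServeMux()
	top.Handle("/_health/", mgmtAuth(token, health))
	return top
}

// statusFor issues GET /_health/ping against an in-memory handler and
// returns the HTTP status code.
func statusFor(token, authHeader string) int {
	req := httptest.NewRequest("GET", "/_health/ping", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	newHandler(token).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("token not configured:", statusFor("", "Bearer anything"))
	fmt.Println("no header:           ", statusFor("s3cret", ""))
	fmt.Println("wrong token:         ", statusFor("s3cret", "Bearer nope"))
	fmt.Println("correct token:       ", statusFor("s3cret", "Bearer s3cret"))
}
```

The first call in `main` corresponds to the "management token not configured" case that the added test covers.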

Actions #6

Updated by Tom Clegg over 7 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:8051c3a14d40f0d410e4ddf54d89a084475d807e.
