Bug #11901
closed
[arvados-ws] Fix leaking postgres connections and subsequent stall
Added by Tom Clegg over 7 years ago.
Updated over 7 years ago.
Estimated time: (Total: 0.00 h)
Description
Occasionally arvados-ws reaches its database connection pool limit and stops responding.
- Fix leaking connections
- Report something helpful in debug.json, like how many connections are in use and what for (expect 1 for listener, ≤1 per server queue slot, and 1 per client connection doing "sendOldEvents")
- Add a health check that fails when we're at connection pool limit
- Status changed from New to In Progress
The health-check specs here (authentication, URLs, responses) are the ones Nico and I developed last week based on existing conventions and ease of integration with consul, nagios, etc. I've since written them up on #11906.
- Should "/_health/ping" and "/_health/db" also check if the ManagementToken is configured and bearer token matches? (I could not tell if this was already the case ...)
- Would it make sense to test when management token is not configured as well (disabled)?
if rtr.Config.ManagementToken == "" {
	http.Error(w, "disabled", http.StatusNotFound)
}
Radhika Chippada wrote:
- Should "/_health/ping" and "/_health/db" also check if the ManagementToken is configured and bearer token matches? (I could not tell if this was already the case ...)
Yes, mgmtAuth() covers the http.ServeMux handling /_health/ so the individual handlers don't need to re-check.
- Would it make sense to test when management token is not configured as well (disabled)?
Yes, added this test.
11901-ws-db-conns @ 5c860fdbf28128e7d11a9dff8b5c30777c2cbfeb
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:8051c3a14d40f0d410e4ddf54d89a084475d807e.