Bug #6997

closed

[Keep] keepstore reboots GCE host under heavy load

Added by Tom Clegg over 9 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: Deployment
Target version: -
Start date: 08/16/2015
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

Steps to reproduce

  1. Start keepstore [1].
  2. Write 3K small files [2] (so the index isn't trivially small).
  3. Hit keepstore with 15 concurrent 40MiB writes [3] (so it allocates 15 data buffers). Repeat, just to be sure.
  4. Hit keepstore with 20 concurrent /index requests [4]. Repeat a few times.

| Runtime | Build   | Env                  | After writes | After many indexes |
| go1.3.3 | 0d9da68 | -                    | 1.8G         | 3.7G               |
| go1.3.3 | 0d9da68 | GOGC=10              | 2.1G         | 2.3G               |
| go1.3.3 | 0d9da68 | GOMAXPROCS=8         | 2.4G         | >4.5G              |
| go1.3.3 | 0d9da68 | GOMAXPROCS=8 GOGC=10 | 2.4G         | 2.6G               |
| go1.4.2 | 0d9da68 | GOMAXPROCS=8         | 2.3G         | 3.5G               |
| go1.4.2 | 0d9da68 | GOMAXPROCS=8 GOGC=10 | 2.1G         | 2.4G               |

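Evidently, "index" generates a lot of garbage, and the default GOGC=100 is much too high for a program like keepstore that normally fills >50% of the system's RAM with active buffers: at GOGC=100 the heap is allowed to grow to roughly twice the size of the live data before the next collection starts, which is more than the host has.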

See http://golang.org/pkg/runtime/debug/#SetGCPercent
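
For reference, the arithmetic behind this can be sketched in shell. The Go runtime starts a collection once the heap has grown by GOGC percent over the live heap left by the previous collection, so the trigger point is roughly live * (1 + GOGC/100). The function name next_gc_mib and the 2048 MiB live-heap figure are illustrative assumptions, not measurements from this ticket.

```shell
#!/bin/sh
# next_gc_mib LIVE_MIB GOGC
# Approximate heap size (in MiB) at which the next collection starts:
# the live heap plus GOGC percent of it. Integer arithmetic, rounds down.
next_gc_mib() {
    live_mib=$1
    gogc=$2
    echo $(( live_mib * (100 + gogc) / 100 ))
}

# Hypothetical 2048 MiB of live buffers (illustrative, not measured).
next_gc_mib 2048 100   # default GOGC  -> 4096
next_gc_mib 2048 10    # proposed GOGC -> 2252
```

With a live heap that already takes up more than half of RAM, the GOGC=100 trigger point lies beyond physical memory, while GOGC=10 keeps the overshoot to about a tenth of the live heap.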

[1] mkdir /tmp/zzz; echo abc | GOMAXPROCS=8 keepstore -volume=/tmp/zzz -data-manager-token-file=/dev/stdin

[2] for i in `seq 0 3000`; do echo -n "$i "; curl -X PUT -H 'Authorization: OAuth2 abc' --data-binary $i localhost:25107/`echo -n $i | md5sum | head -c32`; done

[3] dd if=/dev/zero of=/tmp/zero bs=1000000 count=40; for i in `seq 0 15`; do curl -X PUT -H 'Authorization: OAuth2 foo' --data-binary @/tmp/zero localhost:25107/48e9a108a3ec623652e7988af2f88867 & done

[4] for i in `seq 0 20`; do curl -s -H 'Authorization: OAuth2 abc' localhost:25107/index >/dev/null & done

Fix

Specify GOGC=10 in the keepstore run script, both in our own deployments and in the Keepstore install guide.
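
One possible shape for the run script change, as a sketch only: the actual run scripts are not shown in this ticket, and the helper name with_gogc is made up for illustration. It defaults GOGC to 10 but still lets an operator override it from the environment.

```shell
#!/bin/sh
# with_gogc CMD [ARGS...]
# Run CMD with GOGC set in its environment. Uses the caller's GOGC if one
# is already set and non-empty, otherwise falls back to 10.
# (Hypothetical helper, not part of keepstore or its packaging.)
with_gogc() {
    GOGC="${GOGC:-10}" "$@"
}

# In a real run script the final line might look like the invocation from
# the repro steps, for example:
#   with_gogc keepstore -volume=/tmp/zzz -data-manager-token-file=/dev/stdin
```

Setting the variable in the run script rather than in the binary keeps the value easy to tune per host without rebuilding keepstore.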


Files

keep10-nextlastgc.png (29.1 KB), Nico César, 08/21/2015 08:06 PM
keep10-pausetotalns.png (33.9 KB), Nico César, 08/21/2015 08:06 PM
keep10-mallocs.png (29.3 KB), Nico César, 08/21/2015 08:06 PM
keep10-sys.png (30 KB), Nico César, 08/21/2015 08:06 PM
keep10-gc.png (27.1 KB), Nico César, 08/21/2015 08:06 PM
keep10-mspans.png (32 KB), Nico César, 08/21/2015 08:06 PM
keep10-mcache.png (25.7 KB), Nico César, 08/21/2015 08:06 PM
keep10-lookups.png (29.5 KB), Nico César, 08/21/2015 08:06 PM
keep10-stack.png (29.3 KB), Nico César, 08/21/2015 08:06 PM
keep10-heap.png (36.2 KB), Nico César, 08/21/2015 08:06 PM

Related issues: 3 (0 open, 3 closed)

Related to Arvados - Bug #7121: [Keep] keepstore should use only one buffer for each PUT (and should not deadlock) (Resolved, Tom Clegg, 08/24/2015)
Related to Arvados - Bug #7165: [Keep] Write replayable activity logs (Closed, 08/28/2015)
Has duplicate Arvados - Bug #7119: Pipeline instance failed and log collection is empty (Duplicate, 08/24/2015)