Bug #9918
closed
keep-balance fails with "Malformed index line" error
100%
Description
It took me three tries to get keep-balance started just now. The first two invocations had two different "Malformed index line" errors:
# keep-balance -commit-pulls -commit-trash -config ~/keep-balance.json
2016/09/01 16:52:16 starting up: will scan every 6h0m0s and on SIGUSR1
2016/09/01 16:52:16 Run: start
2016/09/01 16:52:16 clearing existing trash lists, in case the new rendezvous order differs from previous run
2016/09/01 16:52:16 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:16 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: took 21.506846ms
2016/09/01 16:52:16 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: took 47.746011ms
2016/09/01 16:52:16 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: took 50.447442ms
2016/09/01 16:52:16 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: took 51.231171ms
2016/09/01 16:52:16 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: took 51.293859ms
2016/09/01 16:52:16 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: took 62.350323ms
2016/09/01 16:52:16 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: took 64.933237ms
2016/09/01 16:52:16 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: took 95.565147ms
2016/09/01 16:52:16 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: took 107.872375ms
2016/09/01 16:52:17 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: took 771.892243ms
2016/09/01 16:52:17 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.106260655s
2016/09/01 16:52:17 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.116829494s
2016/09/01 16:52:17 GetCurrentState: start
2016/09/01 16:52:17 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:17 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:19 GetCurrentState: took 2.002191793s
2016/09/01 16:52:19 Run: took 3.452737799s
2016/09/01 16:52:19 run failed: z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): Malformed index line "bb9386b9b29d46e402eb8e8ab": 1 fields
2016/09/01 16:52:24 collections: 0/5681101
^C
# keep-balance -commit-pulls -commit-trash -config ~/keep-balance.json [64/3739]
2016/09/01 16:52:35 starting up: will scan every 6h0m0s and on SIGUSR1
2016/09/01 16:52:35 Run: start
2016/09/01 16:52:35 clearing existing trash lists, in case the new rendezvous order differs from previous run
2016/09/01 16:52:35 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:52:35 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: took 2.694847ms
2016/09/01 16:52:35 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: took 2.736887ms
2016/09/01 16:52:35 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: took 3.209453ms
2016/09/01 16:52:35 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: took 3.332583ms
2016/09/01 16:52:35 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: took 3.470783ms
2016/09/01 16:52:35 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: took 3.558416ms
2016/09/01 16:52:35 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: took 3.586257ms
2016/09/01 16:52:35 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: took 3.740025ms
2016/09/01 16:52:35 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: took 3.852256ms
2016/09/01 16:52:35 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: took 3.965554ms
2016/09/01 16:52:35 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: took 5.032221ms
2016/09/01 16:52:35 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: took 6.046143ms
2016/09/01 16:52:35 GetCurrentState: start
2016/09/01 16:52:35 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:35 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:52:38 GetCurrentState: took 3.334336885s
2016/09/01 16:52:38 Run: took 3.473451961s
2016/09/01 16:52:38 run failed: z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): Malformed index line "a8": 1 fields
2016/09/01 16:52:42 collections: 0/5681104
^C
But the third time proceeded normally:
# keep-balance -commit-pulls -commit-trash -config ~/keep-balance.json [18/3739]
2016/09/01 16:54:06 starting up: will scan every 6h0m0s and on SIGUSR1
2016/09/01 16:54:06 Run: start
2016/09/01 16:54:06 clearing existing trash lists, in case the new rendezvous order differs from previous run
2016/09/01 16:54:06 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: start
2016/09/01 16:54:06 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.241881ms
2016/09/01 16:54:06 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.427476ms
2016/09/01 16:54:06 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): send trash list: took 1.352339ms
2016/09/01 16:54:06 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.72489ms
2016/09/01 16:54:06 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): send trash list: took 1.336507ms
2016/09/01 16:54:06 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): send trash list: took 2.110378ms
2016/09/01 16:54:06 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): send trash list: took 2.365868ms
2016/09/01 16:54:06 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): send trash list: took 3.166784ms
2016/09/01 16:54:06 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): send trash list: took 2.759222ms
2016/09/01 16:54:06 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): send trash list: took 3.303592ms
2016/09/01 16:54:06 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): send trash list: took 3.36095ms
2016/09/01 16:54:06 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): send trash list: took 3.311311ms
2016/09/01 16:54:06 GetCurrentState: start
2016/09/01 16:54:07 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:07 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): retrieve index
2016/09/01 16:54:14 collections: 0/5681119
2016/09/01 16:54:29 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): add 1618022 replicas to map
2016/09/01 16:54:30 z8ta6-bi6l4-4b0e02ad7mk84ye (humgen-01-01.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:39 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): add 1113535 replicas to map
2016/09/01 16:54:40 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): add 1113399 replicas to map
2016/09/01 16:54:40 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): add 1072642 replicas to map
2016/09/01 16:54:40 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): add 1113899 replicas to map
2016/09/01 16:54:40 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): add 1114450 replicas to map
2016/09/01 16:54:40 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): add 1112983 replicas to map
2016/09/01 16:54:40 z8ta6-bi6l4-stmnte9yvd2gh6o (humgen-04-01.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:41 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): add 1115209 replicas to map
2016/09/01 16:54:41 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): add 1111975 replicas to map
2016/09/01 16:54:41 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): add 1114356 replicas to map
2016/09/01 16:54:41 z8ta6-bi6l4-nynctbmdi8nj6v0 (humgen-01-03.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:41 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): add 1113132 replicas to map
2016/09/01 16:54:42 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): add 1112714 replicas to map
2016/09/01 16:54:43 z8ta6-bi6l4-az89xled1ycwnpb (humgen-04-03.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:44 z8ta6-bi6l4-3kqkr5lgow2uogm (humgen-03-01.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:44 z8ta6-bi6l4-lhps1yuzszk0315 (humgen-04-02.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:45 z8ta6-bi6l4-w3rpndae62qwwre (humgen-02-02.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:46 z8ta6-bi6l4-ph34sug9wmnom07 (humgen-03-02.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:46 z8ta6-bi6l4-a1pntf0wx8vfr5v (humgen-03-03.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:47 z8ta6-bi6l4-sg7xxak114gh1j0 (humgen-02-03.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:48 z8ta6-bi6l4-yxhkoekmnv5czf3 (humgen-02-01.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:54:48 z8ta6-bi6l4-kijrzcy3zkflg3s (humgen-01-02.internal.sanger.ac.uk:25107, disk): done
2016/09/01 16:55:28 collections: 10000/5681119
(presumably if all goes well this run will complete in about 6-7 hours from now as is usual on our cluster)
Updated by Tom Clegg over 8 years ago
This looks like the index response is truncated, e.g., due to a network problem or keepstore crash. We should fix the error reporting so it's more obvious if/when this happens.
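To illustrate how a truncated response can surface as a "Malformed index line" error, and how the reporting could distinguish the two cases, here is a minimal Go sketch. It is not the keep-balance code. It assumes each index line has two space-separated fields and that a complete response ends with a blank line; `parseIndex`, `parseIndexString`, and the sample locators are hypothetical names and data made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// parseIndex counts "locator timestamp" entries read from r.
// Assumption: a complete response ends with a blank line, so reaching
// EOF without one means the response was cut short.
func parseIndex(r io.Reader) (int, error) {
	br := bufio.NewReader(r)
	entries := 0
	for {
		line, err := br.ReadString('\n')
		if err == io.EOF {
			// No terminator seen: report truncation, not a format error.
			return entries, fmt.Errorf("index truncated after %d entries (partial line %q): %w",
				entries, line, io.ErrUnexpectedEOF)
		}
		if err != nil {
			// Surface the underlying connection error to the operator.
			return entries, fmt.Errorf("index read failed after %d entries: %w", entries, err)
		}
		line = strings.TrimSuffix(line, "\n")
		if line == "" {
			return entries, nil // blank line: the index is complete
		}
		if n := len(strings.Fields(line)); n != 2 {
			return entries, fmt.Errorf("malformed index line %q: %d fields", line, n)
		}
		entries++
	}
}

// parseIndexString is a convenience wrapper for the demo below.
func parseIndexString(s string) (int, error) {
	return parseIndex(strings.NewReader(s))
}

func main() {
	// Made-up index data; the second input stops mid-line, as in the report.
	complete := "aaaa+3 1472745136\nbbbb+5 1472745137\n\n"
	truncated := "aaaa+3 1472745136\nbb9386b9b29d46e402eb8e8ab"
	fmt.Println(parseIndexString(complete))
	fmt.Println(parseIndexString(truncated))
}
```

A parser that only splits each line into fields would report the cut-off final line as malformed with "1 fields", which matches the two failures in the description. Checking for end-of-input first lets the message name the real problem.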
Updated by Tom Morris over 7 years ago
- Target version set to Arvados Future Sprints
Updated by Tom Clegg over 6 years ago
We have seen similar errors caused by timeouts; keep-balance uses a client that times out 5 minutes after starting a request, even if data has been arriving the whole time. In the case of indexing, it would be more appropriate to time out only if the connection is silent for 5 minutes. The next best thing would be to have a longer/configurable timeout.
In any case we should also display the connection error instead of (or in addition to) the "truncated input" error, so the operator can tell what's happening.
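The difference between the two timeout styles can be sketched in Go as follows. This is an illustration under stated assumptions, not the keep-balance client: `fetchWithIdleTimeout`, `slowServer`, and `demo` are hypothetical helpers, and the durations are scaled down from minutes to milliseconds so the demo runs quickly. The hard deadline uses `http.Client.Timeout`, which covers the whole exchange including reading the body. The idle variant cancels the request only when no data has arrived for the given interval.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchWithIdleTimeout copies the response body to dst and aborts the
// request only if no data arrives for the idle duration.
func fetchWithIdleTimeout(client *http.Client, url string, idle time.Duration, dst io.Writer) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	// The timer cancels the request; every successful read pushes it back.
	timer := time.AfterFunc(idle, cancel)
	defer timer.Stop()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	buf := make([]byte, 32*1024)
	for {
		n, err := resp.Body.Read(buf)
		if n > 0 {
			timer.Reset(idle)
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return fmt.Errorf("index read aborted: %w", err)
		}
	}
}

// slowServer sends the given number of lines, pausing gap after each one.
func slowServer(lines int, gap time.Duration) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for i := 0; i < lines; i++ {
			fmt.Fprintf(w, "line %d\n", i)
			w.(http.Flusher).Flush()
			select {
			case <-r.Context().Done():
				return
			case <-time.After(gap):
			}
		}
	}))
}

// demo compares a hard total deadline with an idle timeout.
func demo() (totalErr, idleErr, stalledErr error) {
	steady := slowServer(20, 20*time.Millisecond) // about 400ms in total, never silent for long
	defer steady.Close()
	stalled := slowServer(2, 2*time.Second) // goes silent after the first line
	defer stalled.Close()

	hard := &http.Client{Timeout: 200 * time.Millisecond}
	resp, err := hard.Get(steady.URL)
	if err == nil {
		_, err = io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	totalErr = err
	idleErr = fetchWithIdleTimeout(http.DefaultClient, steady.URL, 200*time.Millisecond, io.Discard)
	stalledErr = fetchWithIdleTimeout(http.DefaultClient, stalled.URL, 200*time.Millisecond, io.Discard)
	return
}

func main() {
	totalErr, idleErr, stalledErr := demo()
	fmt.Println("hard total timeout, steady server:", totalErr)
	fmt.Println("idle timeout, steady server:", idleErr)
	fmt.Println("idle timeout, stalled server:", stalledErr)
}
```

In this sketch the hard deadline fails against the steady server even though data keeps arriving, while the idle timeout lets that transfer finish and still gives up on the server that goes silent.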
Updated by Tom Clegg over 6 years ago
9918-index-error @ 7da2c48e181298f5b5d1691de76603b9e92ec9d6
Updated by Tom Clegg over 6 years ago
- Category set to Keep
- Status changed from New to In Progress
- Assigned To set to Tom Clegg
- Target version changed from Arvados Future Sprints to 2018-05-23 Sprint
Updated by Tom Clegg over 6 years ago
9918-index-timeout @ 012677d2d3fb4571da4a48ea49eae156f28bf6af adds a RequestTimeout config.
This isn't as good as a "timeout if connection goes silent for N seconds" but at least it's better than a hard-coded 5 minute timeout.
Client:
  APIHost: zzzzz.arvadosapi.com:443
  AuthToken: xyzzy
  Insecure: false
KeepServiceTypes:
  - disk
RunPeriod: 600s
CollectionBatchSize: 100000
CollectionBuffers: 1000
RequestTimeout: 30m
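As a rough sketch of how a setting like `RequestTimeout: 30m` might be read and applied, the Go example below parses a duration string from a JSON config and uses it as the HTTP client timeout. It is an assumption-laden illustration rather than the code on the branch: the `Duration` and `Config` types and the `loadConfig` and `newClient` functions are hypothetical, only two of the config keys are modeled, and the 5 minute default is taken from the previously hard-coded value mentioned above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Duration is a hypothetical wrapper that lets a config file say "30m".
type Duration time.Duration

// UnmarshalJSON parses a JSON string such as "30m" or "600s".
func (d *Duration) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	v, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	*d = Duration(v)
	return nil
}

// Config models only the two duration settings from the example config.
type Config struct {
	RunPeriod      Duration
	RequestTimeout Duration
}

// loadConfig parses data, keeping a 5 minute request timeout (an assumed
// default) when the key is absent.
func loadConfig(data string) (Config, error) {
	cfg := Config{RequestTimeout: Duration(5 * time.Minute)}
	err := json.Unmarshal([]byte(data), &cfg)
	return cfg, err
}

// newClient applies the configured timeout to the whole request,
// including the time spent reading the response body.
func newClient(cfg Config) *http.Client {
	return &http.Client{Timeout: time.Duration(cfg.RequestTimeout)}
}

func main() {
	cfg, err := loadConfig(`{"RunPeriod": "600s", "RequestTimeout": "30m"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(newClient(cfg).Timeout) // prints 30m0s
}
```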
Updated by Ward Vandewege over 6 years ago
- Status changed from In Progress to Resolved
Updated by Ward Vandewege over 6 years ago
Tom Clegg wrote:
9918-index-timeout @ 012677d2d3fb4571da4a48ea49eae156f28bf6af adds a RequestTimeout config.
This isn't as good as a "timeout if connection goes silent for N seconds" but at least it's better than a hard-coded 5 minute timeout.
[...]
9918-index-timeout @ 012677d2d3fb4571da4a48ea49eae156f28bf6af LGTM, I've tested it and the settings work.
I've merged it. I think we can close this ticket now. Josh, feel free to re-open it if there is something else to be done!