Bug #9456 (closed)

[keep-balance] don't clear existing trash lists on the 2nd..Nth iteration

Added by Tom Clegg over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assigned To: Tom Clegg
Category: Keep
Target version: 2016-07-20 sprint
Start date: 06/22/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 0.5

Description

If a balancing operation takes longer than RunPeriod, we get back-to-back runs. If each run ends with "send trash list" and the next kicks off right away with "clear trash list", keepstore has little chance of making progress.

See #9437: although trash lists never got sent because sending pull lists failed, the log demonstrates the timing problem. The "send trash list" log messages (referring to empty lists) follow the previous run's "send pull lists" logs so closely that (unless you know what to look for) they look like they are referring to the previous run's trash lists.
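
A minimal sketch of the failure mode (hypothetical runBalance; not the actual keep-balance source): with a fixed-period ticker, any run longer than RunPeriod means the next run starts the moment the previous one ends.

```go
package main

import "time"

// runBalance stands in for one full balancing iteration
// (get indexes, compute changes, send pull and trash lists).
func runBalance() { time.Sleep(90 * time.Second) }

func main() {
	// If a run takes longer than the period, a buffered tick is already
	// waiting when the run returns, so the next run starts immediately:
	// back-to-back runs, with no quiet window for keepstore.
	ticker := time.NewTicker(60 * time.Second) // RunPeriod
	defer ticker.Stop()
	for range ticker.C {
		runBalance()
	}
}
```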


Subtasks 1 (0 open, 1 closed)

Task #9563: Review 9456-less-clear-trash (Resolved, Tom Clegg, 06/22/2016)


Related issues 1 (0 open, 1 closed)

Related to Arvados - Bug #9437: keep-balance doesn't seem to be reducing replication level of blocks (Resolved, Tom Clegg, 06/30/2016)

#1

Updated by Joshua Randall over 8 years ago

I would also suggest changing the log output to make it clear that in some instances "send trash list" actually means "clear trash list". I did notice that trash lists were being sent at the beginning of each run after the first, but it did not occur to me that they were anything other than a restatement of the trash lists already in place.

What was the purpose of starting a run by clearing the trash list? My guess is that, to be safe, you wanted to ensure that the index returned by each keep server represents the complete set of blocks that will still be available on that server by the time the pull lists are sent. The risk of not taking outstanding trash lists into account would be that keep-balance could come up with a trash/pull plan that accidentally trashes all of a block's replicas before any of them can be pulled?

Would a better solution be to control the inter-run time rather than the run period? I.e., rather than having a timer that fires every RunPeriod interval, you'd start the timer at the end of a run and begin the next run when it fires. That controls the time between the end of one run and the start of the next, so we can make sure there is "enough" time to handle extant pull/trash requests.
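
A sketch of that alternative, under the same assumptions as the sketch above: the delay is measured from the end of one run to the start of the next, so keepstore always gets a fixed quiet window no matter how long a run takes. (interRunGap is a hypothetical knob, distinct from RunPeriod.)

```go
package main

import "time"

// runBalance stands in for one full balancing iteration.
func runBalance() { time.Sleep(90 * time.Second) }

func main() {
	const interRunGap = 60 * time.Second // hypothetical knob
	for {
		runBalance()
		// The timer starts when the run ends, so keepstore always gets
		// interRunGap to work through its pull/trash lists, no matter
		// how long the run itself took.
		time.Sleep(interRunGap)
	}
}
```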

#2

Updated by Tom Clegg over 8 years ago

  • Status changed from New to In Progress
#3

Updated by Tom Clegg over 8 years ago

  • Assigned To set to Tom Clegg
#4

Updated by Tom Clegg over 8 years ago

The scenario "clear trash lists" is meant to avoid is something like this:
  • block B has replication=4, desired=2
  • the best positions are srv0 and srv1, so keep-balance tells srv2 and srv3 to delete B (along with many other blocks, so B doesn't disappear right away)
  • keep-balance is upgraded and now uses a new rendezvous algorithm (or an admin changes the UUIDs of the existing keep services)
  • block B has replication=4, desired=2
  • the best positions are srv2 and srv3, so keep-balance tells srv0 and srv1 to delete B

When I proposed "not on the 2nd..Nth iteration" I was only thinking about the "new rendezvous algorithm" case. The "admin changes the UUIDs" case is just as dangerous, though. Admittedly both seem like rare events, but I'm not confident they're rare enough to make data loss acceptable.

In 9456-less-clear-trash I've added the notion of a "rendezvous state": if a server is added or removed, or changes its UUID/host/port, or this is the first run so we don't know the "rendezvous state" used to compute the current trash lists, then we clear the trash lists before indexing. Normally, the state doesn't change from one run to the next, so we can let keepstore work through its previous trash list while we compute the next one.
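
A minimal sketch of that check (hypothetical names and values; not the branch's actual code): fingerprint the service list, and clear trash lists only when the fingerprint differs from the previous run's, or when there is no previous fingerprint to compare against.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// KeepService is a hypothetical stand-in for the real service record.
type KeepService struct {
	UUID, Host string
	Port       int
}

// rendezvousState fingerprints everything the probe order depends on:
// the set of services and each one's UUID, host, and port.
func rendezvousState(services []KeepService) string {
	keys := make([]string, len(services))
	for i, s := range services {
		keys[i] = fmt.Sprintf("%s %s %d", s.UUID, s.Host, s.Port)
	}
	sort.Strings(keys)
	sum := sha256.Sum256([]byte(strings.Join(keys, "; ")))
	return fmt.Sprintf("%x", sum)
}

func main() {
	var prevState string // empty on the first run: state unknown
	services := []KeepService{{UUID: "zzzzz-bi6l4-xxxxxxxxxxxxxxx", Host: "keep0.example", Port: 25107}}

	if cur := rendezvousState(services); cur != prevState {
		fmt.Println("state changed or unknown: clear trash lists before indexing")
	} else {
		fmt.Println("state unchanged: let keepstore work through its old trash list")
	}
}
```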

The more common changes to rendezvous state are harmless (the whole point of rendezvous hashing is that the probe order of the remaining/existing servers is unaffected by removals and additions), so this approach is somewhat over-cautious.
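
To illustrate that property, a toy highest-random-weight ordering (hypothetical code, not Arvados' rendezvous implementation): removing a server drops it from the probe sequence without reordering the servers that remain.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"sort"
)

// probeOrder sorts servers by a per-(block, server) hash weight: the
// essence of rendezvous (highest-random-weight) hashing.
func probeOrder(block string, servers []string) []string {
	out := append([]string(nil), servers...)
	weight := func(srv string) uint64 {
		h := md5.Sum([]byte(block + "+" + srv))
		return binary.BigEndian.Uint64(h[:8])
	}
	sort.Slice(out, func(i, j int) bool { return weight(out[i]) > weight(out[j]) })
	return out
}

func main() {
	block := "acbd18db4cc2f85cedef654fccc4a4d8"
	fmt.Println(probeOrder(block, []string{"srv0", "srv1", "srv2", "srv3"}))
	// Removing srv2 drops it from the sequence but leaves the relative
	// order of srv0, srv1, srv3 unchanged; that is why most membership
	// changes are harmless.
	fmt.Println(probeOrder(block, []string{"srv0", "srv1", "srv3"}))
}
```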

#5

Updated by Tom Clegg over 8 years ago

9456-less-clear-trash @ fd3d8dac

#6

Updated by Tom Clegg over 8 years ago

  • Target version set to 2016-07-20 sprint
  • Story points set to 0.5
#7

Updated by Radhika Chippada over 8 years ago

LGTM at fd3d8dac

Just a couple very minor suggestions.

main.go:

  • SafeRendezvousState => PrevRendezvousState might be clearer? It is not safe as is; it is used to determine if the current state is safe or not based on the previous state.

balance.go:

  • “clearing existing trash lists to avoid racing with deletions based on a different rendezvous order” => This can be a bit confusing on a first run / rerun (when there are no changes to the rendezvous order). Please consider something like: clearing any existing trash lists to avoid potential race conditions due to any changes to …
#8

Updated by Tom Clegg over 8 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:c07314d641177cfcecf4321a7b7e6771702f5916.

