Story #2853
closed[Keep] Use rendezvous hashing to assign blocks to servers in Go and Python SDKs
Added by Peter Amstutz over 10 years ago. Updated about 10 years ago.
100%
Description
See: http://en.wikipedia.org/wiki/Rendezvous_hashing
Some thoughts here from earlier technical discussions:
- Rendezvous hashing makes it harder to perform weighted block distribution (we may be able to get around this by adding virtual storage nodes). Currently we do not expect to want weighted block distribution. Rather, we want distribution to be equal in steady state, and we want to minimize excess probes during rebalancing.
- Computing N hashes per block transaction (one for each keep node or virtual node) seems acceptable as long as N isn't huge; the more precise you want your weighting to be, the more hashes you have to compute. Using a fast hash function would be helpful here.
- If the map of storage nodes is tree structured, you can get O(log n) performance, e.g. compare group1 to group2, then follow the branch and compare group3 to group4, etc. However, this makes it impossible to add an arbitrary number of storage nodes with equal weight.
Updated by Tom Clegg over 10 years ago
- Target version set to Arvados Future Sprints
Updated by Tom Clegg over 10 years ago
- Subject changed from Keep uses consistent hashing to chose which servers get which blocks in a stable way to [Keep] Use rendezvous hashing to assign blocks to servers
- Story points set to 1.0
Updated by Tom Clegg over 10 years ago
- Subject changed from [Keep] Use rendezvous hashing to assign blocks to servers to [DRAFT] [Keep] Use rendezvous hashing to assign blocks to servers in Go and Python SDKs
Updated by Tom Clegg about 10 years ago
- Target version changed from Arvados Future Sprints to 2014-11-19 sprint
Updated by Tom Clegg about 10 years ago
- Subject changed from [DRAFT] [Keep] Use rendezvous hashing to assign blocks to servers in Go and Python SDKs to [Keep] Use rendezvous hashing to assign blocks to servers in Go and Python SDKs
Updated by Tom Clegg about 10 years ago
2853-rendezvous @ c7b7e4d
Notes- Chose MD5. Faster hash functions are out there (e.g., blake2) but it seems much more important to pick something that every client/language has built-in or readily available.
- Added a "reference set" of probe orders using 4 easy-to-read data blocks (consisting mostly of ascii zeroes) and 16 services with contrived names.
- Added a test for the desired rebalancing behavior: when starting with N storage nodes and adding D more, no block turns out to be further than D places away from its ideal probe order, and the average block moves about D/N places away from its ideal probe order.
- Only the last 15 characters of the service UUID are used for sorting. (This makes it possible to clone an entire instance, which changes the UUID prefix, without doing any rebalancing.)
- Benchmark
- Test that distribution is uniform (although the rebalancing test is pretty convincing)
Updated by Brett Smith about 10 years ago
Compatibility¶
- Go's sort is not guaranteed to be stable. Python's is. Does this matter? It will only come up in the admittedly very unlikely case of a hash collision, and then it will just mean at most one excess request, so I can easily believe the answer is "no," but I wanted to flag the issue for discussion. On the flip side, fixing it is as easy as changing sort.Sort to sort.Stable.
Go¶
- Can we use Md5String in RootSorter.getWeight?
- TestShuffleServiceRoots seems obsoleted by root_sorter_test.go.
- The reference to
../../python/arvados/keep.py
in root_sorter_test.go should be../../python/tests/test_keep_client.py
. - This is probably a noob question, but why did you increase the size of the stub channels in many of the existing tests?
Python¶
- Please don't use hash as an argument/variable name. It masks Python's built-in hash function (which makes funny syntax highlighting).
- KeepClientServiceTestCase also has code to present a mock service list to the KeepClient, and test with it. Would it make sense to merge this and KeepRendezvousWeightTestCase somehow, to reduce code duplication and mucking with KeepClient's internal structures?
- I've kind of idly wondered before about compatibility issues around stuff that we never advertised as part of the API, but was discoverable and looked public, like
KeepClient.service_roots
andKeepClient.shuffled_service_roots
. I think I'm okay with renaming them if you are, but I guess I wanted to take the opportunity to flag the longer-term issue here. (I'm definitely glad that we're marking private variables and methods more consistently.)
Updated by Tom Clegg about 10 years ago
Brett Smith wrote:
- Go's sort is not guaranteed to be stable. Python's is. Does this matter? It will only come up in the admittedly very unlikely case of a hash collision, and then it will just mean at most one excess request, so I can easily believe the answer is "no," but I wanted to flag the issue for discussion. On the flip side, fixing it is as easy as changing sort.Sort to sort.Stable.
I don't think even a stable sort would fully address this, since we aren't careful (in various places) to keep the incoming list of services sorted in the first place. We could use the service uuid as a secondary sort key, of course. But more importantly, I'd say "undefined order" is perfectly acceptable here: it's an extremely rare event, and even when it happens the expected performance impact is very small. Agreed?
- Can we use Md5String in RootSorter.getWeight?
Yes! Fixed.
- TestShuffleServiceRoots seems obsoleted by root_sorter_test.go.
Indeed. Deleted.
- The reference to
../../python/arvados/keep.py
in root_sorter_test.go should be../../python/tests/test_keep_client.py
.
Whoops. Fixed.
- This is probably a noob question, but why did you increase the size of the stub channels in many of the existing tests?
If you write two "I did a failure" messages to a channel with buffer size 1, the second write blocks until someone reads the first message. I found that many of the tests had a small enough channel size that (1) they depended on having no more than X failures before a success where X was determined by "whatever happened when I wrote the test" rather than a maximum number of failures we actually want to protect, and (2) when that dependency wasn't met, they would fail by hanging in deadlock instead of failing an assertion. This was really annoying to debug when I first ran into it, so I thought it would be best to get rid of the probe order sensitivity except where the test's effectiveness genuinely depends on it.
(Surely a mock library could give us something much neater than these channels, but I didn't want to go that far into it here.)
- Please don't use hash as an argument/variable name. It masks Python's built-in hash function (which makes funny syntax highlighting).
Fixed. Now data_hash
.
- KeepClientServiceTestCase also has code to present a mock service list to the KeepClient, and test with it. Would it make sense to merge this and KeepRendezvousWeightTestCase somehow, to reduce code duplication and mucking with KeepClient's internal structures?
Yes, good point. Will rearrange.
- I've kind of idly wondered before about compatibility issues around stuff that we never advertised as part of the API, but was discoverable and looked public, like
KeepClient.service_roots
andKeepClient.shuffled_service_roots
. I think I'm okay with renaming them if you are, but I guess I wanted to take the opportunity to flag the longer-term issue here. (I'm definitely glad that we're marking private variables and methods more consistently.)
Yes, I agree we should be much more careful about accidentally adding public (as in no-leading-underscore) APIs even when they don't come with other signs of being public, like documentation.
In this particular case, we don't have any particular reason to expose them publicly at all, right?
Now at 9ef386f with todo: Rearrange KeepRendezvousWeightTestCase mock.
Updated by Brett Smith about 10 years ago
Tom Clegg wrote:
Brett Smith wrote:
- Go's sort is not guaranteed to be stable. Python's is. Does this matter? It will only come up in the admittedly very unlikely case of a hash collision, and then it will just mean at most one excess request, so I can easily believe the answer is "no," but I wanted to flag the issue for discussion. On the flip side, fixing it is as easy as changing sort.Sort to sort.Stable.
I don't think even a stable sort would fully address this, since we aren't careful (in various places) to keep the incoming list of services sorted in the first place. We could use the service uuid as a secondary sort key, of course. But more importantly, I'd say "undefined order" is perfectly acceptable here: it's an extremely rare event, and even when it happens the expected performance impact is very small. Agreed?
You're right that stable doesn't help. And yes, I agree that the impact is minimal enough to be unconcerned.
In this particular case, we don't have any particular reason to expose them publicly at all, right?
No. If anything makes sense as a public interface here, it's weighted_service_roots. Everything else makes sense as private.
cf1db398 is good to merge, thanks.
Updated by Anonymous about 10 years ago
- Status changed from In Progress to Resolved
- % Done changed from 67 to 100
Applied in changeset arvados|commit:06e402b11ad4d503feb5fa45845cb27c93478cfc.
Updated by Tom Clegg about 10 years ago
def test_probe_order_visual(self):
hashes = [
hashlib.md5("{:064x}".format(x)).hexdigest()
for x in range(8)]
for nsvc in range(2,16):
sys.stderr.write("\n")
for i, hash in enumerate(hashes):
api_client = self.mock_n_keep_disks(nsvc)
keep_client = arvados.KeepClient(api_client=api_client)
roots = keep_client.weighted_service_roots(hash)
got_order = [
re.search(r'//\[?keep0x([0-9a-f]+)', root).group(1)
for root in roots]
sys.stderr.write("{:20s}".format(''.join(got_order)))
sys.stderr.write("\n")
10 01 01 10 01 01 10 10 210 021 021 102 021 201 120 120 3210 0213 0231 1023 0231 2013 1230 1320 32104 02413 40231 10234 02341 20413 12304 13240 325104 052413 540231 102354 023541 520413 512304 132405 3256104 0526413 5402361 1026354 6023541 5204163 5123064 1326405 32561074 07526413 54023761 10276354 60723541 52704163 51230647 13276405 325681074 075264813 540238761 810276354 607235418 527804163 512306487 138276405 3259681074 0975264813 5402387691 9810276354 6079235418 5279804163 5912306487 1382764095 3a259681074 097a5264813 540238a7691 9810276a354 6079235418a 527a9804163 591230a6487 1a382764095 3ab259681074 097ba5264813 5b40238a7691 9810276a3b54 60792354b18a 527a9804b163 5912b30a6487 1a38276b4095 3ab25c9681074 097ba526481c3 c5b40238a7691 981c0276a3b54 60792c354b18a 5c27a9804b163 591c2b30a6487 1a38276b40c95 3ab2d5c9681074 097dba526481c3 c5b40238a7d691 9d81c0276a3b54 60792c354b18da 5c2d7a9804b163 d591c2b30a6487 1a3827d6b40c95 3eab2d5c9681074 097dba52e6481c3 c5b4e0238a7d691 9d81c02e76a3b54 60792c354be18da 5c2ed7a9804b163 d591c2be30a6487 1a3827d6b40ce95