Version 4 - History - Keep-balance - Arvados

Tom Clegg

The Data Manager enforces policies and generates reports about storage resource usage. The Data Manager interacts with the [[Keep server]] and the metadata database. Clients/users do not interact with the Data Manager directly.

See also:

6

* [[Keep server]]
* [[Keep manifest format]]
* source: n/a (design phase)

Responsibilities:

11

* Garbage collector: decide which blocks are eligible for deletion (and in what partial order of preference)
* Replication enforcer: copy and delete blobs in various backing stores to achieve the desired replication level
* Rebalancer: move blobs to redistribute free space and reduce client probes
* Tell managers how much disk space is being saved by content-addressed storage (CAS)
* Tell managers how much disk space is occupied in a given backing store service
* Tell managers how disk usage would be affected by modifying storage policy
* Tell managers how much disk space+time is used (per user, group, node, disk)
* Tell users when the replication/policy specified for a collection is not currently satisfied (and why, for how long, etc.)
* Tell users how much disk space is represented by a given set of collections
* Tell users how much disk space can be made available by garbage collection
* Tell users how soon they should expect their cached data to disappear
* Tell users performance statistics (how fast should I expect my job to read data?)
* Tell ops where each block was most recently read/written, in case data recovery is needed
* Tell ops how unbalanced the backing stores are across the cluster
* Tell ops activity level and performance statistics
* Tell ops activity level vs. amount of space (how much of the data is being accessed by users?)
* Tell ops disk performance/error/status trends (and SMART reports) to help identify bad hardware
* Tell ops the history of disk adds, removals, and moves
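The replication enforcer and garbage collector responsibilities above could be sketched roughly as follows. This is only an illustration, not the Data Manager's actual design: the function name @plan_replication@, the @Action@ record, and the "most free space first" placement rule are all assumptions made for the example. A block with no desired replication is simply treated as eligible for deletion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    op: str     # "copy" or "delete" (hypothetical action names)
    block: str  # block hash
    disk: str   # target disk for a copy, source disk for a delete


def plan_replication(block_disks, desired, free_space):
    """Plan copy/delete actions to reach the desired replication level.

    block_disks: dict of block hash -> set of disk ids holding the block
    desired:     dict of block hash -> desired replica count (missing = 0)
    free_space:  dict of disk id -> free bytes
    """
    actions = []
    for block in sorted(block_disks):
        disks = block_disks[block]
        want = desired.get(block, 0)
        if len(disks) < want:
            # Under-replicated: copy to the disks with the most free space.
            candidates = sorted(
                (d for d in free_space if d not in disks),
                key=lambda d: (-free_space[d], d),
            )
            for d in candidates[: want - len(disks)]:
                actions.append(Action("copy", block, d))
        elif len(disks) > want:
            # Over-replicated (or unreferenced): delete from the fullest disks.
            surplus = sorted(disks, key=lambda d: (free_space.get(d, 0), d))
            for d in surplus[: len(disks) - want]:
                actions.append(Action("delete", block, d))
    return actions
```

Choosing the emptiest disk for copies and the fullest disk for deletes also nudges the cluster toward balanced free space, which overlaps with the rebalancer's goal. A real policy would also have to consider client probe order and a grace period before deleting unreferenced blocks.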

Basic kinds of data in the index:

31

* Which blocks are used by which collections (and which collections are valued by which users/groups)
* Which blocks are stored on which disks
* Which disks are attached to which nodes
* Read events
* Write events
* Exceptions (checksum mismatch, IO error)
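As a minimal sketch of how the "blocks used by collections" data could answer the space questions listed earlier (disk space represented by a set of collections, and space saved by CAS), assuming the index is available as plain in-memory mappings. The function name @collection_space@ and the dictionary layout are hypothetical, chosen only for this example.

```python
def collection_space(collections, block_sizes, selected):
    """Return (logical_bytes, stored_bytes) for the selected collections.

    collections: dict of collection id -> list of block hashes (repeats allowed)
    block_sizes: dict of block hash -> size in bytes
    selected:    iterable of collection ids to report on

    logical_bytes counts every block reference; stored_bytes counts each
    distinct block once, i.e. what a single replica occupies on disk.
    """
    logical = 0
    distinct = set()
    for name in selected:
        for block in collections[name]:
            logical += block_sizes[block]
            distinct.add(block)
    stored = sum(block_sizes[b] for b in distinct)
    return logical, stored


collections = {"c1": ["a", "b"], "c2": ["b", "c", "c"]}
block_sizes = {"a": 10, "b": 20, "c": 5}
logical, stored = collection_space(collections, block_sizes, ["c1", "c2"])
saved_by_cas = logical - stored  # bytes not stored twice thanks to deduplication
```

The difference between the two totals is the space saved by content addressing for that set of collections. Multiplying @stored@ by the replication level would approximate the raw disk usage.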

h2. Implementation considerations

Overview

41

* REST service
* API server may cache/proxy some queries
* API server may redirect some queries

Permissions

46

* Support +A tokens, as [[Keep server]] does, when accepting collection/blob UUIDs in a request?
* Require an admin api_token for some queries (site-configurable)?

Distributed/asynchronous

50

* It should be easy to run multiple Keep index services.
* Most features do not need synchronous operation or real-time data.
* Features that move or delete data should be tied to a single "primary" indexing service (a failover event will likely require resetting some state).
* Substantial disagreement between multiple index services should be easy to flag on the admin dashboard.
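One possible way to detect "substantial disagreement" between two index services is to compare their block-to-disk snapshots and flag the pair when the fraction of differing blocks exceeds a threshold. The sketch below assumes each service can export such a snapshot. The function names and the 5% default threshold are illustrative assumptions, not part of the design.

```python
def index_disagreement(index_a, index_b):
    """Fraction of blocks whose recorded disk sets differ between two snapshots.

    index_a, index_b: dict of block hash -> set of disk ids
    A block known to only one service counts as a disagreement.
    """
    blocks = set(index_a) | set(index_b)
    if not blocks:
        return 0.0
    differing = sum(
        1 for b in blocks if index_a.get(b, set()) != index_b.get(b, set())
    )
    return differing / len(blocks)


def should_flag(index_a, index_b, threshold=0.05):
    """True if the disagreement is large enough to show on a dashboard."""
    return index_disagreement(index_a, index_b) > threshold
```

Because the index services run asynchronously, some small disagreement is expected at any moment, which is why the comparison uses a threshold rather than requiring exact equality.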
