Keep server » History » Version 10
Tom Clegg, 04/08/2014 12:59 PM
h1. Keep server

This page describes the Keep backing store server component, keepd.

{{toc}}

See also:
* [[Keep]] (overview, design goals, client/server responsibilities, intro to content addressing)
* [[Keep manifest format]]
* [[Keep index]]
* source:services/keep (implementation: in progress)

h2. Todo

* Implement server daemon (*in progress*)
* Implement integration test suite (*in progress*)
* Spec public/private key format and deployment mechanism
* Spec permission signature format
* Spec event-reporting API
* Spec quota mechanism

h2. Responsibilities

* Read and write blobs on disk
* Remember when each blob was last written[1]
* Enforce maximum blob size
* Enforce key=hash(value) during read and write
* Enforce permissions when reading data (according to permissions on Collections in the metadata DB)
* Enforce usage quota when writing data
* Delete blobs (only when requested by the data manager!)
* Report read/write/exception events
* Report used & free space
* Report hardware status (SMART)
* Report list of blobs on disk (hash, size, time last stored)

fn1. This helps with garbage collection. Re-writing an already-stored blob should push it to the back of the garbage collection queue. Ordering garbage collection this way provides a fair and more or less predictable interval between a write (from the client's perspective) and the earliest potential deletion.

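The key=hash(value) rule above amounts to recomputing the MD5 digest of the content and comparing it with the hash portion of the blob locator. A minimal sketch in Go; @verifyLocator@ is an illustrative helper name, not part of keepd:

```go
package main

import (
	"crypto/md5"
	"fmt"
	"strings"
)

// verifyLocator checks that a blob's content matches the hash portion of
// its locator ("md5hex+size"), as the server must do on read and write.
func verifyLocator(locator string, content []byte) bool {
	hash := strings.SplitN(locator, "+", 2)[0]
	return fmt.Sprintf("%x", md5.Sum(content)) == hash
}

func main() {
	content := []byte("foo")
	locator := fmt.Sprintf("%x+%d", md5.Sum(content), len(content))
	fmt.Println(locator)                         // acbd18db4cc2f85cedef654fccc4a4d8+3
	fmt.Println(verifyLocator(locator, content)) // true
}
```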
h2. Other parties

* Client distributes data across the available Keep servers (using the content hash)
* Client attains initial replication level when writing blobs (by writing to multiple Keep servers)
* Data manager decides which blobs to delete (e.g., garbage collection, rebalancing)

h2. Discovering Keep server URIs

* @GET https://endpoint/arvados/v1/keep_disks@
* See http://doc.arvados.org/api/schema/KeepDisk.html
* Currently the "list of Keep servers" is the "list of unique {host,port} pairs across all Keep disks". (Could surely be improved.)

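The dedup step above might look like the following sketch. The field names are guesses based on the KeepDisk schema page, not verified against the API:

```go
package main

import "fmt"

// KeepDisk holds the service host/port fields of a keep_disks resource
// (field names assumed from the schema, for illustration only).
type KeepDisk struct {
	ServiceHost string
	ServicePort int
}

// uniqueServers reduces a keep_disks listing to the unique {host,port}
// pairs, which is currently how the list of Keep servers is derived.
func uniqueServers(disks []KeepDisk) []string {
	seen := map[string]bool{}
	var servers []string
	for _, d := range disks {
		addr := fmt.Sprintf("%s:%d", d.ServiceHost, d.ServicePort)
		if !seen[addr] {
			seen[addr] = true
			servers = append(servers, addr)
		}
	}
	return servers
}

func main() {
	disks := []KeepDisk{
		{"keep0.example.org", 25107},
		{"keep0.example.org", 25107}, // second disk on the same server
		{"keep1.example.org", 25107},
	}
	fmt.Println(uniqueServers(disks)) // [keep0.example.org:25107 keep1.example.org:25107]
}
```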
h2. Supported methods

For storage clients:
* @GET /hash@
* @GET /hash?checksum=true@ → verify checksum before sending
* @POST /@ (body=content) → hash
* @PUT /hash@ (body=content) → hash
* @HEAD /hash@ → does it exist here?
* @HEAD /hash?checksum=true@ → read the data and verify the checksum

For system use (monitoring, indexing, garbage collection):
* @DELETE /hash@ → delete all copies of this blob (requires privileged token!)
* @GET /index.txt@ → get full list of blocks stored here, including size and timestamp of most recent PUT (requires privileged token)
* @GET /status.json@ → get list of backing filesystems, disk fullness, I/O counters, perhaps recent I/O statistics (requires privileged token)

Example index.txt:

<pre>
37b51d194a7513e45b56f6524f2d51f2+3 1396976219
acbd18db4cc2f85cedef654fccc4a4d8+3 1396976187
</pre>
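A consumer of index.txt (e.g., the data manager) might parse it along these lines. The two-column locator/timestamp layout follows the example above; the type and function names are illustrative:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// indexEntry holds one line of index.txt: a locator (hash+size) and the
// Unix timestamp of the most recent PUT.
type indexEntry struct {
	Locator string
	LastPut int64
}

func parseIndex(text string) ([]indexEntry, error) {
	var entries []indexEntry
	for _, line := range strings.Split(strings.TrimSpace(text), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed index line: %q", line)
		}
		ts, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			return nil, err
		}
		entries = append(entries, indexEntry{fields[0], ts})
	}
	return entries, nil
}

func main() {
	index := "37b51d194a7513e45b56f6524f2d51f2+3 1396976219\nacbd18db4cc2f85cedef654fccc4a4d8+3 1396976187\n"
	entries, _ := parseIndex(index)
	for _, e := range entries {
		fmt.Println(e.Locator, e.LastPut)
	}
}
```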

Example status.json:

<pre><code class="javascript">
{
  "volumes":[
    {"mount_point":"/data/disk0","bytes_free":4882337792,"bytes_used":5149708288},
    {"mount_point":"/data/disk1","bytes_free":39614472192,"bytes_used":3314229248}
  ]
}
</code></pre>

h2. Authentication

* Client provides an API token in the Authorization header
* Config knob to ignore authentication & permissions (for a fully-shared site, and to help the transition from Keep1)

h2. Permission

A signature token, unique to a {blob_hash, arvados_api_token, expiry_time} tuple, establishes permission to read a block.

The controller and each Keep server have a private key.

Writing:
* If the given hash and content agree, whether or not a disk write is required, the Keep server appends a +Asignature@expirytime portion to the returned blob locator.
* The API server @collections.create@ method verifies signatures before giving the current user can_read permission on the collection.
* A suitably intelligent client can notice that the expiry times on its blob hashes are getting old, and refresh them by generating a partial manifest, calling @collections.create@ followed by @collections.get@, and optionally deleting the partial manifest(s) when the full manifest is written. If extra partial manifests are left around, garbage collection will take care of them eventually; the only odd side effect is the existence of partial manifests. *(Should there be a separate "refresh all of these tokens for me" API call to avoid creating these intermediate manifests?)*

Reading:
* The API server @collections.get@ method returns two manifests. One has plain hashes (this is the one whose content hash is the collection UUID). The other has a @+Asignature@expirytime@ portion on each blob locator.
* Keep server verifies signatures before honoring @GET@ requests.
* The signature might come from the Keep node itself, a different Keep node, or the API server.
* A suitably intelligent client can notice that the expiry times on its blob hashes are too old, and request a fresh set via @collections.get@.
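Since the signature format is still on the Todo list above, the following is only one plausible construction: an HMAC-SHA1 over the {blob_hash, arvados_api_token, expiry_time} tuple, keyed with the shared private key, rendered as a +Asignature@expirytime suffix. Every detail here (HMAC choice, field separator, hex-encoded expiry, function names) is an assumption, not the specified format.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"fmt"
)

// signLocator sketches a permission signature: HMAC over the
// {blob_hash, api_token, expiry} tuple, appended as "+Asig@expiry".
func signLocator(blobHash, apiToken string, expiry int64, key []byte) string {
	mac := hmac.New(sha1.New, key)
	fmt.Fprintf(mac, "%s %s %d", blobHash, apiToken, expiry)
	return fmt.Sprintf("%s+A%x@%x", blobHash, mac.Sum(nil), expiry)
}

// verifySignedLocator recomputes the signature and compares it in
// constant time, as a Keep server would before honoring a GET.
func verifySignedLocator(signed, blobHash, apiToken string, expiry int64, key []byte) bool {
	return hmac.Equal([]byte(signed), []byte(signLocator(blobHash, apiToken, expiry, key)))
}

func main() {
	key := []byte("example-private-key")
	signed := signLocator("acbd18db4cc2f85cedef654fccc4a4d8", "token123", 1396976187, key)
	fmt.Println(signed)
	fmt.Println(verifySignedLocator(signed, "acbd18db4cc2f85cedef654fccc4a4d8", "token123", 1396976187, key)) // true
}
```

Note that because the signature covers the API token, a locator signed for one client is useless to a client holding a different token.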