Tom Clegg, 02/04/2014 01:40 AM

h1. Keep server

This page describes the Keep backing store server component, keepd.

{{toc}}

See also:
* [[Keep manifest format]]
* [[Keep index]]
* source:services/keepd (implementation: imminent)

h2. Discovering Keep server URIs

* @GET https://endpoint/arvados/v1/keep_disks@
* See http://doc.arvados.org/api/schema/KeepDisk.html for the response schema.
* Currently, the "list of Keep servers" is the list of unique {host,port} pairs across all Keep disks. (This could surely be improved.)

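The "unique {host,port}" rule above can be sketched in a few lines. This is only an illustration: the field names @service_host@ and @service_port@, the helper name @keep_server_endpoints@, and the sample records are assumptions, so check the KeepDisk schema for the real attribute names.

```python
def keep_server_endpoints(keep_disks):
    """Return the unique (host, port) pairs across all Keep disks, in first-seen order."""
    seen = set()
    endpoints = []
    for disk in keep_disks:
        # Assumed field names; the KeepDisk schema is the authority here.
        endpoint = (disk["service_host"], disk["service_port"])
        if endpoint not in seen:
            seen.add(endpoint)
            endpoints.append(endpoint)
    return endpoints


# Hypothetical sample shaped like the list of disks in a keep_disks response.
sample_disks = [
    {"uuid": "disk-1", "service_host": "keep0.example", "service_port": 25107},
    {"uuid": "disk-2", "service_host": "keep0.example", "service_port": 25107},
    {"uuid": "disk-3", "service_host": "keep1.example", "service_port": 25107},
]
print(keep_server_endpoints(sample_disks))
# → [('keep0.example', 25107), ('keep1.example', 25107)]
```

Two disks attached to the same server collapse into one entry, which is why the list is built from pairs rather than from disk records.
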
h2. Supported methods

For storage clients:
* @GET /hash@ → return the block's content
* @GET /hash?checksum=true@ → verify the checksum before sending
* @POST /@ (body=content) → returns the hash
* @PUT /hash@ (body=content) → returns the hash
* @HEAD /hash@ → does the block exist here?
* @HEAD /hash?checksum=true@ → read the data and verify the checksum

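As a rough sketch of how a storage client might form a @PUT@ request and what a checksum comparison amounts to: the page does not say which hash function is used, so the MD5 hex digest below is an assumption, and the helper names @block_hash@, @put_request@ and @checksum_ok@ are hypothetical.

```python
import hashlib


def block_hash(content: bytes) -> str:
    # Assumption: the hash is the MD5 hex digest of the block content.
    return hashlib.md5(content).hexdigest()


def put_request(content: bytes):
    """Describe the PUT a client would send, as (method, path, body)."""
    return ("PUT", "/" + block_hash(content), content)


def checksum_ok(expected_hash: str, content: bytes) -> bool:
    """The comparison that checksum=true asks for: does the data still match its hash?"""
    return block_hash(content) == expected_hash


method, path, body = put_request(b"foo")
print(method, path)
# → PUT /acbd18db4cc2f85cedef654fccc4a4d8
```

A client can run the same comparison on data it receives from @GET /hash@, regardless of whether it asked the server to verify first.
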
For system (monitoring, indexing, garbage collection):
* @DELETE /hash@ → delete all copies of this blob (requires privileged token!)
* @GET /index.txt@ → get the full list of blocks stored here, including size [and whether it was PUT recently?] (requires privileged token?)
* @GET /state.json@ → get the list of backing filesystems, disk fullness, IO counters, perhaps recent IO statistics (requires privileged token?)

h2. Authentication

* The client provides an API token in the @Authorization@ header.
* A config knob makes the server ignore authentication and permissions (for a fully shared site, and to help the transition from Keep1).

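The two rules above might combine as in the sketch below. The header format @OAuth2 <token>@, the function names, and the @token_is_valid@ callback are all assumptions made for illustration; the page only says that the token travels in the Authorization header and that a knob can switch the checks off.

```python
def extract_api_token(headers):
    """Pull the API token out of the Authorization header, or return None."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    # Assumption: the header looks like "OAuth2 <token>".
    if scheme == "OAuth2" and token.strip():
        return token.strip()
    return None


def request_allowed(headers, enforce_permissions, token_is_valid):
    """Apply the 'ignore authentication and permissions' knob, then check the token."""
    if not enforce_permissions:
        return True  # fully shared site: every request is honored
    token = extract_api_token(headers)
    return token is not None and token_is_valid(token)


# token_is_valid stands in for whatever check the server performs on a token.
print(request_allowed({"Authorization": "OAuth2 abc"}, True, lambda t: t == "abc"))
# → True
```
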
h2. Permission

A signature token, unique to a {blob_hash, arvados_api_token, expiry_time} tuple, establishes permission to read a block.

The controller and each Keep server have their own private key. Everyone can know the public keys (but only the controller and Keep servers need to know them; clients don't need to verify signatures).

Writing:
* If the given hash and content agree, then whether or not a disk write is required, the Keep server appends a <code>+Asignature@expirytime</code> portion to the returned blob locator.
* The API server @collections.create@ method verifies signatures before giving the current user can_read permission on the collection.
* A suitably intelligent client can notice that the expirytimes on its blob hashes are getting old, and refresh them by generating a partial manifest, calling @collections.create@ followed by @collections.get@, and optionally deleting the partial manifest(s) once the full manifest is written. If extra partial manifests are left around, garbage collection should take care of them eventually; the only odd side effect is the existence of partial manifests. *(Should there just be a separate "refresh all of these tokens for me" API call to avoid creating these intermediate manifests?)*

Reading:
* The API server @collections.get@ method returns two manifests. One has plain hashes (this is the one whose content hash is the collection UUID). The other has a <code>+Asignature@expirytime</code> portion on each blob locator.
* The Keep server verifies signatures before honoring @GET@ requests.
* The signature might come from the Keep node itself, a different Keep node, or the API server.
* A suitably intelligent client can notice that the expirytime on its blob hashes is too old, and request a fresh set via @collections.get@.
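The signing and verification steps above can be sketched as follows. Several things here are assumptions rather than the design described on this page: the sketch substitutes an HMAC with a shared secret for the public-key signatures (only to stay within the Python standard library), it encodes expirytime as a decimal Unix timestamp, and the names @sign_locator@ and @verify_locator@ and the message layout are hypothetical.

```python
import hashlib
import hmac
import time


def sign_locator(blob_hash, api_token, expiry_time, secret):
    """Return blob_hash with a +A<signature>@<expirytime> portion appended."""
    # The signature covers the {blob_hash, api_token, expiry_time} tuple.
    message = f"{blob_hash}|{api_token}|{expiry_time}".encode()
    # Stand-in: HMAC with a shared secret instead of a private-key signature.
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{blob_hash}+A{signature}@{expiry_time}"


def verify_locator(locator, api_token, secret, now=None):
    """Check signature and expiry, as a Keep server would before honoring a GET."""
    now = time.time() if now is None else now
    blob_hash, sep, hint = locator.partition("+A")
    signature, at, expiry_text = hint.partition("@")
    if not (sep and at and expiry_text.isdigit()):
        return False  # plain hash, or a malformed permission portion
    expiry_time = int(expiry_text)
    if expiry_time < now:
        return False  # too old: the client should fetch a fresh set
    expected = sign_locator(blob_hash, api_token, expiry_time, secret)
    return hmac.compare_digest(expected.encode(), locator.encode())


secret = b"not-a-real-key"  # hypothetical shared secret
locator = sign_locator("acbd18db4cc2f85cedef654fccc4a4d8", "token-1", 2000, secret)
print(verify_locator(locator, "token-1", secret, now=1000))  # → True
print(verify_locator(locator, "token-2", secret, now=1000))  # → False
print(verify_locator(locator, "token-1", secret, now=3000))  # → False
```

Because the API token is part of the signed message, a locator copied from one user's manifest fails verification under another user's token, and an expired locator fails even with the right token.
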