Keep manifest format » History » Version 8
Tom Clegg, 11/13/2018 09:35 PM
{{toc}}

h1. Keep manifest format

h2. Manifest v1

A manifest is UTF-8 encoded text, consisting of zero or more newline-terminated streams.

Each stream consists of three or more space-delimited tokens:
* The first token is a stream name, consisting of one or more path components, delimited by @"/"@.
** The first path component is always @"."@.
** No path component is empty.
** No path component is @"."@ or @".."@ (except the leading @"."@).
** The stream name never begins or ends with @"/"@.
* The second token is a data blob locator (see [[Keep locator format]]).
* It may be followed by more data blob locators.
* The first token that is not a data blob locator, and all subsequent tokens, are file tokens.
** A file token has three parts, delimited by @":"@: position, size, filename.
** Position and size are given in decimal; position is an offset from the beginning of the first data blob.
** Filename may contain @"/"@ characters, but must not start or end with @"/"@, and must not contain @"//"@.
** Filename components (delimited by @"/"@) must not be @"."@ or @".."@.
** Except: Filename may be @"."@ if size is 0. This does not represent a real file; it is a placeholder used to ensure there is at least one file token in a stream that contains no files.

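The token rules above can be sketched in Python. This is only an illustration, not a reference parser: the function name @parse_stream@ is made up here, and the sketch assumes that a file token always has the shape @digits:digits:name@ and that no data blob locator has that shape (the locator syntax itself is defined in [[Keep locator format]], not on this page). It does not decode @\nnn@ escapes or validate path components.

```python
import re

# Assumption: a file token is "<position>:<size>:<filename>" and no data
# blob locator matches this shape.
FILE_TOKEN = re.compile(r"^([0-9]+):([0-9]+):(.+)$")


def parse_stream(line):
    """Split one stream (without its trailing newline) into its parts."""
    tokens = line.split(" ")
    if len(tokens) < 3:
        raise ValueError("a stream needs at least three tokens")
    name = tokens[0]
    if name != "." and not name.startswith("./"):
        raise ValueError("stream name must start with the '.' component")
    # Locators run from the second token up to the first file token.
    first_file = 1
    while first_file < len(tokens) and not FILE_TOKEN.match(tokens[first_file]):
        first_file += 1
    locators = tokens[1:first_file]
    if not locators:
        raise ValueError("stream has no data blob locator")
    files = []
    for token in tokens[first_file:]:
        match = FILE_TOKEN.match(token)
        if match is None:
            raise ValueError("bad file token: %r" % token)
        position, size, filename = match.groups()
        files.append((int(position), int(size), filename))
    if not files:
        raise ValueError("stream has no file token")
    return name, locators, files
```

Calling @parse_stream@ on a stream that holds only the @0:0:.@ placeholder returns a single file entry named @"."@ with size 0, which the caller would then treat as "no files".
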
A manifest contains no TAB characters, and no ASCII whitespace other than the space and newline delimiters specified above.

Whitespace, backslashes, and special characters appearing in paths and filenames are encoded as @\nnn@, where @nnn@ is a three-digit octal byte code.
* A backslash character is encoded as @\134@.
* A space is encoded as @\040@.
* It is permitted to escape printable characters: @"fo\157\057bar"@ and @"foo/bar"@ are equivalent.

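A minimal Python sketch of the escaping rule follows. The function names @escape@ and @unescape@ are hypothetical. Because this page does not list which "special characters" must be escaped, the sketch assumes it is enough to escape the backslash and every byte at or below 0x20 (space and control characters); a real implementation may need to escape more.

```python
import re

# One escape is a backslash followed by three octal digits (max \377).
OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def unescape(token):
    # Work on bytes, because each escape stands for one byte, not one character.
    raw = token.encode("utf-8")
    decoded = OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return decoded.decode("utf-8")


def escape(name):
    out = bytearray()
    for byte in name.encode("utf-8"):
        # Assumption: only backslash (0x5C) and bytes <= 0x20 need escaping.
        if byte == 0x5C or byte <= 0x20:
            out += ("\\%03o" % byte).encode("ascii")
        else:
            out.append(byte)
    return out.decode("utf-8")
```

Since escaping printable characters is permitted, @unescape@ accepts any escaped byte, while @escape@ emits only the escapes it needs.
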
A manifest always ends with a newline, except the empty (zero-length) string, which is also a valid manifest.

h2. Normalized manifest v1

A normalized manifest has the following additional restrictions.
* Streams are in alphanumeric order.
* Each stream name is unique within the manifest.
* Files within a stream are in alphanumeric order.
* -Concatenation @stream_name/filename@ is unique within the manifest.- (This can be impossible to accomplish without rewriting blobs.)
* Filename must not contain @"/"@.

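The restrictions above can be checked with a short Python sketch. The function name @is_normalized@ and its input shape (a list of stream names paired with their filenames, in manifest order) are hypothetical. This page does not define "alphanumeric order", so the sketch assumes Python's default string comparison; the struck-out uniqueness rule is not checked.

```python
def is_normalized(streams):
    """streams: list of (stream_name, [filename, ...]) in manifest order."""
    names = [name for name, _ in streams]
    # Strictly increasing names are both ordered and unique.
    if any(a >= b for a, b in zip(names, names[1:])):
        return False
    for _, filenames in streams:
        # Filenames in a normalized manifest must not contain "/".
        if any("/" in filename for filename in filenames):
            return False
        # Assumption: "alphanumeric order" means default string ordering.
        if list(filenames) != sorted(filenames):
            return False
    return True
```
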
An API call will exist to normalize a manifest.

Request:
* @POST /arvados/v1/collections/{hash}/normalize@
* Request body: @{"collection":{"manifest_text":"...."}}@

Response:
* @{"uuid":"...","manifest_text":"..."}@

Notes:
* The method is POST even though the call has no side effects.
* The response is an object with a uuid even though no object was stored.

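For illustration, the planned request could be assembled as below. This sketch only builds the request object and does not send it. The function name @build_normalize_request@, the @api_base@ parameter, and the @Content-Type@ header are assumptions not stated on this page, and authentication is left out entirely.

```python
import json
import urllib.request


def build_normalize_request(api_base, collection_hash, manifest_text):
    # Path and body shape follow the request description above.
    url = "%s/arvados/v1/collections/%s/normalize" % (
        api_base.rstrip("/"),
        collection_hash,
    )
    body = json.dumps({"collection": {"manifest_text": manifest_text}})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        # Assumption: the API accepts a JSON request body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The response body would then be parsed as JSON and its @manifest_text@ field read; the @uuid@ field should not be taken to mean that anything was stored.
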
h2. Manifest v2

(Early design stages)

Should probably include:
* Structured format (JSON?)
* More than one level of indirection (e.g., manifest references block X, which references data blocks A,B,C)
* Specify hash algorithm with block hashes