Objects as pseudo-blocks in Keep » History » Version 2
Peter Amstutz, 05/28/2024 08:45 PM
| 1 | 1 | Peter Amstutz | h1. Objects as pseudo-blocks in Keep |
|---|---|---|---|
| 2 | |||
| 3 | Idea for accessing external objects via Keep (specifically S3) |
||
| 4 | |||
| 5 | The approach we've bounced around for a while has been to take an object, split it into 64 MiB blocks, and record each block hash in a database along with a reference to the object and offset. |
||
| 6 | |||
| 7 | 2 | Peter Amstutz | Here is a different approach to this idea. (Tom floated a version of this at one of our engineering meetings, but I don't think we fully explored it at the time.) |
| 8 | 1 | Peter Amstutz | |
| 9 | For an S3 object 1234 bytes long located at s3://bucket/key: |
||
| 10 | |||
| 11 | ffffffffffffffffffffffffffffffff+512+B(base64 encode of s3://bucket/key)+C256 |
||
| 12 | |||
| 13 | 2 | Peter Amstutz | The ffff... indicates it is a special block (we could also use 0000... or 0f0f0f... etc). Another idea would be to use a hash of the size, @+B@ and @+C@ hints. Alternatively, S3 also offers checksums of objects, so we could use the MD5 of the full object. |
| 14 | 1 | Peter Amstutz | |
| 15 | 2 | Peter Amstutz | * The block is 512 bytes long. |
| 16 | * The hint @+B@ means data should be fetched from the s3:// URL which is base64 encoded (this is necessary to match our locator syntax). |
||
| 17 | * The hint @+C@ means the block is read starting at byte offset 256 within the object. |
||
| 18 | 1 | Peter Amstutz | |
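The locator layout described above can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, and the choice of URL-safe base64 without padding is an assumption made so that the encoded URL contains no @+@, @/@ or @=@ characters that might clash with the locator syntax.

```python
import base64

# Placeholder hash marking a pseudo-block (the proposal's "ffff..." value).
SPECIAL_HASH = "f" * 32


def encode_url(url: str) -> str:
    """Encode a URL for use in a +B hint (assumed: URL-safe base64, no padding)."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii").rstrip("=")


def decode_url(token: str) -> str:
    """Reverse encode_url, restoring the padding that was stripped."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def make_locator(url: str, size: int, offset: int) -> str:
    """Build hash+size+B<encoded url>+C<offset>."""
    return f"{SPECIAL_HASH}+{size}+B{encode_url(url)}+C{offset}"


def parse_locator(locator: str):
    """Split a pseudo-block locator into (hash, size, url, offset)."""
    digest, size, *hints = locator.split("+")
    url = None
    offset = None
    for hint in hints:
        if hint.startswith("B"):
            url = decode_url(hint[1:])
        elif hint.startswith("C"):
            offset = int(hint[1:])
    return digest, int(size), url, offset
```

For example, @make_locator("s3://bucket/key", 512, 256)@ produces a locator that @parse_locator@ turns back into the same URL, size and offset.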
| 19 | Large files can be split, e.g. |
||
| 20 | |||
| 21 | ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C0 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C67108864 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C134217728 |
||
| 22 | |||
| 23 | However, this repeats the @+B@ portion many times, so we could allow the manifest to describe oversized blocks: |
||
| 24 | |||
| 25 | ffffffffffffffffffffffffffffffff+1000000000+B(base64 encode of s3://bucket/key)+C0 |
||
| 26 | |||
| 27 | Implementation-wise, this would be split into 64 MiB chunks at runtime when the manifest is loaded. The block cache would need to use the full locator (with +B and +C). |
||
| 28 | |||
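A minimal sketch of the runtime split, assuming the oversized locator has already been broken into its hash, size, @+B@ payload and starting offset. The function name and parameters are hypothetical. Each resulting locator keeps the full @+B@ and @+C@ hints, so it could serve directly as a block cache key.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB = 67108864 bytes


def split_oversized(digest: str, size: int, b_hint: str, start: int = 0,
                    block_size: int = BLOCK_SIZE) -> list:
    """Expand one oversized pseudo-block into locators of at most block_size bytes.

    b_hint is the already-encoded payload of the +B hint; it is copied as-is.
    """
    locators = []
    pos = 0
    while pos < size:
        chunk = min(block_size, size - pos)  # the last chunk may be shorter
        locators.append(f"{digest}+{chunk}+B{b_hint}+C{start + pos}")
        pos += chunk
    return locators
```

With the 1000000000-byte example above, this yields 14 full 64 MiB blocks followed by one final block of 60475904 bytes at offset 939524096.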
| 29 | 2 | Peter Amstutz | Add support for locators of this type to Keepstore, which already has code to interact with S3 buckets. This avoids adding such code to the client. |
| 30 | 1 | Peter Amstutz | |
| 31 | Keepstore would need to be able to read the buckets. This could be done either with a blanket policy (allow keepstore/compute nodes to read specific buckets) and/or by adding a feature to store AWS credentials in Arvados in such a way that Keepstore, having the user's API token, is able to fetch them and use them (such as on the API token record). |
||
| 32 | |||
| 33 | 2 | Peter Amstutz | For S3 specifically, if we include @?versionId=@ on all URLs, the blocks can be assumed to be immutable. |
| 34 | |||
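To illustrate how a pseudo-block could map onto a single ranged S3 read, here is a rough Python sketch. Keepstore itself is written in Go, so this shows only the arithmetic. The function name is hypothetical, and the dictionary keys are chosen to mirror the S3 GetObject request parameters. It assumes the decoded URL has the form @s3://bucket/key?versionId=...@ and that the key contains no @?@ or @#@ characters.

```python
from urllib.parse import urlsplit, parse_qs


def s3_request_params(url: str, size: int, offset: int) -> dict:
    """Translate a decoded +B URL plus block size and +C offset into request parameters."""
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url!r}")
    # versionId pins the read to one immutable version of the object, if present.
    version = parse_qs(parts.query).get("versionId", [None])[0]
    return {
        "Bucket": parts.netloc,
        "Key": parts.path.lstrip("/"),
        "VersionId": version,
        # HTTP byte ranges are inclusive at both ends.
        "Range": f"bytes={offset}-{offset + size - 1}",
    }
```

For the 512-byte block at offset 256 in the earlier example, the range is @bytes=256-767@.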
| 35 | Advantages |
||
| 36 | |||
| 37 | * This strategy is a lot like how we approach federation. |
||
| 38 | * If locators of this type are supported by Keepstore, then Go and Python SDKs require relatively few changes (they continue to fetch blocks from Keepstore). |
||
| 39 | * Does not require downloading and indexing files. |
||
| 40 | |||
| 41 | Disadvantages |
||
| 42 | |||
| 43 | * Can't verify file contents. |