Story #7393
[Keep] Prototype S3 blob storage (Closed)
100%
Description
The prototype should implement the Keep volume interface using S3 blob storage, including returning errors that are required to report problems in the underlying storage.
The prototype does not need to deal with non-essential errors like configuration problems, temporary network hiccups, etc.
Ideally the prototype will be developed in such a way that there is a clear path for further development to make it production-ready. However, in case of doubt or conflict, getting the prototype done in a timely manner to prove the concept overrides this concern.
The branch review should ensure that the prototype meets functionality requirements, and can meet known scalability requirements in production use. It doesn't need to address code style, issues with tests (although ideas for tests are good to capture), etc.
Make sure the implementation can accommodate S3-compatible endpoints other than Amazon S3 proper. But it's OK if, in the first implementation, only Amazon S3 is supported/tested.
Refs:
- https://godoc.org/gopkg.in/amz.v2/s3
- https://github.com/AdRoll/goamz/s3 (looks better than stock AWS SDK)
- http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectOps.html
Updated by Brett Smith about 9 years ago
- Target version set to 2015-12-02 sprint
Updated by Brett Smith about 9 years ago
- Target version changed from 2015-12-02 sprint to Arvados Future Sprints
Updated by Tom Clegg about 9 years ago
- Assigned To set to Tom Clegg
- Target version changed from Arvados Future Sprints to 2015-12-16 sprint
Updated by Tom Clegg about 9 years ago
7393-s3-volume is at 069704e with the following known issues that (I think) we can merge with:
Delete-vs.-write race
The delete-vs.-write race is not handled. It is possible to write (refresh) an existing block between the time "T0" when the delete handler confirms that the block is old and the time "T1" when the block actually gets deleted. When this happens, PUT reports success even though the block gets deleted right away.
(Aside: AWS does not guarantee the block actually becomes ungettable before "delete" returns, so "T1" can be even later than when keepstore finishes its delete method.)
Current workarounds:
- If you want to be safe and don't mind not having garbage collection, you're fine; delete is disabled by default.
- If you want to do garbage collection and you aren't worried about the race, turn on -s3-unsafe-delete.
Odd error messages
AWS reports "access denied" instead of 404 when trying to read a nonexistent block during Compare and Get:
2015/12/09 20:56:07 s3-bucket:"4xphq-keep": Compare(637821cc1c31b89272a25c1a6885cc8e): Access Denied
This might just be a problem in the way we've set up our test bucket permissions, though. The s3test stub server throws 404 as expected so we pass the "report notfound as ErrNotExist" tests.
No docs
...other than the keepstore -help message.
Non-Amazon endpoints are untested
The option is there (-s3-endpoint) for using a non-AWS S3-compatible service like Google Storage, but the only services I've tried it on are AWS and the s3test server from https://github.com/AdRoll/goamz.
Updated by Peter Amstutz about 9 years ago
Tom Clegg wrote:
7393-s3-volume is at 069704e with the following known issues that (I think) we can merge with:
Delete-vs.-write race
The delete-vs.-write race is not handled. It is possible to write (refresh) an existing block between the time "T0" when the delete handler confirms that the block is old and the time "T1" when the block actually gets deleted. When this happens, PUT reports success even though the block gets deleted right away.
(Aside: AWS does not guarantee the block actually becomes ungettable before "delete" returns, so "T1" can be even later than when keepstore finishes its delete method.)
Current workarounds:
- If you want to be safe and don't mind not having garbage collection, you're fine; delete is disabled by default.
- If you want to do garbage collection and you aren't worried about the race, turn on -s3-unsafe-delete.
I spent a while reading the S3 documentation. The correct way to do this seems to be to enable versioning on the bucket. Then the head-and-delete operation will only delete the specific version of the object. This should solve the race because if there is a PUT or PUT-copy it will show up as a more recent version. As a side effect the "PUT-copy" operation used for Touch() may need to explicitly delete the old version.
Updated by Peter Amstutz about 9 years ago
Object versioning in S3 compatible APIs:
Google:
Has a "generation" parameter that is very similar to Amazon's "versionId", except that it's a 64 bit integer where S3 uses a string.
https://cloud.google.com/storage/docs/object-versioning?hl=en
Ceph:
"x-amz-version-id" is listed under "Unsupported header fields" and no mention of versioning in the documentation.
Updated by Peter Amstutz about 9 years ago
s3_volume_test has some commented out code.
Updated by Peter Amstutz about 9 years ago
(01:43:43 PM) Walex: gah, I actually came back before I forget, to say something obvious but that may be useful: the "standard" way to avoid this problem in distributed filesystems is to allow data operations to be done by any "keepstore", but to get all metadata operations to be done only by one "keepstore", e.g. the one "with the lowest IP address" as in AFS, or the one that managed first to acquire a certain "well known" lock. You could use that for Ceph but not the other syste
(01:48:33 PM) tetron_: Walex: actually, that's a great idea
(01:48:40 PM) tetron_: Walex: you're probably gone now
(01:48:58 PM) tetron_: Walex: but yea, we could have 1 writable server and N read-only servers
This would require some locking between the trash list and the PUT handler in keepstore itself (maybe another story).
Updated by Peter Amstutz about 9 years ago
One detail to check:
The S3 documentation for PUT-copy specifies:
x-amz-copy-source: /source_bucket/sourceObject
However, the code constructs this string:
v.Bucket.Name+"/"+loc
Is the first '/' being added somewhere, or is S3 accepting it without the leading slash?
Updated by Tom Clegg about 9 years ago
Peter Amstutz wrote:
Is the first '/' being added somewhere, or is S3 accepting it without the leading slash?
Interesting. The goamz s3 package leaves out the leading '/', s3test doesn't tolerate one, and amazon seems to add it implicitly if you leave it off (keep-exercise did lots of "touch" operations without any trouble)... I'd say this should be fixed in the SDK first, and then (depending on how the SDK fixes it) we should update our code.
s3_volume_test has some commented out code.
Whoops, removed. Thanks.
Updated by Tom Clegg about 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 50 to 100
Applied in changeset arvados|commit:7d5d57a522489209e6b3cecfef94bab0aae4a7f5.