Story #8178
closed[Keepstore] Implement interfaces trash area and undelete endpoint
100%
Description
-trash-lifetime 240h
Rename the delete method of storage volumes to trash.
Add an untrash method to the storage volume interface.
Add an undelete endpoint for Keepstore that's authenticated by the data manager token, and moves a specified block locator out of trash back to main storage using the new untrash method.
If an admin sets a nonzero trash timer using the new configuration knob, and the underlying storage volume does not support the new untrash method, keepstore should refuse to start.
This story does not include implementations for any concrete storage volume.
Updated by Brett Smith about 9 years ago
- Target version set to Arvados Future Sprints
Updated by Brett Smith about 9 years ago
- Description updated (diff)
- Subject changed from [Keepstore] Blocks on the trash list are moved to trash, and actually deleted later, after another timer to [Keepstore] Implement interfaces trash area and undelete endpoint
Updated by Radhika Chippada about 9 years ago
- Status changed from New to In Progress
- Assigned To set to Radhika Chippada
Updated by Brett Smith about 9 years ago
- Target version changed from Arvados Future Sprints to 2016-02-03 Sprint
Updated by Radhika Chippada about 9 years ago
a4d0d55ac333e8060d13a600d8aa8f5443760b29
- Renamed Delete as Trash and updated comment in volume.go
- Added Untrash (with empty implementation for now)
- Added undelete endpoint (UndeleteHandler) and added tests
- Added -trash-lifetime in keepstore.go
- s3 volume adder and azure blob volume adder are update to return error when trash-lifetime <= 0
- Returning error in volume_unix is causing test failures. I tried to send it in using run_test_servers (similar to never-delete), but I am still seeing errors with this. For the time being, I commented out the return error statement. I will continue researching into this; please let me know if you see where else I should be sending this in when keepstore servers are being started.
Updated by Tom Clegg almost 9 years ago
8178-keepstore-trash-interface @ a4d0d55
-trash-lifetime
should be a time.Duration -- this way flag.DurationVar will take care of parsing "240h"
Usage message doesn't need fmt.Sprintf(), it's just a string. Suggest: "Interval after a block is trashed during which it can be recovered using an /untrash request"
Sorry I didn't think/say this earlier, but I think we should call the http route "untrash" rather than "undelete". Currently we accept an HTTP "delete" method, but everywhere else we refer to "trash", and I think it will be less confusing if we reserve the word "delete" for the irreversible thing that happens to blocks that have been in the trash long enough.
UndeleteHandler shouldn't break after succeeding on one volume: if the block is in the trash on several volumes, we should untrash all of them.
If there are no writable volumes, we should respond http.StatusNotFound with "no writable volumes" in the body.
If os.IsNotExist(err) returns true for all errors, we should respond http.StatusNotFound
(We should return 500 only if we get an error other than "doesn't exist"... and I think it would be good form to say http.StatusInternalServerError not 500)
Calling resp.Write() before resp.WriteHeader() causes an implicit resp.WriteHeader(http.StatusOK) -- we should either reverse the order of these calls, or return after writing the body in the 200 case.
The stubbed Untrash methods should return ErrNotImplemented -- not nil, which indicates the block was untrashed.
The Trash() methods should return ErrNotImplemented if trashLifetime != 0.
The volumeAddr Set() methods should return ErrNotImplemented if trashLifetime != 0 (not if trashLifetime <= 0). trashLifetime == 0 is the only supported case right now. The returned/reported error should be something like "trashLifetime > 0 is not supported by this volume type".
Updated by Radhika Chippada almost 9 years ago
Addressed all those review comments at 3b31f110db82c93c3aade294f50bbb0218c74697
Updated by Tom Clegg almost 9 years ago
The real volumes' Trash() methods should return ErrNotImplemented if trashLifetime != 0.
This doesn't seem right. ErrNotImplemented means the block was not untrashed, and should be reported as an error:
if err == nil || err == ErrNotImplemented { log.Printf("Untrashed %v on volume %v", hash, vol.String())
(It looks like the tests should pass without this oddity, since the handler gets tested with MockVolumes, which return nil without actually untrashing anything -- right?)
Style nit: use "else if" or a switch, and follow "handle errors first" Go idiom, so instead ofif err == nil { ... } else { if os.IsNotExist(err) { ... } else { ... } }
...something likeif os.IsNotExist(err) { ... } else if err != nil { ... } else { ... }
It would be valuable to add a generic volume test for the "Trash with non-zero trash time, then Untrash" sequence; for now it would be acceptable to either return ErrNotImplemented from Trash() (in which case Untrash doesn't get tested) or return nil, and then successfully Untrash() the block so Get() works again. I suggest adding this test in its own branch -- IMO it shouldn't prevent merging 8178-keepstore-trash-interface, but we could do it before closing this issue.
Updated by Radhika Chippada almost 9 years ago
The real volumes' Trash() methods should return ErrNotImplemented if trashLifetime != 0.
Done
This doesn't seem right. ErrNotImplemented means the block was not untrashed, and should be reported as an error:
Done. This was a bit unclear from previous comment, but now addressed.
(It looks like the tests should pass without this oddity, since the handler gets tested with MockVolumes, which return nil without actually untrashing anything -- right?)
Updated MockVolume to return nil, instead of ErrNotImplemented
Style nit
Done
Adding test in generic volume test
Yes, I also thought of this. I will add it in a separate branch as you suggested (after merging this in master). Thanks.
Updated by Radhika Chippada almost 9 years ago
Branch 8178-trash-interface-generic-volume-test adds generic volume tests for the new trash / untrash interface at 5a7b7ea65edaf40f5d9c17adc2184b0a96f05a2d
Updated by Tom Clegg almost 9 years ago
Tom Clegg wrote:
It would be valuable to add a generic volume test for the "Trash with non-zero trash time, then Untrash" sequence; for now it would be acceptable to either return ErrNotImplemented from Trash() (in which case Untrash doesn't get tested) or return nil, and then successfully Untrash() the block so Get() works again. I suggest adding this test in its own branch -- IMO it shouldn't prevent merging 8178-keepstore-trash-interface, but we could do it before closing this issue.
More specifics:
testTrashUntrash- Set trashLifetime=3600 for duration of test case
- PutRaw a block, and backdate it
- Trash the block
- If the volume is not writable:
- Trash must return ErrMethodDisabled
- Else if Trash returned a non-nil err:
- err must be ErrNotImplemented
- Else (i.e., Trash returned err==nil):
- Get the block → must fail, os.IsNotExist() must return true for the returned error
- Untrash the block → must succeed
- Then, regardless of how Trash returned:
- Get the block → must succeed
When the implementations catch up, we will disallow the ErrNotImplemented possibility.
I don't think we should duplicate the existing tests for "trash block that's newer than blobSignatureTTL", etc. (If anything, we could add a loop in the existing test (testDeleteNewBlock) so it checks once with trashLifetime=0 and once with trashLifetime=time.Hour. But testTrashUntrashNewBlock at 5a7b7ea for example doesn't seem to test anything we aren't already testing -- except ensuring that Untrash() is not implemented, which is only one of multiple acceptable possibilities that should be tested in combination with Trash() as above.)
Updated by Radhika Chippada almost 9 years ago
Updated the trash/untrash test with the sequence of steps listed above.
I noticed one issue while writing the test. The s3 test setup seems to empty the block when backdated. Due to this Get after a trash / untrash sequence does not pass comparison as expected with the TestBlock. I tweaked the test to pass in this case as well. But, we might want to fix the issue with the s3 test setup for the long term.
Updated by Peter Amstutz almost 9 years ago
- Target version changed from 2016-02-03 Sprint to 2016-02-17 Sprint
Updated by Tom Clegg almost 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 67 to 100
Applied in changeset arvados|commit:3920b0d627ac89687a741b186554c1afd61750f5.