Feature #8554
closed[Keep] Implement trash/untrash behavior in volume_unix
Added by Peter Amstutz about 9 years ago. Updated almost 9 years ago.
100%
Description
Implementation¶
Trash:- If trash-ttl > 0:
- set deadline = now + trash-ttl
- Rename abc/{hash} to abc/{hash}.trash.{deadline}
- If trash-ttl == 0:
- Just delete abc/{hash} now (i.e., same behavior we have now)
- Rename the first file named abc/{hash}.trash.* (there might be more than one) to abc/{hash}
- Walk hierarchy looking for {hash}.trash.*
- If deadline < now, delete the file
- Log amount of deleted stuff: number of blocks, number of bytes
- Log amount of undeleted stuff still in trash: number of blocks, number of bytes
- Use one time.Ticker with configurable trash-check interval (default 1 day?)
- When ticker fires, empty trash on all volumes
Updated by Brett Smith about 9 years ago
- Target version set to Arvados Future Sprints
Updated by Radhika Chippada about 9 years ago
- Category set to Keep
- Assigned To set to Radhika Chippada
- Target version changed from Arvados Future Sprints to 2016-03-16 sprint
Updated by Radhika Chippada about 9 years ago
- Status changed from New to In Progress
Updated by Tom Clegg almost 9 years ago
(from irc) Duration flags should be DurationVar not IntVar
The doneEmptyingTrash line here should move down two lines, so it stops the trash collector after a signal is received, rather than during program startup...
go func(sig <-chan os.Signal) {
+ doneEmptyingTrash <- true
s := <-sig
log.Println("caught signal:", s)
listener.Close()
It would be better (and easier to test reliably) to pass the doneEmptyingTrash
channel to the emptyTrash()
func explicitly rather than using a global.
I think this should break if err == nil
, not if err != nil
. (With the current code, we give up as soon as we fail to untrash a block. Shouldn't we stop as soon as we succeed in untrashing, and return an error only if none of the trash we found was untrashable?)
for _, f := range files {
if strings.HasPrefix(f.Name(), prefix) {
err = os.Rename(v.blockPath(f.Name()), v.blockPath(loc))
if err != nil {
break
}
}
}
Also in Untrash, if no trashed block was found, we should return a non-nil error that satisfies os.IsNotExist()
.
This regexp should be more careful to only match files that trash could conceivably be used by the trash function. Suggest:
-var trashRegexp = regexp.MustCompile(`.*([0-9a-fA-F]{32}).trash.(\d+)`)
+var trashRegexp = regexp.MustCompile(`/([0-9a-f]{32}).trash.(\d+)$`)
Style: Please un-pyramid the code in (v *UnixVolume) EmptyTrash
; e.g., return nil
after log.Printf
, instead of putting the rest of the func in an else
.
if err != nil { log.Printf("EmptyTrash error for %v: %v", path, err) } else if !info.Mode().IsDir() { matches := trashRegexp.FindStringSubmatch(path) if len(matches) == 3 { ...
Also in (v *UnixVolume) EmptyTrash
-- either it should return a non-nil error if an error is encountered during filepath.Walk() (either passed in to the walk function, or encountered in the walk function), or it should not have a function signature that suggests it is capable of returning errors. The latter is probably more reasonable: the only thing the caller would do differently is log the error, and we already log errors right from EmptyTrash.
(v *UnixVolume) EmptyTrash
should print stats even if it encountered an error (i.e., the stats shouldn't be in an "else")
- put block
- set trashLifetime=time.Hour
- trash block
if err == ErrNotImplemented { return }
(rest of the tests don't apply)- get block →
os.IsNotExist(err)
- empty trash
- untrash block
- get block →
err == nil
- untrash block →
os.IsNotExist(err)
- get block →
err == nil
- set trashLifetime=time.Nanosecond
- trash block
- untrash block →
err == nil
(confirm the data doesn't go away until we call EmptyTrash) - trash block
- empty trash
- untrash block →
os.IsNotExist(err)
- get block →
os.IsNotExist(err)
- put block
- set trashLifetime=time.Nanosecond
- trash block
- put same block again →
err == nil
- empty trash
- get block →
err == nil
(confirm we don't delete an un-trashed block by mistake even if the same block is also in trash) - trash block
- set trashLifetime=time.Hour
- trash block →
err == nil
- empty trash
- untrash block →
err == nil
(confirm we don't delete the unexpired copy, even if the same block is in both expired and unexpired trash)
I think if we change the empty-trash condition from "deadline < now" to "deadline <= now" then a trash lifetime of 1ns should assure us that whatever we put in the trash is eligible for deletion right away (even if we empty trash in the same nanosecond, the unix timestamp only has 1-second precision so a trash lifetime under 1s would effectively mean "eligible for emptying right now").
The existing "delete" tests already test for correct behavior when trashLifetime==0
so we don't need to add more tests for that, right?
Nit: Instead of parsing as int with Atoi and then casting to int64, we could just use strconv.ParseInt here:
-deadline, err := strconv.Atoi(matches[2])
-...
-if int64(deadline) < ...
+deadline, err := strconv.ParseInt(matches[2], 10, 64)
+...
+if deadline < ...
Updated by Radhika Chippada almost 9 years ago
(from irc) Duration flags should be DurationVar not IntVar
Done
The doneEmptyingTrash line here should move down two lines, so it stops the trash collector after a signal is received, rather than during program startup...
Done
It would be better (and easier to test reliably) to pass the doneEmptyingTrash channel to the emptyTrash() func explicitly rather than using a global.
Done
I think this should break if err == nil, not if err != nil.
Good catch. Fixed
and return an error only if none of the trash we found was untrashable?
Done
Also in Untrash, if no trashed block was found, we should return a non-nil error that satisfies os.IsNotExist().
Done
This regexp should be more careful to only match files that trash could conceivably be used by the trash function. Suggest:
Updated
Style: Please un-pyramid the code
Done
Also in (v *UnixVolume) EmptyTrash -- either it should return a non-nil error if an error is encountered ...
Done (changed to not return error)
(v *UnixVolume) EmptyTrash should print stats even if it encountered an error (i.e., the stats shouldn't be in an "else")
Done
Test sequence listed ...
Added test with the listed sequence. (There is one extra trash listed in this sequence: the fifth from the bottom before changing trashLifetime to 1 hour. I omitted this)
The volume tests shouldn't test the emptyTrash goroutine. Instead they should just call EmptyTrash on the volume being tested.
I am uncomfortable with the idea that we do not test the emptyTrash goroutine at all. Instead of removing this (after adding your suggested test sequence), I improved the test with the goroutine to use 1ns for trashLifetime so that it does not waste time during the test. But if you'd rather remove it, I will remove it.
The existing "delete" tests already test for correct behavior when trashLifetime==0 so we don't need to add more tests for that, right?
Deleted it
Nit: Instead of parsing as int with Atoi and then casting to int64, we could just use strconv.ParseInt here:
Done
Updated by Tom Clegg almost 9 years ago
I am uncomfortable with the idea that we do not test the emptyTrash goroutine at all. Instead of removing this (after adding your suggested test sequence), I improved the test with the goroutine to use 1ns for trashLifetime so that it does not waste time during the test. But if you'd rather remove it, I will remove it.
The idea of testing emptyTrash is nice, but having it fire every 1ns while we're running our volume tests doesn't seem like a very effective way to test it. We still have to call EmptyTrash ourselves in order to avoid races, so the goroutine mainly has the effect of making the test flaky:
--- FAIL: TestUnixVolumeWithGenericTests (0.32s) volume_generic_test.go:1007: file does not exist
We could test the emptyTrash goroutine separately by setting up some UnixVolumes, calling Put and Trash, and waiting in a busy loop for the trashed data to disappear. This doesn't belong in the generic volume test suite, though -- we don't need to test it for each volume type/config.
Updated by Radhika Chippada almost 9 years ago
Deleted testTrashUntrashWithEmptyTrashGoroutine
Updated by Tom Clegg almost 9 years ago
Radhika Chippada wrote:
Added test with the listed sequence. (There is one extra trash listed in this sequence: the fifth from the bottom before changing trashLifetime to 1 hour. I omitted this)
I've added that in, with comments. The idea is to put the same data in the trash twice, with different deadlines, and then empty the trash after the first deadline has passed but before the second one arrives. In this case it should still be possible to untrash: i.e., we don't throw out the "new" trash just because we also had identical "old" trash.
Also escaped the "."s in the regexp and moved it next to the function that uses it.
9c571f5 look ok?
Updated by Tom Clegg almost 9 years ago
...and the rest of the un-pyramid thing → 7309fab
Updated by Radhika Chippada almost 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:5ba59abf2fece90309815b15b2d118860fb3b1b3.