Bug #15928
closedDeadlock in crunch-run tests
Description
TestSuite.TestStderrMount hangs (possibly the first test to write multiple outputs?)
Almost certainly introduced in #15910
Example: https://ci.curoverse.com/job/developer-run-tests-remainder/1756/console
Updated by Tom Clegg over 6 years ago
goroutine 692 [chan receive, 1 minutes]:
git.arvados.org/arvados.git/sdk/go/arvados.(*filenode).waitPrune(0xc0003ca500)
/home/tom/arvados/sdk/go/arvados/fs_collection.go:606 +0xf8
git.arvados.org/arvados.git/sdk/go/arvados.(*dirnode).marshalManifest(0xc000468ab0, 0xdc18a0, 0xc00004b880, 0xc00081925c, 0x3, 0x0, 0x0, 0x0, 0x0)
/home/tom/arvados/sdk/go/arvados/fs_collection.go:903 +0x987
git.arvados.org/arvados.git/sdk/go/arvados.(*dirnode).marshalManifest.func1(0xc0004cf778, 0x0)
/home/tom/arvados/sdk/go/arvados/fs_collection.go:928 +0x147
git.arvados.org/arvados.git/sdk/go/arvados.(*contextGroup).Go.func1(0xc00004b8c0, 0xc00076a780)
/home/tom/arvados/sdk/go/arvados/contextgroup.go:69 +0x7c
created by git.arvados.org/arvados.git/sdk/go/arvados.(*contextGroup).Go
/home/tom/arvados/sdk/go/arvados/contextgroup.go:67 +0xca
goroutine 634 [semacquire, 1 minutes]:
sync.runtime_Semacquire(0xc00004b8e0)
/usr/local/go/src/runtime/sema.go:56 +0x42
sync.(*WaitGroup).Wait(0xc00004b8d8)
/usr/local/go/src/sync/waitgroup.go:130 +0x64
git.arvados.org/arvados.git/sdk/go/arvados.(*contextGroup).Wait(0xc00004b8c0, 0x0, 0x0)
/home/tom/arvados/sdk/go/arvados/contextgroup.go:88 +0x3d
git.arvados.org/arvados.git/sdk/go/arvados.(*dirnode).marshalManifest(0xc000468630, 0xdc18e0, 0xc0000c4018, 0xca46e5, 0x1, 0x0, 0x0, 0x0, 0x0)
/home/tom/arvados/sdk/go/arvados/fs_collection.go:994 +0x5f4
git.arvados.org/arvados.git/sdk/go/arvados.(*collectionFileSystem).MarshalManifest(0xc00004b300, 0xca46e5, 0x1, 0x0, 0x0, 0x0, 0x0)
/home/tom/arvados/sdk/go/arvados/fs_collection.go:189 +0xf2
git.arvados.org/arvados.git/services/crunch-run.(*copier).Copy(0xc0006cd4b8, 0x0, 0xdadb00, 0xcdbc98, 0x0)
/home/tom/arvados/services/crunch-run/copier.go:108 +0x58d
git.arvados.org/arvados.git/services/crunch-run.(*ContainerRunner).CaptureOutput(0xc0007de900, 0x0, 0x0)
/home/tom/arvados/services/crunch-run/crunchrun.go:1294 +0x19e
git.arvados.org/arvados.git/services/crunch-run.(*ContainerRunner).Run.func2(0xc0007de900, 0xc000227820)
/home/tom/arvados/services/crunch-run/crunchrun.go:1591 +0x113
git.arvados.org/arvados.git/services/crunch-run.(*ContainerRunner).Run(0xc0007de900, 0x0, 0x0)
/home/tom/arvados/services/crunch-run/crunchrun.go:1665 +0x93a
git.arvados.org/arvados.git/services/crunch-run.(*TestSuite).fullRunHelper(0xc0002f1c40, 0xc0008a82d0, 0xcd1fbe, 0x1b6, 0x0, 0x0, 0x0, 0x1, 0xcd9de8, 0xc000791520, ...)
/home/tom/arvados/services/crunch-run/crunchrun_test.go:822 +0x899
git.arvados.org/arvados.git/services/crunch-run.(*TestSuite).TestStderrMount(0xc0002f1c40, 0xc0008a82d0)
/home/tom/arvados/services/crunch-run/crunchrun_test.go:1956 +0x88
reflect.Value.call(0xc9bf00, 0xc0002f1c40, 0x7e13, 0xca4c8c, 0x4, 0xc00051af30, 0x1, 0x1, 0x15080a0, 0xc00051add0, ...)
/usr/local/go/src/reflect/value.go:460 +0x5f6
reflect.Value.Call(0xc9bf00, 0xc0002f1c40, 0x7e13, 0xc00051af30, 0x1, 0x1, 0xc0006271d0, 0x0, 0x0)
/usr/local/go/src/reflect/value.go:321 +0xb4
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1(0xc0008a82d0)
/home/tom/arvados/tmp/GOPATH/pkg/mod/gopkg.in/check.v1@v1.0.0-20161208181325-20d25e280405/check.go:772 +0x5e4
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1(0xc0002ef680, 0xc0008a82d0, 0xc0000d3340)
/home/tom/arvados/tmp/GOPATH/pkg/mod/gopkg.in/check.v1@v1.0.0-20161208181325-20d25e280405/check.go:666 +0xa2
created by gopkg.in/check%2ev1.(*suiteRunner).forkCall
/home/tom/arvados/tmp/GOPATH/pkg/mod/gopkg.in/check.v1@v1.0.0-20161208181325-20d25e280405/check.go:663 +0x1fb
goroutine 691 [semacquire, 1 minutes]:
sync.runtime_SemacquireMutex(0xc000468b1c, 0xc00065bc00, 0x1)
/usr/local/go/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc000468b18)
/usr/local/go/src/sync/mutex.go:138 +0xfc
sync.(*Mutex).Lock(...)
/usr/local/go/src/sync/mutex.go:81
sync.(*RWMutex).Lock(0xc000468b18)
/usr/local/go/src/sync/rwmutex.go:98 +0x97
git.arvados.org/arvados.git/sdk/go/arvados.(*dirnode).commitBlock.func1(0xc00034bb60, 0xc00076a6c0, 0xc000468ab0, 0xc000173400, 0x6, 0x400, 0xc0004cf600, 0xc00082e160, 0x1, 0x1, ...)
/home/tom/arvados/sdk/go/arvados/fs_collection.go:706 +0x5ee
created by git.arvados.org/arvados.git/sdk/go/arvados.(*dirnode).commitBlock
/home/tom/arvados/sdk/go/arvados/fs_collection.go:698 +0x3ff
Updated by Tom Clegg over 6 years ago
Updated by Peter Amstutz over 6 years ago
Tom Clegg wrote:
15928-fs-deadlock @ 4257184a0fd276af7e1741dda8a7468a30b4a9c6 -- https://ci.curoverse.com/view/Developer/job/developer-run-tests/1694/
If I'm understanding correctly, the rationale is that even if the file has been deleted, renamed or moved, it is safe to replace the memSegment with a storedSegment, so tracking possible renames is counterproductive.
LGTM, but having trouble running the test in arvbox (for unrelated reason the API server won't start, see gitter).
Updated by Tom Clegg over 6 years ago
- Status changed from In Progress to Resolved
Peter Amstutz wrote:
If I'm understanding correctly, the rationale is that even if the file has been deleted, renamed or moved, it is safe to replace the memSegment with a storedSegment, so tracking possible renames is counterproductive.
Yes. The previous code avoided deadlock using the "lock all files in this dir, sorted by name" strategy, which can't lock (and therefore can't update) files that are no longer in the same dir. The new code avoids deadlock by locking only one file at a time instead, so the file's name and parent dir are irrelevant.
Updated by Tom Clegg over 6 years ago
- Related to Bug #15946: [crunch-run] [collectionfs] Deadlock while writing output collection added