Story #12308
[FUSE] Golang-based fuse driver (open)
100%
Description
Background:
Python+llfuse was expedient and has done lots of good work for us, but it's not promising as a long-term (fast+reliable+maintainable) solution.
Implementation:
- collection-backed filesystem from #12483, plus a more general Arvados-backed filesystem ("by_id" directory, etc., the same one exported via WebDAV) from #13111
- present as fuse using a library like https://godoc.org/bazil.org/fuse or https://godoc.org/github.com/billziss-gh/cgofuse/fuse
- package as a subcommand ("mount") of the source:cmd/arvados-client program
- Approach for handling websocket "update" events
- Selectable mechanisms/options for syncing to server (fflush, fsync, close) (on a shell node, flush-on-close, flush-periodically, or flush-after-idle-time might be best; in crunch-run, flush-on-exit might be best)
- Desired behavior when updates conflict (write error? clobber? create "oops,clobbered" file?)
- Old keep block signatures don't get refreshed, so reading a collection that's been cached for too long returns an I/O error
- Not command-line compatible with arv-mount
- Logging is not great
- No docs
- No way to control overall cache size (currently collectionfs can use lots of RAM in certain non-sequential write scenarios; we need the ability to trade speed for space efficiency in memory-constrained environments)
- No warnings given when cache is thrashing
- No application level instrumentation (just optional Go pprof)
- Special .arvados#collection file is incomplete (has manifest_text but not uuid, pdh)
- No automatic flush on sigint/sigterm
- No warning given when trying to exit but filesystem can't be unmounted yet (filehandle is open, or a process's cwd is in the mount)
- Mac port has a race bug (see notes below)
- Windows port is untested
- Cross-compiling recipe for Mac/Windows ports is fragile
- chmod is a no-op (chmod 0700 succeeds, but the file mode will still be 0755)
Updated by Tom Clegg about 7 years ago
- Status changed from New to In Progress
Work in progress here:
12308-go-fuse @ aa18bbe2333f293d329efdae4a13ff79b03a1d8c
go get -d git.curoverse.com/arvados.git/lib/crunchstat   # clone arvados to your gopath
(cd $GOPATH/src/git.curoverse.com/arvados.git && git checkout origin/12308-go-fuse)
go get git.curoverse.com/arvados.git/cmd/arvados-client
# (set up your ARVADOS_API_HOST and ARVADOS_API_TOKEN env vars)
$GOPATH/bin/arvados-client mount --experimental /tmp/mnt &
cd /tmp/mnt/by_id/$some_existing_collection_uuid/ && git clone file:///home/path/to/arvados.git && sync .
Updated by Tom Clegg about 7 years ago
- Related to Feature #12876: [CLI] arvados-client command-line tool added
Updated by Abram Connelly almost 7 years ago
Using the new arv mount functionality, I created a new keep mount via:
$ mkdir keepgo
$ arvados-client mount --experimental keepgo/ -d 2> arv-mount-experimental.log
Going into keepgo/home and running some ls commands intermittently gives the following error message:
abram@lightning-dev1:~/keepgo/home$ ls
ls: reading directory '.': Input/output error
00-example-shell.cwl input Saved at 2015-05-06 21:13:44 UTC by crunch@01b0dfdb2f15 ...
When the error occurs, ls prints only a partial listing of the directory.
I don't see anything of note in the log so I haven't provided it here.
Updated by Tom Clegg almost 7 years ago
The non-fuse-related code here has been extracted and used in #13111.
Fuse parts are rebased against #13111, now 12308-cgofuse @ c5633c850d664d2f78e0efccf9ec9734b4e32de5.
Updated by Peter Amstutz about 5 years ago
- Target version set to 2020-02-12 Sprint
Updated by Peter Amstutz almost 5 years ago
- Related to Story #16082: Port client tools to Go added
Updated by Peter Amstutz almost 5 years ago
- Assigned To changed from Peter Amstutz to Tom Clegg
Updated by Tom Clegg almost 5 years ago
12308-cgofuse @ 9a4fcabed1adeff0044d419977d5136c5cb1db3e -- developer-run-tests: #1715
Adds a "mount" subcommand to arvados-client, with limitations noted in the issue description above.
Our currently supported platforms (linux/amd64) work fine with the usual build process; source:cmd/arvados-client/Makefile has a recipe for cross-compiling binaries for linux/macos/windows on i386/amd64 using their respective fuse/fuse-like libraries.
Updated by Lucas Di Pentima almost 5 years ago
Gave it a light first-pass look; a couple of issues:
- Tests failed on developer-run-tests: #1715
- Getting a checksum mismatch when trying to run make in cmd/arvados-client/
Updated by Tom Clegg almost 5 years ago
Fixed incorrect test assertion that failed build 1715 from #12308#note-14
12308-cgofuse @ e54bbc170b78f3f4c90be7c8b314d58e559cd73c -- developer-run-tests: #1718
Updated by Lucas Di Pentima almost 5 years ago
I have been testing a binary compiled for OSX, using osxfuse version 3.10.4 on OSX 10.14.6.
The test consists of cloning the arvados repository from GitHub.
- Against ce8i5 (5 Mbps uplink): FAILED, crashing the program without debug info
- Using arvbox
- Unbounded virtual network: OK, 30secs
- 50 Mbps up/down: OK, 36 secs
- 25 Mbps up/down: OK, 50 secs
- 12 Mbps up/down: OK, 65 secs
- 5 Mbps up/down: FAILED (with crash and no debug info) after 2min14secs, having written 43 of the 73 MB before crashing.
Updated by Lucas Di Pentima almost 5 years ago
Repeating the test of cloning the arvados repo, I got one crash with debug information after 6 seconds. The error reported by the 'git' command was 'fatal: write error: Device not configured.' Here is the crash output:
fatal error: concurrent map read and map write

goroutine 50 [running, locked to thread]:
runtime.throw(0x468fd7a, 0x21)
	/usr/local/go/src/runtime/panic.go:774 +0x72 fp=0xc0020d6a58 sp=0xc0020d6a28 pc=0x4031912
runtime.mapaccess1_faststr(0x45c5d80, 0xc0003ca180, 0xc002270081, 0x5, 0xc00228c0e0)
	/usr/local/go/src/runtime/map_faststr.go:21 +0x44f fp=0xc0020d6ac8 sp=0xc0020d6a58 pc=0x40152df
git.arvados.org/arvados.git/sdk/go/arvados.(*treenode).Child(0xc000104300, 0xc002270081, 0x5, 0xc002269c50, 0x0, 0x40306ba, 0x44cd4fc, 0xc000104360)
	/ext-go/2/src/git.arvados.org/arvados.git/sdk/go/arvados/fs_base.go:259 +0x56 fp=0xc0020d6b38 sp=0xc0020d6ac8 pc=0x44b3256
git.arvados.org/arvados.git/sdk/go/arvados.(*vdirnode).Child(0xc0000c2cc0, 0xc002270081, 0x5, 0x0, 0xc0020d6c00, 0x40f1883, 0xc0022700b1, 0xa)
	/ext-go/2/src/git.arvados.org/arvados.git/sdk/go/arvados/fs_site.go:191 +0x9a fp=0xc0020d6b88 sp=0xc0020d6b38 pc=0x44c452a
git.arvados.org/arvados.git/sdk/go/arvados.rlookup.func1(0xc0020d6cc0, 0xc0020d6c80, 0x0, 0x0, 0x0, 0x0)
	/ext-go/2/src/git.arvados.org/arvados.git/sdk/go/arvados/fs_base.go:606 +0xc1 fp=0xc0020d6c10 sp=0xc0020d6b88 pc=0x44c78d1
git.arvados.org/arvados.git/sdk/go/arvados.rlookup(0x4758e00, 0xc0000c2cc0, 0xc002270080, 0x3b, 0x4758e00, 0xc0000c2cc0, 0xc0003c0288, 0x88)
	/ext-go/2/src/git.arvados.org/arvados.git/sdk/go/arvados/fs_base.go:607 +0x1bd fp=0xc0020d6ca0 sp=0xc0020d6c10 pc=0x44b5e7d
git.arvados.org/arvados.git/sdk/go/arvados.(*fileSystem).Stat(0xc00012a9a0, 0xc002270080, 0x3b, 0x0, 0x0, 0x0, 0x0)
	/ext-go/2/src/git.arvados.org/arvados.git/sdk/go/arvados/fs_base.go:439 +0x4f fp=0xc0020d6cf0 sp=0xc0020d6ca0 pc=0x44b492f
git.arvados.org/arvados.git/lib/mount.(*keepFS).Getattr(0xc0003c0240, 0xc002270080, 0x3b, 0xc00226fd40, 0xffffffffffffffff, 0x0)
	/ext-go/2/src/git.arvados.org/arvados.git/lib/mount/fs.go:240 +0x16d fp=0xc0020d6d70 sp=0xc0020d6cf0 pc=0x4545f2d
github.com/arvados/cgofuse/fuse.hostGetattr(0x7002b60, 0x70000ffd2be0, 0x0)
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:119 +0x123 fp=0xc0020d6e18 sp=0xc0020d6d70 pc=0x453a2b3
github.com/arvados/cgofuse/fuse.go_hostGetattr(...)
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host_cgo.go:717
github.com/arvados/cgofuse/fuse._cgoexpwrap_ecb3c7988e70_go_hostGetattr(0x7002b60, 0x70000ffd2be0, 0x12800028)
	_cgo_gotypes.go:592 +0x35 fp=0xc0020d6e40 sp=0xc0020d6e18 pc=0x45406a5
runtime.call32(0x0, 0x70000ffd2a10, 0x70000ffd2aa0, 0x18)
	/usr/local/go/src/runtime/asm_amd64.s:539 +0x3b fp=0xc0020d6e70 sp=0xc0020d6e40 pc=0x405fa3b
runtime.cgocallbackg1(0x0)
	/usr/local/go/src/runtime/cgocall.go:314 +0x1b7 fp=0xc0020d6f58 sp=0xc0020d6e70 pc=0x4005a37
runtime.cgocallbackg(0x0)
	/usr/local/go/src/runtime/cgocall.go:191 +0xc1 fp=0xc0020d6fc0 sp=0xc0020d6f58 pc=0x40057e1
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/runtime/asm_amd64.s:793 +0x9b fp=0xc0020d6fe0 sp=0xc0020d6fc0 pc=0x406100b
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc0020d6fe8 sp=0xc0020d6fe0 pc=0x4061731

goroutine 1 [syscall, 5 minutes]:
github.com/arvados/cgofuse/fuse._Cfunc_hostMount(0x3, 0xc0003bc6c0, 0x4f01310, 0x0)
	_cgo_gotypes.go:515 +0x4d
github.com/arvados/cgofuse/fuse.c_hostMount.func1(0xc000000003, 0xc0003bc6c0, 0x4f01310, 0x4f01310)
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host_cgo.go:701 +0x97
github.com/arvados/cgofuse/fuse.c_hostMount(0xc000000003, 0xc0003bc6c0, 0x4f01310, 0xc0001a52c0)
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host_cgo.go:701 +0x3d
github.com/arvados/cgofuse/fuse.(*FileSystemHost).Mount(0xc000168e80, 0x0, 0x0, 0xc00015f190, 0x1, 0x1, 0x0)
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:666 +0x4ca
git.arvados.org/arvados.git/lib/mount.(*cmd).RunCommand(0x4b38dd0, 0xc000184120, 0x28, 0xc00015f180, 0x2, 0x2, 0x4743220, 0xc0000b0000, 0x4743240, 0xc0000b0008, ...)
	/ext-go/2/src/git.arvados.org/arvados.git/lib/mount/command.go:81 +0x5c4
git.arvados.org/arvados.git/lib/cmd.Multi.RunCommand(0xc00015f0b0, 0x7ffeefbff4b0, 0x22, 0xc00015f170, 0x3, 0x3, 0x4743220, 0xc0000b0000, 0x4743240, 0xc0000b0008, ...)
	/ext-go/2/src/git.arvados.org/arvados.git/lib/cmd/cmd.go:89 +0x280
main.main()
	/ext-go/2/src/git.arvados.org/arvados.git/cmd/arvados-client/cmd.go:65 +0xe1

goroutine 6 [select]:
git.arvados.org/arvados.git/sdk/go/keepclient.(*cachedSvcList).poll(0xc00000ebc0)
	/ext-go/2/src/git.arvados.org/arvados.git/sdk/go/keepclient/discover.go:94 +0x154
created by git.arvados.org/arvados.git/sdk/go/keepclient.(*KeepClient).discoverServices
	/ext-go/2/src/git.arvados.org/arvados.git/sdk/go/keepclient/discover.go:150 +0x542

goroutine 35 [syscall, 5 minutes]:
os/signal.signal_recv(0x0)
	/usr/local/go/src/runtime/sigqueue.go:144 +0x96
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	/usr/local/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 48 [chan receive, 5 minutes]:
github.com/arvados/cgofuse/fuse.(*FileSystemHost).Mount.func3(0xc000168e80, 0xc0001a52c0)
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:653 +0x41
created by github.com/arvados/cgofuse/fuse.(*FileSystemHost).Mount
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:652 +0x44b

goroutine 42 [select]:
git.arvados.org/arvados.git/sdk/go/keepclient.(*cachedSvcList).poll.func1(0xc0001a4d80, 0xc0001a4de0, 0xc00000ebc0)
	/ext-go/2/src/git.arvados.org/arvados.git/sdk/go/keepclient/discover.go:77 +0x1a9
created by git.arvados.org/arvados.git/sdk/go/keepclient.(*cachedSvcList).poll
	/ext-go/2/src/git.arvados.org/arvados.git/sdk/go/keepclient/discover.go:73 +0xaf

goroutine 51 [runnable, locked to thread]:
github.com/arvados/cgofuse/fuse._Cfunc_hostCstatFromFusestat(0x70000f9aebc0, 0x0, 0x0, 0x1000041ed, 0x14000001f6, 0x0, 0x8, 0x5e3c87a4, 0x3441ab98, 0x5e3c87a4, ...)
	_cgo_gotypes.go:430 +0x45
github.com/arvados/cgofuse/fuse.c_hostCstatFromFusestat(...)
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host_cgo.go:669
github.com/arvados/cgofuse/fuse.copyCstatFromFusestat(0x70000f9aebc0, 0xc0022d2000)
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:80 +0x151
github.com/arvados/cgofuse/fuse.hostGetattr(0x6910210, 0x70000f9aebc0, 0x0)
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:120 +0x14b
github.com/arvados/cgofuse/fuse.go_hostGetattr(...)
	/go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host_cgo.go:717
github.com/arvados/cgofuse/fuse._cgoexpwrap_ecb3c7988e70_go_hostGetattr(0x6910210, 0x70000f9aebc0, 0x4f7c3f0)
	_cgo_gotypes.go:592 +0x35

goroutine 15611 [IO wait]:
internal/poll.runtime_pollWait(0x6d79f38, 0x72, 0xffffffffffffffff)
	/usr/local/go/src/runtime/netpoll.go:184 +0x55
internal/poll.(*pollDesc).wait(0xc000104118, 0x72, 0x800, 0x83c, 0xffffffffffffffff)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000104100, 0xc0000e4900, 0x83c, 0x83c, 0x0, 0x0, 0x0)
	/usr/local/go/src/internal/poll/fd_unix.go:169 +0x22b
net.(*netFD).Read(0xc000104100, 0xc0000e4900, 0x83c, 0x83c, 0x203000, 0x580020000000000, 0x0)
	/usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc000154058, 0xc0000e4900, 0x83c, 0x83c, 0x0, 0x0, 0x0)
	/usr/local/go/src/net/net.go:184 +0x68
crypto/tls.(*atLeastReader).Read(0xc000476180, 0xc0000e4900, 0x83c, 0x83c, 0xc0020d38c0, 0x4019f3e, 0xc0020d38a0)
	/usr/local/go/src/crypto/tls/conn.go:780 +0x60
bytes.(*Buffer).ReadFrom(0xc0000a8958, 0x47429c0, 0xc000476180, 0x400c9d5, 0x45dd5a0, 0x465c1a0)
	/usr/local/go/src/bytes/buffer.go:204 +0xb4
crypto/tls.(*Conn).readFromUntil(0xc0000a8700, 0x6d7a0d8, 0xc000154058, 0x5, 0xc000154058, 0x12)
	/usr/local/go/src/crypto/tls/conn.go:802 +0xec
crypto/tls.(*Conn).readRecordOrCCS(0xc0000a8700, 0x0, 0x0, 0x3)
	/usr/local/go/src/crypto/tls/conn.go:609 +0x124
crypto/tls.(*Conn).readRecord(...)
	/usr/local/go/src/crypto/tls/conn.go:577
crypto/tls.(*Conn).Read(0xc0000a8700, 0xc0017ee000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/crypto/tls/conn.go:1255 +0x161
net/http.(*persistConn).Read(0xc0022cc5a0, 0xc0017ee000, 0x1000, 0x1000, 0xc000096120, 0xc0020d3c20, 0x40075a5)
	/usr/local/go/src/net/http/transport.go:1752 +0x75
bufio.(*Reader).fill(0xc0001419e0)
	/usr/local/go/src/bufio/bufio.go:100 +0x103
bufio.(*Reader).Peek(0xc0001419e0, 0x1, 0x0, 0x0, 0x1, 0xc008c5e800, 0x0)
	/usr/local/go/src/bufio/bufio.go:138 +0x4f
net/http.(*persistConn).readLoop(0xc0022cc5a0)
	/usr/local/go/src/net/http/transport.go:1905 +0x1d6
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1574 +0xafe

goroutine 15612 [select]:
net/http.(*persistConn).writeLoop(0xc0022cc5a0)
	/usr/local/go/src/net/http/transport.go:2204 +0x123
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1575 +0xb23
Updated by Lucas Di Pentima almost 5 years ago
Running 3 concurrent 'git clone' commands was enough to make it crash again with the same error.
Updated by Lucas Di Pentima almost 5 years ago
My latest 'black box' tests from last week were done with the Linux binary, and the issues found with the Mac version didn't happen: running multiple concurrent git clone operations worked great, and writing through a simulated slow link didn't crash the client.
Will give the code another look.
Updated by Lucas Di Pentima almost 5 years ago
Code review:
lib/mount/command.go
- Line 29: Typo in comment
- Line 60: Could that be simplified by using arvadosclient.MakeArvadosClient()?
lib/mount/fs.go
- Line 65: Why should lookupFH() lock the filesystem while just reading data? NOTE: I'm learning that concurrent reads AND writes are not possible on maps, but could we use RWMutex to allow concurrent reads?
- Line 255: Does this conditional make the return value “Not implemented” when the fh is a regular file or dir? A comment would help future readers.
- Line 257: The else clause isn’t necessary. In which case would the flow reach this? I'm asking because if we’re not implementing this, returning 0 tells the OS that the operation succeeded.
Updated by Tom Clegg almost 5 years ago
- Target version changed from 2020-02-12 Sprint to 2020-02-26 Sprint
Updated by Tom Clegg almost 5 years ago
- Description updated (diff)
lib/mount/command.go
- Line 29: Typo in comment
Fixed
- Line 60: Could that be simplified by using arvadosclient.MakeArvadosClient()?
Not sure I follow. We could use MakeArvadosClient(), but we'd need to call NewClientFromEnv() anyway to get the client var, and once we have that, it seems cleaner to derive the ac/kc clients from it...
lib/mount/fs.go
- Line 65: Why should lookupFH() lock the filesystem while just reading data? NOTE: I'm learning that concurrent reads AND writes are not possible on maps, but could we use RWMutex to allow concurrent reads?
Yes, good point -- reads probably outnumber writes, so RWMutex is probably better / less blocking. Changed.
- Line 255: Does this conditional make the return value “Not implemented” when the fh is a regular file or dir? A comment would help future readers.
Added comments. It's ENOSYS when changing mode from file to dir, or vice versa.
- Line 257: The else clause isn’t necessary. In which case would the flow reach this? I'm asking because if we’re not implementing this, returning 0 tells the OS that the operation succeeded.
Added comments. This is the case where chmod is expected to succeed (it's only changing permission bits). It's a no-op because we don't save permission bits. We could return ENOSYS if the mode isn't 0755. I suspect that would make lots of things (like tar xzf) fail instead of doing the obvious thing, though. Perhaps we should have "strict/loose" modes? Meanwhile I've added this to the list of shortcomings.
12308-cgofuse @ 3a2006d29fc38596a4dfb19b331bf2c86a9185ae -- developer-run-tests: #1728
Updated by Lucas Di Pentima almost 5 years ago
Thanks for the clarifications and comments, they'll be helpful.
About arvadosclient.MakeArvadosClient(): I missed that the client var was used elsewhere, sorry.
This LGTM, thanks!
Updated by Tom Clegg almost 5 years ago
- Target version deleted (2020-02-26 Sprint)
Updated by Tom Clegg over 4 years ago
- Related to Bug #16727: [FUSE] [cgofuse] Refresh signatures / reload collection instead of using expired blob signatures added
Updated by Peter Amstutz over 3 years ago
- Related to Story #17849: FUSE driver v2 added