Feature #23246
closedGo FUSE driver supports --mount-by-id
Description
If you run arvados-client mount, the default mount includes a by_id subdirectory where can access collections through subdirectories named after a portable data hash or UUID.
Add an option --mount-by-id option. This completely disables the default mount layout, and takes an argument that sets the name of the "by id" subdirectory. Demo example:
⟩⟩ arv-mount --mount-by-id IDDemo keep
[The mount only has the single specified subdirectory now.]
⟩⟩ ll -R keep
keep:
total 512
dr-xr-xr-x 1 0 2025-11-25 09:39 IDDemo
keep/IDDemo:
total 1.0K
-r--r--r-- 1 560 2025-11-25 09:39 README
[Note IDDemo only lists a README, but we can list a collection by PDH:]
⟩⟩ ll keep/IDDemo/723ccb12f21518b2fe936ec68d2d5927+2900
total 5.0K
dr-xr-xr-x 1 0 1969-12-31 19:00 helper
dr-xr-xr-x 1 0 1969-12-31 19:00 original
-r-xr-xr-x 1 3.7K 1969-12-31 19:00 wgs-processing-wf.cwl
[Now this collection appears in the IDDemo listing:]
⟩⟩ ll -R keep
keep:
total 512
dr-xr-xr-x 1 0 2025-11-25 09:39 IDDemo
keep/IDDemo:
total 1.5K
dr-xr-xr-x 1 0 1969-12-31 19:00 723ccb12f21518b2fe936ec68d2d5927+2900
-r--r--r-- 1 560 2025-11-25 09:39 README
keep/IDDemo/723ccb12f21518b2fe936ec68d2d5927+2900:
[From here it repeats the collection listing.]
Here is some related work that will need to happen as part of using arvados-client mount in crunch-run. You should be aware of these requirements so you don't paint yourself into a design corner you need to break out of later:
- We will later want an option
--mount-by-pdh SUBDIR. This works the same way, except the subdirectory only lets you access collections by portable data hash, not UUID. - Bear in mind that the FUSE driver will need to be able to support multiple layout options as long as they don't explicitly conflict. For example, crunch-run typically calls the FUSE driver with both
--mount-by-pdhand--mount-tmp(which is a different kind of filesystem in a separate ticket).
Updated by Brett Smith 4 months ago
- Target version changed from Development 2025-11-12 to Development 2025-11-26
Updated by Tom Clegg 4 months ago
- Blocked by Feature #23313: CustomFileSystem supports custom mount options needed by crunch-run added
Updated by Brett Smith 4 months ago
- Subject changed from Go FUSE driver supports switches to determine the mount layout to Go FUSE driver supports switches crunch-run uses to set up compute mounts
Updated by Brett Smith 4 months ago
- Description updated (diff)
- Subject changed from Go FUSE driver supports switches crunch-run uses to set up compute mounts to Go FUSE driver supports --mount-by-pdh and/or --mount-by-id
Updated by Brett Smith 4 months ago
- Description updated (diff)
- Subject changed from Go FUSE driver supports --mount-by-pdh and/or --mount-by-id to Go FUSE driver supports --mount-by-id
Updated by Brett Smith 4 months ago
- Target version changed from Development 2025-11-26 to Development 2025-12-10
Updated by Lisa Knox 4 months ago
23246-mount-by-id @ 6e4fcf4ba624e11340dd8e865b76af998559b40b
- ✅ All agreed upon points are implemented / addressed.
- Mounting the go FUSE driver with the flag
-mount-by-id foowill create a dir named "foo" at the mountpoint provided. - Said dir behaves like a "magic directory", allowing access to Arvodos resources using UUIDs.
- If
-mount-by-idis passed, the subdir name cannot be blank.
- Mounting the go FUSE driver with the flag
- ✅ Anything not implemented (discovered or discussed during work) has a follow-up story.
- ✅ Code is tested and passing, both automated and manual, what manual testing was done is described
- ✅ The tested code incorporates recent main branch changes.
- ✅ New or changed UX/UX and has gotten feedback from stakeholders.
- n/a Documentation has been updated.
- ✅ Behaves appropriately at the intended scale (describe intended scale).
- Behaves on par with existing functionality
- ✅ Considered backwards and forwards compatibility issues between client and server.
- ✅ Follows our coding standards and GUI style guidelines.
- in
lib/mount/command.go L85-99there's a bit of clunkiness, but this was done in anticipation of the future "mount-by-pdh" task.
Updated by Tom Clegg 4 months ago
I don't think we need to do all the stuff with flags.Visit(), we can just check *mountById != "". Sure, arv-mount --mount-by-id "" is an error, and we're trying to be compatible with arv-mount, so we could make it an error here too. But IMO ... mount --mount-by-id "" should just be equivalent to not passing --mount-by-id at all, for the sake of simplicity and broader convention. If we want to go hard on compatibility, we should probably start with the cases that aren't errors, like arv-mount --mount-by-id "a" --mount-by-id "b" (but I'm not convinced that's a feature anyone uses -- would anyone suspect it's supported if they hadn't been reading the source code recently?)
I'm not keen on passing customDirName to keepFS. I think it would be better to move the CustomFileSystem initialization into command.go, so instead of
type keepFS struct {
// ...
Client *arvados.Client
KeepClient *keepclient.KeepClient
// ...
root arvados.CustomFileSystem
we have
type keepFS struct {
// ...
Root arvados.CustomFileSystem
That way RunCommand can initialize the CustomFileSystem, and Init doesn't need to repeat the same list of command line options. (Besides, keepFS has enough of a purpose being an adapter between fuse and arvados.CustomFileSystem, it shouldn't get this additional responsibility unless it has some reason to be involved. Now that initialization is getting non-trivial I see that we should have passed in CustomFileSystem instead of Client/KeepClient in the first place.)
TestMountById is too copy-pastey. The setup/timeout boilerplate should be moved to a func (s *CmdSuite) testMount(c *check.C, args []string, testfunc func()) that we can call like this
func (s *CmdSuite) TestMountById(c *check.C) {
testMount(c, []string{"--mount-by-id", "by_id_test"}, func() {
f, err := os.Open(s.mnt + "/by_id_test/" + arvadostest.FooCollection)
c.Assert(err, check.IsNil)
// ...
})
}
- If we put the
Unmountstep in a defer (right after receiving frommountCmd.ready) then we can safely usec.Assert()in the testfunc without inadvertently skipping unmount. - If we add
stderr *bytes.Bufferto the CmdSuite struct, and make our bytes.NewBuffer(nil) there instead of in a local variable, then TestMount's testfunc will have access to it.
I don't think we need the arvados.SiteFileSystemById() convenience function -- lib/mount can call CustomFileSystem() and MountByID() itself. (The existing SiteFileSystem() function is similarly not strictly necessary, but it does provide a "standard layout" convention, and we're not doing that here.)
Updated by Brett Smith 4 months ago
Tom Clegg wrote in #note-13:
I don't think we need to do all the stuff with flags.Visit(), we can just check
*mountById != "". Sure,arv-mount --mount-by-id ""is an error, and we're trying to be compatible with arv-mount, so we could make it an error here too. But IMO... mount --mount-by-id ""should just be equivalent to not passing--mount-by-idat all, for the sake of simplicity and broader convention.
I'm not sure what broader convention you're referring to for command line programs, could you please elaborate? Right now I prefer this be an error. Mainly because one real way it might come up is if you script a mount with --mount-by-id "$var" when $var is unset. That's a lot easier to debug if it's an error rather than ignored.
If we want to go hard on compatibility, we should probably start with the cases that aren't errors, like
arv-mount --mount-by-id "a" --mount-by-id "b"(but I'm not convinced that's a feature anyone uses -- would anyone suspect it's supported if they hadn't been reading the source code recently?)
I concede probably nobody is doing this today, but we do have users with long-lived shared mounts. It's easy to imagine a situation where they decide to rename the "main" by-id directory they use (for easier use with new tools, or new compliance conventions, or whatever) but they want to keep the old name for compatibility with existing scripts, etc. And that's something I could see an admin just trying to see if it works without digging through the source.
Updated by Tom Clegg 3 months ago
Here's how to accept --flag a --flag b: https://gist.github.com/gammazero/525fdffd273450edbcf8baadf849333a
Then we can check len(mountByID) instead of using flags.Visit().
Updated by Brett Smith 3 months ago
- Target version changed from Development 2025-12-10 to Development 2026-01-06
Updated by Lisa Knox 3 months ago
23246-mount-by-id @ be82ab37b1d4623cdf5954f93222d53f6037cb65
Tom Clegg wrote in #note-13:
I don't think we need to do all the stuff with flags.Visit()...
(Brett) I concede probably nobody is doing this today, but we do have users with long-lived shared mounts...
Here's how to accept --flag a --flag b...
It turns out that this was super easy to make work. Users can now pass -mount-by-id a -mount-by-id b and the mount fs will have dirs named "a" and "b".
I'm not keen on passing customDirName to keepFS. I think it would be better to move the CustomFileSystem initialization into command.go...
Done
TestMountById is too copy-pastey...
Created a helper function to do the mounting and related assertions. The checks that logging has stopped (for tests that generate crunchstat logs) can only be done after unmounting, so that's also in the helper function.
I don't think we need the arvados.SiteFileSystemById()
Removed, with functionality moved to RunCommand
Also merged main in because the tests from #23245 needed to be aligned with the new command_test.go test pattern.
Updated by Tom Clegg 3 months ago
Lisa Knox wrote in #note-17:
Created a helper function to do the mounting and related assertions.
Much better, thanks.
The checks that logging has stopped (for tests that generate crunchstat logs) can only be done after unmounting, so that's also in the helper function.
I don't quite follow this -- why not check in TestCrunchstatLogger after mountAndCheck returns?
Similarly, TestMount should still check c.Check(stderr.String(), check.Equals, "") after mountAndCheck.
Also merged main in because the tests from #23245 needed to be aligned with the new command_test.go test pattern.
TestWriteMetrics was deleted in be82ab37b1d4623cdf5954f93222d53f6037cb65 after the merge. I'm guessing this was an accident?
The flag value type isn't really specific to mount-by-id args (and we'll probably do the same thing for mount-by-pdh etc) so arrayFlagValue would be a better name than mountByIdArgs.
dirNames should be mountByID or byIDDirs or something
The flag package may call the String method with a zero-valued receiver, such as a nil pointer. I'm not sure when it does this but the String method should return "" instead of crashing if a==nil.
We should backquote the word "directory" in the help message so -help says "-mount-by-id directory" instead of "-mount-by-id value". See https://pkg.go.dev/flag#PrintDefaults (I see some of the existing help messages missed this memo too, my bad)
Updated by Lisa Knox 3 months ago
23246-mount-by-id @ e17b2965675656ec13bcf74950c4530668556cfd
I don't quite follow this -- why not check in TestCrunchstatLogger after mountAndCheck returns?
Similarly, TestMount should still check @c.Check(stderr.String()...
In my head I was thinking "unmount === end of test" for some reason, fixed.
TestWriteMetrics was deleted in be82ab37b1d4623cdf5954f93222d53f6037cb65 after the merge. I'm guessing this was an accident?
Yes, restored. The merge was pretty conflicted, I should have double-checked everything afterwards.
The flag value type isn't really specific...
dirNames should be mountByID or byIDDirs or something
Both variables renamed.
The flag package may call the String method with a zero-valued receiver, such as a nil pointer. I'm not sure when it does this but the String method should
return ""instead of crashing ifa==nil.
Done, and this leads me to believe that I should check package docs more closely.
We should backquote the word "directory" in the help message so
-helpsays "-mount-by-id directory" instead of "-mount-by-id value". See https://pkg.go.dev/flag#PrintDefaults (I see some of the existing help messages missed this memo too, my bad)
After discussion, it was decided to align with the way arv-mount does it, i.e. "-mount-by-id PATH" or "-log-level LEVEL".
Updated by Lisa Knox 3 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|162c08a9aa728f8a8b4df18a47baf564836ed892.