Project

General

Profile

Actions

Feature #23246

closed

Go FUSE driver supports --mount-by-id

Added by Brett Smith 5 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
FUSE
Target version:
Story points:
-
Release relationship:
Auto

Description

If you run arvados-client mount, the default mount includes a by_id subdirectory where can access collections through subdirectories named after a portable data hash or UUID.

Add an option --mount-by-id option. This completely disables the default mount layout, and takes an argument that sets the name of the "by id" subdirectory. Demo example:

⟩⟩ arv-mount --mount-by-id IDDemo keep
[The mount only has the single specified subdirectory now.]
⟩⟩ ll -R keep
keep:
total 512
dr-xr-xr-x 1 0 2025-11-25 09:39 IDDemo

keep/IDDemo:
total 1.0K
-r--r--r-- 1 560 2025-11-25 09:39 README

[Note IDDemo only lists a README, but we can list a collection by PDH:]
⟩⟩ ll keep/IDDemo/723ccb12f21518b2fe936ec68d2d5927+2900
total 5.0K
dr-xr-xr-x 1    0 1969-12-31 19:00 helper
dr-xr-xr-x 1    0 1969-12-31 19:00 original
-r-xr-xr-x 1 3.7K 1969-12-31 19:00 wgs-processing-wf.cwl

[Now this collection appears in the IDDemo listing:]
⟩⟩ ll -R keep
keep:
total 512
dr-xr-xr-x 1 0 2025-11-25 09:39 IDDemo

keep/IDDemo:
total 1.5K
dr-xr-xr-x 1   0 1969-12-31 19:00 723ccb12f21518b2fe936ec68d2d5927+2900
-r--r--r-- 1 560 2025-11-25 09:39 README

keep/IDDemo/723ccb12f21518b2fe936ec68d2d5927+2900:
[From here it repeats the collection listing.]

Here is some related work that will need to happen as part of using arvados-client mount in crunch-run. You should be aware of these requirements so you don't paint yourself into a design corner you need to break out of later:

  • We will later want an option --mount-by-pdh SUBDIR. This works the same way, except the subdirectory only lets you access collections by portable data hash, not UUID.
  • Bear in mind that the FUSE driver will need to be able to support multiple layout options as long as they don't explicitly conflict. For example, crunch-run typically calls the FUSE driver with both --mount-by-pdh and --mount-tmp (which is a different kind of filesystem in a separate ticket).

Subtasks 1 (0 open1 closed)

Task #23248: Review 23246-mount-by-idResolvedLisa Knox01/14/2026Actions

Related issues 1 (0 open1 closed)

Blocked by Arvados - Feature #23313: CustomFileSystem supports custom mount options needed by crunch-runResolvedTom CleggActions
Actions #1

Updated by Brett Smith 5 months ago

  • Subtask #23248 added
Actions #2

Updated by Brett Smith 4 months ago

  • Target version changed from Development 2025-11-12 to Development 2025-11-26
Actions #3

Updated by Tom Clegg 4 months ago

  • Blocked by Feature #23313: CustomFileSystem supports custom mount options needed by crunch-run added
Actions #4

Updated by Lisa Knox 4 months ago

  • Status changed from New to In Progress
Actions #5

Updated by Brett Smith 4 months ago

  • Description updated (diff)
Actions #6

Updated by Brett Smith 4 months ago

  • Subject changed from Go FUSE driver supports switches to determine the mount layout to Go FUSE driver supports switches crunch-run uses to set up compute mounts
Actions #7

Updated by Brett Smith 4 months ago

  • Description updated (diff)
  • Subject changed from Go FUSE driver supports switches crunch-run uses to set up compute mounts to Go FUSE driver supports --mount-by-pdh and/or --mount-by-id
Actions #8

Updated by Brett Smith 4 months ago

  • Description updated (diff)
Actions #9

Updated by Brett Smith 4 months ago

  • Description updated (diff)
  • Subject changed from Go FUSE driver supports --mount-by-pdh and/or --mount-by-id to Go FUSE driver supports --mount-by-id
Actions #10

Updated by Brett Smith 4 months ago

  • Description updated (diff)
Actions #11

Updated by Brett Smith 4 months ago

  • Target version changed from Development 2025-11-26 to Development 2025-12-10
Actions #12

Updated by Lisa Knox 4 months ago

23246-mount-by-id @ 6e4fcf4ba624e11340dd8e865b76af998559b40b

developer-run-tests: #4964

  • ✅ All agreed upon points are implemented / addressed.
    • Mounting the go FUSE driver with the flag -mount-by-id foo will create a dir named "foo" at the mountpoint provided.
    • Said dir behaves like a "magic directory", allowing access to Arvodos resources using UUIDs.
    • If -mount-by-id is passed, the subdir name cannot be blank.
  • ✅ Anything not implemented (discovered or discussed during work) has a follow-up story.
  • ✅ Code is tested and passing, both automated and manual, what manual testing was done is described
  • ✅ The tested code incorporates recent main branch changes.
  • ✅ New or changed UX/UX and has gotten feedback from stakeholders.
  • n/a Documentation has been updated.
  • ✅ Behaves appropriately at the intended scale (describe intended scale).
    • Behaves on par with existing functionality
  • ✅ Considered backwards and forwards compatibility issues between client and server.
  • ✅ Follows our coding standards and GUI style guidelines.
Notes:
  • in lib/mount/command.go L85-99 there's a bit of clunkiness, but this was done in anticipation of the future "mount-by-pdh" task.
Actions #13

Updated by Tom Clegg 4 months ago

I don't think we need to do all the stuff with flags.Visit(), we can just check *mountById != "". Sure, arv-mount --mount-by-id "" is an error, and we're trying to be compatible with arv-mount, so we could make it an error here too. But IMO ... mount --mount-by-id "" should just be equivalent to not passing --mount-by-id at all, for the sake of simplicity and broader convention. If we want to go hard on compatibility, we should probably start with the cases that aren't errors, like arv-mount --mount-by-id "a" --mount-by-id "b" (but I'm not convinced that's a feature anyone uses -- would anyone suspect it's supported if they hadn't been reading the source code recently?)

I'm not keen on passing customDirName to keepFS. I think it would be better to move the CustomFileSystem initialization into command.go, so instead of

type keepFS struct {
        // ...
        Client     *arvados.Client
        KeepClient *keepclient.KeepClient
        // ...
        root arvados.CustomFileSystem

we have

type keepFS struct {
        // ...
        Root arvados.CustomFileSystem

That way RunCommand can initialize the CustomFileSystem, and Init doesn't need to repeat the same list of command line options. (Besides, keepFS has enough of a purpose being an adapter between fuse and arvados.CustomFileSystem, it shouldn't get this additional responsibility unless it has some reason to be involved. Now that initialization is getting non-trivial I see that we should have passed in CustomFileSystem instead of Client/KeepClient in the first place.)

TestMountById is too copy-pastey. The setup/timeout boilerplate should be moved to a func (s *CmdSuite) testMount(c *check.C, args []string, testfunc func()) that we can call like this

func (s *CmdSuite) TestMountById(c *check.C) {
        testMount(c, []string{"--mount-by-id", "by_id_test"}, func() {
                f, err := os.Open(s.mnt + "/by_id_test/" + arvadostest.FooCollection)
                c.Assert(err, check.IsNil)
                // ...
        })
}
  • If we put the Unmount step in a defer (right after receiving from mountCmd.ready) then we can safely use c.Assert() in the testfunc without inadvertently skipping unmount.
  • If we add stderr *bytes.Buffer to the CmdSuite struct, and make our bytes.NewBuffer(nil) there instead of in a local variable, then TestMount's testfunc will have access to it.

I don't think we need the arvados.SiteFileSystemById() convenience function -- lib/mount can call CustomFileSystem() and MountByID() itself. (The existing SiteFileSystem() function is similarly not strictly necessary, but it does provide a "standard layout" convention, and we're not doing that here.)

Actions #14

Updated by Brett Smith 4 months ago

Tom Clegg wrote in #note-13:

I don't think we need to do all the stuff with flags.Visit(), we can just check *mountById != "". Sure, arv-mount --mount-by-id "" is an error, and we're trying to be compatible with arv-mount, so we could make it an error here too. But IMO ... mount --mount-by-id "" should just be equivalent to not passing --mount-by-id at all, for the sake of simplicity and broader convention.

I'm not sure what broader convention you're referring to for command line programs, could you please elaborate? Right now I prefer this be an error. Mainly because one real way it might come up is if you script a mount with --mount-by-id "$var" when $var is unset. That's a lot easier to debug if it's an error rather than ignored.

If we want to go hard on compatibility, we should probably start with the cases that aren't errors, like arv-mount --mount-by-id "a" --mount-by-id "b" (but I'm not convinced that's a feature anyone uses -- would anyone suspect it's supported if they hadn't been reading the source code recently?)

I concede probably nobody is doing this today, but we do have users with long-lived shared mounts. It's easy to imagine a situation where they decide to rename the "main" by-id directory they use (for easier use with new tools, or new compliance conventions, or whatever) but they want to keep the old name for compatibility with existing scripts, etc. And that's something I could see an admin just trying to see if it works without digging through the source.

Actions #15

Updated by Tom Clegg 3 months ago

Here's how to accept --flag a --flag b: https://gist.github.com/gammazero/525fdffd273450edbcf8baadf849333a

Then we can check len(mountByID) instead of using flags.Visit().

Actions #16

Updated by Brett Smith 3 months ago

  • Target version changed from Development 2025-12-10 to Development 2026-01-06
Actions #17

Updated by Lisa Knox 3 months ago

23246-mount-by-id @ be82ab37b1d4623cdf5954f93222d53f6037cb65

developer-run-tests: #4982

Tom Clegg wrote in #note-13:

I don't think we need to do all the stuff with flags.Visit()...
(Brett) I concede probably nobody is doing this today, but we do have users with long-lived shared mounts...
Here's how to accept --flag a --flag b...

It turns out that this was super easy to make work. Users can now pass -mount-by-id a -mount-by-id b and the mount fs will have dirs named "a" and "b".

I'm not keen on passing customDirName to keepFS. I think it would be better to move the CustomFileSystem initialization into command.go...

Done

TestMountById is too copy-pastey...

Created a helper function to do the mounting and related assertions. The checks that logging has stopped (for tests that generate crunchstat logs) can only be done after unmounting, so that's also in the helper function.

I don't think we need the arvados.SiteFileSystemById()

Removed, with functionality moved to RunCommand

Also merged main in because the tests from #23245 needed to be aligned with the new command_test.go test pattern.

Actions #18

Updated by Tom Clegg 3 months ago

Lisa Knox wrote in #note-17:

Created a helper function to do the mounting and related assertions.

Much better, thanks.

The checks that logging has stopped (for tests that generate crunchstat logs) can only be done after unmounting, so that's also in the helper function.

I don't quite follow this -- why not check in TestCrunchstatLogger after mountAndCheck returns?

Similarly, TestMount should still check c.Check(stderr.String(), check.Equals, "") after mountAndCheck.

Also merged main in because the tests from #23245 needed to be aligned with the new command_test.go test pattern.

TestWriteMetrics was deleted in be82ab37b1d4623cdf5954f93222d53f6037cb65 after the merge. I'm guessing this was an accident?

The flag value type isn't really specific to mount-by-id args (and we'll probably do the same thing for mount-by-pdh etc) so arrayFlagValue would be a better name than mountByIdArgs.

dirNames should be mountByID or byIDDirs or something

The flag package may call the String method with a zero-valued receiver, such as a nil pointer. I'm not sure when it does this but the String method should return "" instead of crashing if a==nil.

We should backquote the word "directory" in the help message so -help says "-mount-by-id directory" instead of "-mount-by-id value". See https://pkg.go.dev/flag#PrintDefaults (I see some of the existing help messages missed this memo too, my bad)

Actions #19

Updated by Lisa Knox 3 months ago

23246-mount-by-id @ e17b2965675656ec13bcf74950c4530668556cfd

developer-run-tests: #4985

I don't quite follow this -- why not check in TestCrunchstatLogger after mountAndCheck returns?
Similarly, TestMount should still check @c.Check(stderr.String()...

In my head I was thinking "unmount === end of test" for some reason, fixed.

TestWriteMetrics was deleted in be82ab37b1d4623cdf5954f93222d53f6037cb65 after the merge. I'm guessing this was an accident?

Yes, restored. The merge was pretty conflicted, I should have double-checked everything afterwards.

The flag value type isn't really specific...
dirNames should be mountByID or byIDDirs or something

Both variables renamed.

The flag package may call the String method with a zero-valued receiver, such as a nil pointer. I'm not sure when it does this but the String method should return "" instead of crashing if a==nil.

Done, and this leads me to believe that I should check package docs more closely.

We should backquote the word "directory" in the help message so -help says "-mount-by-id directory" instead of "-mount-by-id value". See https://pkg.go.dev/flag#PrintDefaults (I see some of the existing help messages missed this memo too, my bad)

After discussion, it was decided to align with the way arv-mount does it, i.e. "-mount-by-id PATH" or "-log-level LEVEL".

Actions #20

Updated by Tom Clegg 3 months ago

Might be worth adding a couple of things to TestMountById:
  1. mount two dirs instead of just one, and check that FooCollection exists in both
  2. check that the usual default layout dirs home and by_id don't exist

LGTM, thanks!

Actions #21

Updated by Lisa Knox 3 months ago

developer-run-tests: #4987

Done, and I also added a check in TestMount that the default layout dirs are created.

Actions #22

Updated by Lisa Knox 3 months ago

  • Status changed from In Progress to Resolved
Actions #23

Updated by Brett Smith about 2 months ago

  • Release set to 84
Actions

Also available in: Atom PDF