Bug #22708

open

keepstore DeviceID mis-parses cgroup bind mounts, probably shouldn't expect to find the device

Added by Brett Smith 12 months ago. Updated 12 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: Keep
Target version: -
Story points: -

Description

Running keepstore in a systemd-nspawn VM logs:

{"ClusterID":"z2a05","PID":119,"level":"info","msg":"stat \"/dev/mapper/vg-var--lib--machines[/z2a05b1]\": stat /dev/mapper/vg-var--lib--machines[/z2a05b1]: no such file or directory; using hostname:path for volume z2a05-nyw5e-syv6ysltjskposm","time":"2025-03-28T14:10:34.338693041-04:00"}
{"ClusterID":"z2a05","PID":119,"Volume":"z2a05b1.local:/var/lib/arvados/keep-data","level":"info","msg":"stat \"/dev/mapper/vg-var--lib--machines[/z2a05b1]\": stat /dev/mapper/vg-var--lib--machines[/z2a05b1]: no such file or directory; using hostname:path for volume z2a05-nyw5e-syv6ysltjskposm","time":"2025-03-28T14:10:34.339976657-04:00"}

The last message repeats two more times. This does reflect the output of findmnt:

$ sudo findmnt --noheadings --target /var/lib/arvados/keep-data
/      /dev/mapper/vg-var--lib--machines[/z2a05b1] btrfs  rw,relatime,idmapped,ssd,discard=async,space_cache=v2,subvolid=2

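But keepstore.DeviceID() is misinterpreting this output. findmnt uses /dev/foo[/bar] to say "the filesystem is on /dev/foo, and this is a mount of /bar within that filesystem." The bracketed suffix is not part of the device path, so calling stat on the whole string fails.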
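As a rough illustration of the parsing described above, here is a minimal Go sketch that separates the device from the bracketed path in a findmnt SOURCE value. The function name splitFindmntSource is hypothetical and is not taken from keepstore; the sketch also assumes the device path itself contains no "[" character.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFindmntSource is a hypothetical helper (not keepstore's code).
// It splits a findmnt SOURCE value of the form "/dev/foo[/bar]" into the
// device ("/dev/foo") and the path within that filesystem ("/bar").
// If there is no bracketed suffix, subpath is returned empty.
// Assumption: the device path contains no "[" character.
func splitFindmntSource(source string) (device, subpath string) {
	if strings.HasSuffix(source, "]") {
		if i := strings.Index(source, "["); i >= 0 {
			return source[:i], source[i+1 : len(source)-1]
		}
	}
	return source, ""
}

func main() {
	device, subpath := splitFindmntSource("/dev/mapper/vg-var--lib--machines[/z2a05b1]")
	// prints: /dev/mapper/vg-var--lib--machines /z2a05b1
	fmt.Println(device, subpath)
}
```

With the suffix stripped, the remaining string is at least a plausible device path to stat, although the device may still be invisible from inside a container.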

If it really needs the filesystem UUID, it should be looking at /dev/mapper/vg-var--lib--machines. But note that it won't find that either, because the host's devices are not available inside the VM container.

And even without a container per se, you could see this happen in keepstore if you added a systemd override with PrivateDevices=on.
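The fallback seen in the logs could be sketched roughly as follows. This is an assumption-laden illustration, not keepstore's implementation: the function name deviceIDOrFallback is hypothetical, and for simplicity it returns the device path itself where the real code is after a filesystem UUID. It shows why a hidden device node (container, or PrivateDevices=on) leads to the hostname:path form.

```go
package main

import (
	"fmt"
	"os"
)

// deviceIDOrFallback is a hypothetical helper (not keepstore's code).
// If the device node is visible, it returns the device path as a stand-in
// for a real filesystem identifier. If stat fails, for example because the
// host's devices are hidden from this process, it falls back to
// "hostname:path" and reports that it did so.
func deviceIDOrFallback(device, hostname, volumePath string) (id string, fellBack bool) {
	if _, err := os.Stat(device); err != nil {
		return hostname + ":" + volumePath, true
	}
	return device, false
}

func main() {
	id, fellBack := deviceIDOrFallback(
		"/dev/mapper/vg-var--lib--machines",
		"z2a05b1.local",
		"/var/lib/arvados/keep-data",
	)
	fmt.Println(id, fellBack)
}
```

Returning the fellBack flag separately would let the caller decide how prominently to log the condition.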

So as an administrator, when I see these logs, I wonder what I'm supposed to make of them. It's apparently not a fatal problem, but what are the consequences of keepstore giving up on finding the device? How worried should I be? If I'm trying to diagnose problems with my keepstore, it's hard to tell whether I should try to address this or if it's purely informational.

Actions #1

Updated by Tom Clegg 12 months ago

DeviceID enables keep-balance to detect a config error where two Keep volumes (with different UUIDs) are backed by the same underlying storage (same directory on same filesystem, in this case). Keep-balance errors out if it detects this situation based on the DeviceIDs reported by keepstore.
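To make the check concrete, here is a minimal Go sketch of grouping volumes by reported DeviceID and flagging any ID shared by more than one volume. The volume type, the findSharedDevices function, and the sample identifiers are all hypothetical; this is not keep-balance's actual code or data model.

```go
package main

import (
	"fmt"
	"sort"
)

// volume is a hypothetical stand-in for what keepstore reports per volume.
type volume struct {
	UUID     string
	DeviceID string
}

// findSharedDevices is a hypothetical helper (not keep-balance's code).
// It returns, for each DeviceID reported by two or more volumes, the
// sorted UUIDs of those volumes. Volumes with an empty DeviceID are
// skipped because there is nothing to compare.
func findSharedDevices(vols []volume) map[string][]string {
	byDevice := map[string][]string{}
	for _, v := range vols {
		if v.DeviceID == "" {
			continue
		}
		byDevice[v.DeviceID] = append(byDevice[v.DeviceID], v.UUID)
	}
	for id, uuids := range byDevice {
		if len(uuids) < 2 {
			delete(byDevice, id) // only one volume uses this device: fine
			continue
		}
		sort.Strings(uuids)
	}
	return byDevice
}

func main() {
	shared := findSharedDevices([]volume{
		{UUID: "vol-a", DeviceID: "dev-1"},
		{UUID: "vol-b", DeviceID: "dev-1"},
		{UUID: "vol-c", DeviceID: "dev-2"},
	})
	for id, uuids := range shared {
		fmt.Printf("device %s is shared by volumes %v\n", id, uuids)
	}
}
```

A check of this shape is only as good as the IDs it compares, which is why a fallback identifier weakens the detection.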

We do have other protections for this sort of problem (e.g., when deleting excess replicas, if replicas on different volumes have identical timestamps, we assume they are just multiple views of the same replica and that deleting one would actually delete all of them), so it's not critical.

Perhaps the log message should say "unable to determine filesystem UUID, will not be able to detect double-mounted filesystems"? With a link to https://doc.arvados.org/main/install/install-keepstore.html, which should mention the rule that you shouldn't use the same back-end device/directory for multiple keep volumes? (I don't see it there.)
