Bug #12306
closed
[arv-mount] --unmount should work on an unresponsive mount
100%
Description
Currently, if an arv-mount process is in some deadlocked/stuck state, running arv-mount --unmount PATH just hangs instead of unmounting.
When this happens, echo 1 > /sys/fs/fuse/connections/NNN/abort revives the stuck unmount command.
It looks like arv-mount --unmount attempts to lstat() all mount points in /proc/self/mounts, and lstat(stuck_mount_path) hangs.
This seems to be the fault of realpath() in source:services/fuse/arvados_fuse/unmount.py:
    while True:
        mounted = False
        for m in mountinfo():
            if m.is_fuse and (mnttype is None or mnttype == m.mnttype):
                try:
                    if os.path.realpath(m.path) == path:
On the shell node where this happened, where /home and /home/foo are both symlinks, arv-mount /home/foo/keep results in /data-sdd/foo/keep appearing in /proc/self/mountinfo, which means realpath() is superfluous here. (Is that true on all systems?)
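As a rough illustration of matching mounts without touching them, the sketch below parses mountinfo-style lines as plain text and never calls stat(), lstat(), or realpath() on a mount point. The function names and the sample line are hypothetical and not taken from unmount.py; the field layout is the one described in proc(5) for /proc/[pid]/mountinfo.

```python
import re


def parse_mountinfo_line(line):
    """Return (mount_point, fstype) from one mountinfo line.

    Hypothetical helper: it reads text only, so a stuck FUSE mount
    cannot make it hang.
    """
    fields = line.split()
    # Optional fields end at a lone "-"; the filesystem type follows it.
    sep = fields.index('-', 6)
    # The kernel escapes spaces and similar characters as \NNN octal.
    mount_point = re.sub(r'\\([0-7]{3})',
                         lambda m: chr(int(m.group(1), 8)),
                         fields[4])
    return mount_point, fields[sep + 1]


def fuse_mount_paths(lines):
    """List mount points whose type is "fuse" or "fuse.<subtype>"."""
    paths = []
    for line in lines:
        mount_point, fstype = parse_mountinfo_line(line)
        if fstype == 'fuse' or fstype.startswith('fuse.'):
            paths.append(mount_point)
    return paths
```

On a live system the input would be the lines of /proc/self/mountinfo. Comparing the resulting strings against an already-normalized target path avoids any filesystem call on the mounts themselves.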
Updated by Tom Clegg over 7 years ago
These stuck mounts come up occasionally on Jenkins. When they do, all builds get stuck ("UnmountTest" -- presumably because of this bug), until someone clears the stuck mounts manually using ".../connections/NNN/abort" or "fusermount -u -z".
Updated by Tom Morris over 7 years ago
- Status changed from New to In Progress
- Assigned To changed from Tom Morris to Tom Clegg
Updated by Tom Clegg over 7 years ago
12306-dont-stat-mounts @ 7bc55d65082b3a39639508fcaebd1185b7e04089
Updated by Peter Amstutz over 7 years ago
So following symlinks to mounts seems weird and not something you would normally do. However, the other thing that realpath() does is turn a relative path into an absolute path, which is probably what we were really trying to use it for. So how about adding this back in?
    path = os.path.abspath(path)
(abspath doesn't use stat(), only os.getcwd().)
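A minimal sketch of that suggestion, assuming a hypothetical wrapper named normalize_mount_arg: os.path.abspath() joins a relative path with os.getcwd() and normalizes it lexically, so it works even when the path does not exist or points at a stuck mount.

```python
import os


def normalize_mount_arg(path):
    """Make a user-supplied mount path absolute without stat()ing it.

    Hypothetical helper: abspath() only joins with os.getcwd() and
    cleans up ".", ".." and trailing slashes as string operations.
    It does not resolve symlinks.
    """
    return os.path.abspath(path)
```

Because the ".." handling is purely lexical, the result can differ from realpath() when a symlink is involved; that trade-off is what the rest of this thread discusses.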
Updated by Tom Clegg over 7 years ago
Peter Amstutz wrote:
So following symlinks to mounts seems weird and not something you would normally do
On our shell nodes $HOME is typically /home/username where /home is a symlink, so ~/keep doesn't appear in mountinfo but realpath(~/keep) does.
I wonder if it's worth implementing a more careful realpath() that can resolve ~/keep in such situations without calling lstat() on ~/keep itself. Seems like a bit of a rabbit hole, though.
(abspath doesn't use stat(), only os.getcwd().)
Indeed, one less opportunity to fall into the realpath() hole. Added.
12306-dont-stat-mounts @ aabf1ca0e99701550f9af785e9f1fee098b0020a
Updated by Peter Amstutz over 7 years ago
Tom Clegg wrote:
Peter Amstutz wrote:
So following symlinks to mounts seems weird and not something you would normally do
On our shell nodes $HOME is typically /home/username where /home is a symlink, so ~/keep doesn't appear in mountinfo but realpath(~/keep) does.
Got it. But does that mean arv-mount --unmount won't actually work in this case, when the stuck mount you are trying to unmount is on a symlink path?
I wonder if it's worth implementing a more careful realpath() that can resolve ~/keep in such situations without calling lstat() on ~/keep itself. Seems like a bit of a rabbit hole, though.
How about calling realpath() on the parent directory and then joining it with the mount point?
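A sketch of that idea, under the assumption that only the final path component is the (possibly stuck) mount point. The name realpath_of_parent is hypothetical, and this is not the code from the 12306-dont-stat-mounts branch.

```python
import os


def realpath_of_parent(path):
    """Resolve symlinks in the parent directory only.

    Hypothetical helper: the last component (the mount point) is
    never passed to realpath(), so it is never lstat()ed here.
    """
    path = os.path.abspath(path)
    parent, name = os.path.split(path)
    if not name:
        # path is the filesystem root; nothing to split off.
        return path
    return os.path.join(os.path.realpath(parent), name)
```

Two caveats: realpath() on the parent can still hang if the parent itself lies beneath another stuck mount, and a mount point that is itself a symlink is not resolved.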
Updated by Tom Clegg over 7 years ago
Indeed, the previous version would have ended up calling realpath() on ~/keep on a system where $HOME contains symlinks.
I think I made it back from the rabbit hole with a version that avoids calling realpath in those cases.
12306-dont-stat-mounts @ 08a4ebba0e5bfbc179103ac5e6916164bc8083fa
Updated by Peter Amstutz over 7 years ago
Tom Clegg wrote:
Indeed, the previous version would have ended up calling realpath() on ~/keep on a system where $HOME contains symlinks.
I think I made it back from the rabbit hole with a version that avoids calling realpath in those cases.
12306-dont-stat-mounts @ 08a4ebba0e5bfbc179103ac5e6916164bc8083fa
Tentatively, safer_realpath seems to work.
I just noticed that arv-mount --unmount requires an unnecessary API token:
    $ arv-mount --unmount keep/
    2017-11-08 09:49:38 arvados.arv-mount[7740] ERROR: Missing environment: 'ARVADOS_API_TOKEN'
Unmounting an arv-mount which is stuck with SIGSTOP does remove the mount but doesn't kill the daemon:
- arv-mount
- SIGSTOP
- arv-mount --unmount (works)
- SIGCONT
- arv-mount is still there
Could be a problem if it is occupying a lot of memory and refusing to go away on its own.
Updated by Peter Amstutz over 7 years ago
My preferred method to bring the hammer down:
- abort if available
- sigkill
- fusermount -u -z
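The three steps above could be sketched roughly as follows. This is an illustration only, not what arv-mount does: the function names are hypothetical, and the caller is assumed to already know the FUSE connection number (the NNN in /sys/fs/fuse/connections/NNN) and the daemon's PID.

```python
import os
import signal
import subprocess


def abort_fuse_connection(conn_id, base="/sys/fs/fuse/connections"):
    """Write "1" to the connection's abort file.

    Returns False when the file is missing or not writable, so the
    caller can fall through to the next step.
    """
    try:
        with open(os.path.join(base, str(conn_id), "abort"), "w") as f:
            f.write("1")
        return True
    except OSError:
        return False


def force_unmount(mount_path, pid=None, conn_id=None):
    """Abort if available, then SIGKILL, then lazy unmount."""
    if conn_id is not None:
        abort_fuse_connection(conn_id)
    if pid is not None:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the daemon is already gone
    # "fusermount -u -z" detaches the mount even if it is busy.
    return subprocess.call(["fusermount", "-u", "-z", mount_path])
```

Killing the daemon explicitly addresses the SIGSTOP case noted earlier, where the mount disappears but the process lingers.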
Updated by Peter Amstutz over 7 years ago
Otherwise, the main goal of this bugfix (don't get stuck on realpath()) seems to be accomplished, so declare victory and merge.
LGTM
Updated by Anonymous over 7 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:0af053088c83d1107866cb06fd6c5736d9065eee.