Bug #22653
Status: Open
Lots of "cannot mmap empty file" FUSE errors on Jenkins
Description
We have been seeing this more often since #22489 and #22579. It's unclear whether it's a side effect of the setup changes themselves, of increased node reuse, or something else. Here's a recent example from run-tests-remainder #5030:
FAIL: singularity_test.go:186: singularitySuite.TestImageCache_New
building singularity image
[singularity build /tmp/crunch-run-singularity-1282974358/image.sif docker-archive:///tmp/crunch-run-singularity-1282974358/image.tar]
INFO: Starting build...
Getting image source signatures
Copying blob sha256:67f770da229bf16d0c280f232629b0c1f1243a884df09f6b940a1c7288535a6d
Copying config sha256:a11e762410a6fb4e925d1ea535fecc177d983bdf0dba3261d244fb3c7ee18865
Writing manifest to image destination
Storing signatures
2025/03/10 17:13:18 info unpack layer: sha256:378e3b9fb50c743e1daa7a79dc2cf7c18aa0ac8137a1ca0d51a3b909c80e7d48
INFO: Creating SIF file...
INFO: Build complete: /tmp/crunch-run-singularity-1282974358/image.sif
building singularity image
[singularity build /tmp/check-505614902/43/by_uuid/zzzzz-4zz18-4al6gql7va4r8fz/image.sif docker-archive:///tmp/crunch-run-singularity-2250306490/image.tar]
INFO: Starting build...
Getting image source signatures
Copying blob sha256:67f770da229bf16d0c280f232629b0c1f1243a884df09f6b940a1c7288535a6d
Copying config sha256:a11e762410a6fb4e925d1ea535fecc177d983bdf0dba3261d244fb3c7ee18865
Writing manifest to image destination
Storing signatures
2025/03/10 17:13:19 info unpack layer: sha256:378e3b9fb50c743e1daa7a79dc2cf7c18aa0ac8137a1ca0d51a3b909c80e7d48
INFO: Creating SIF file...
FATAL: While performing build: while creating SIF: while creating container: open /tmp/check-505614902/43/by_uuid/zzzzz-4zz18-4al6gql7va4r8fz/image.sif: no such file or directory
singularity_test.go:193:
c.Check(err, IsNil)
... value *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc002079e60), Stderr:[]uint8(nil)} ("exit status 255")
singularity_test.go:197:
s.checkCacheCollectionExists(c, setup)
singularity_test.go:180:
if !c.Check(cl.Items, HasLen, 1) {
...
}
... obtained []arvados.Collection = []arvados.Collection{}
... n int = 1
Traceback (most recent call last):
File "/home/jenkins/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/_internal/diskcache.py", line 177, in get_from_disk
content = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
ValueError: cannot mmap an empty file
Traceback (most recent call last):
File "/home/jenkins/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/_internal/diskcache.py", line 177, in get_from_disk
content = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
ValueError: cannot mmap an empty file
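The traceback above shows `get_from_disk` calling `mmap.mmap(fd, 0, ...)` on a cache file that turns out to be zero bytes long, which Python's mmap rejects outright. A minimal sketch of a defensive guard (the `read_cached_block` helper name is hypothetical, not actual arvados SDK API) would be to stat the file first and treat an empty file as a cache miss:

```python
import mmap
import os
import tempfile

def read_cached_block(path):
    """Hypothetical sketch: read a cached block roughly the way
    diskcache.get_from_disk does, but guard against the zero-length
    files seen in this bug. mmap.mmap() raises ValueError on an empty
    file, so an empty cache file is reported as a miss (None) instead
    of letting the mmap call raise."""
    with open(path, 'rb') as filehandle:
        if os.fstat(filehandle.fileno()).st_size == 0:
            # Empty file: the cache entry was never fully written.
            return None
        # The mmap stays valid after the file object is closed.
        return mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)

# Demo: an empty file is a miss rather than a ValueError.
with tempfile.NamedTemporaryFile(delete=False) as f:
    empty_path = f.name
print(read_cached_block(empty_path))  # → None
os.unlink(empty_path)
```

A guard like this wouldn't fix whatever is truncating the cache files, but it would turn the crash into a recoverable miss and make the underlying write failure easier to spot.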
At first this happened alongside out-of-space errors, and we increased the disk size for Jenkins worker nodes to deal with those (see #22579#note-7). But this specific error is still occurring, and it is now occurring alongside ENOMEM errors. From run-tests-doc-pysdk-api-fuse #757:
__________________ ERROR at setup of FuseMountTest_0.runTest ___________________
cls = <class 'tests.test_mount.FuseMountTest_0'>
@classmethod
def setUpClass(cls):
if cls.disk_cache:
cls._disk_cache_dir = tempfile.mkdtemp(prefix='MountTest-')
else:
cls._disk_cache_dir = None
> cls._keep_block_cache = arvados.keep.KeepBlockCache(
disk_cache=cls.disk_cache,
disk_cache_dir=cls._disk_cache_dir,
)
tests/mount_test_base.py:38:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/jenkins/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/keep.py:189: in __init__
self._cache = diskcache.DiskCacheSlot.init_cache(self._disk_cache_dir, self._max_slots)
/home/jenkins/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/_internal/diskcache.py:211: in init_cache
ds.set(b'a')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <arvados._internal.diskcache.DiskCacheSlot object at 0x7f2ba89009c0>
value = b'a'
def set(self, value):
tmpfile = None
try:
if value is None:
self.content = None
self.ready.set()
return False
if len(value) == 0:
# Can't mmap a 0 length file
self.content = b''
self.ready.set()
return True
if self.content is not None:
# Has been set already
self.ready.set()
return False
blockdir = os.path.join(self.cachedir, self.locator[0:3])
os.makedirs(blockdir, mode=0o700, exist_ok=True)
final = os.path.join(blockdir, self.locator) + cacheblock_suffix
self.filehandle = tempfile.NamedTemporaryFile(dir=blockdir, delete=False, prefix="tmp", suffix=cacheblock_suffix)
tmpfile = self.filehandle.name
os.chmod(tmpfile, stat.S_IRUSR | stat.S_IWUSR)
# acquire a shared lock; this tells other processes that
# we're using this block and to please not delete it.
fcntl.flock(self.filehandle, fcntl.LOCK_SH)
self.filehandle.write(value)
self.filehandle.flush()
os.rename(tmpfile, final)
tmpfile = None
> self.content = mmap.mmap(self.filehandle.fileno(), 0, access=mmap.ACCESS_READ)
E OSError: [Errno 12] Cannot allocate memory
/home/jenkins/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/_internal/diskcache.py:83: OSError
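The `OSError: [Errno 12] Cannot allocate memory` here comes from the mmap call itself. Each mapping consumes a kernel VMA, so a process that maps many cache blocks can exhaust a limit such as `vm.max_map_count` and get ENOMEM from mmap even with free RAM. A hedged sketch of one possible mitigation (`map_or_read` is a hypothetical helper, not existing arvados code) is to fall back to an ordinary `read()` when mapping fails that way:

```python
import errno
import mmap
import tempfile

def map_or_read(filehandle):
    """Hypothetical sketch: fall back to a plain read() when mmap fails
    with ENOMEM. mmap raises OSError(errno.ENOMEM) when the process has
    run out of mappings (e.g. vm.max_map_count) even if memory is free;
    any other OSError is re-raised unchanged."""
    try:
        return mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
    except OSError as e:
        if e.errno != errno.ENOMEM:
            raise
        # Degraded path: hold the block's bytes in the heap instead.
        filehandle.seek(0)
        return filehandle.read()

# Demo: on the happy path this returns a read-only mmap of the file.
with tempfile.TemporaryFile() as f:
    f.write(b'cached block')
    f.flush()
    data = map_or_read(f)
    print(bytes(data[:]))  # → b'cached block'
```

The trade-off is that the fallback gives up the memory-sharing benefit of mmap, so it only makes sense as a last resort when the mapping limit is hit.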
Also:
__________________________ FuseWriteFileTest.runTest ___________________________
self = <tests.test_mount.FuseWriteFileTest testMethod=runTest>
def runTest(self):
collection = arvados.collection.Collection(api_client=self.api)
collection.save_new()
> m = self.make_mount(fuse.CollectionDirectory)
tests/test_mount.py:468:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/mount_test_base.py:86: in make_mount
self.operations = fuse.Operations(
arvados_fuse/__init__.py:624: in __init__
self.inodes = Inodes(inode_cache, encoding=encoding, fsns=fsns,
arvados_fuse/__init__.py:317: in __init__
self._inode_remove_thread.start()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Thread(Thread-79, initial daemon)>
def start(self):
"""Start the thread's activity.
It must be called at most once per thread object. It arranges for the
object's run() method to be invoked in a separate thread of control.
This method will raise a RuntimeError if called more than once on the
same thread object.
"""
if not self._initialized:
raise RuntimeError("thread.__init__() not called")
if self._started.is_set():
raise RuntimeError("threads can only be started once")
with _active_limbo_lock:
_limbo[self] = self
try:
> _start_new_thread(self._bootstrap, ())
E RuntimeError: can't start new thread
/usr/lib/python3.9/threading.py:874: RuntimeError
This seems to affect basically only arv-mount. It feels like some earlier error is being ignored or suppressed, and the process then fails this way downstream. Tracking down the "original" error might be productive.
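The `RuntimeError: can't start new thread` in the FuseWriteFileTest traceback points at process-level resource exhaustion: on Linux that error maps to EAGAIN from pthread_create, which usually means RLIMIT_NPROC or a cgroup pids limit was hit rather than memory pressure. A small diagnostic sketch (the `thread_headroom` probe is hypothetical, not part of the test suite) can show how much headroom a worker actually has:

```python
import resource
import threading

def thread_headroom(limit=64):
    """Hypothetical diagnostic: count how many additional threads can be
    started before RuntimeError("can't start new thread"), capped at
    `limit`. The threads just park on an Event and are cleaned up."""
    stop = threading.Event()
    started = []
    try:
        for _ in range(limit):
            t = threading.Thread(target=stop.wait, daemon=True)
            t.start()
            started.append(t)
    except RuntimeError:
        pass  # hit the process/cgroup thread limit
    finally:
        stop.set()
        for t in started:
            t.join()
    return len(started)

soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print('RLIMIT_NPROC:', soft, hard)
print('threads started before failure (capped at 64):', thread_headroom())
```

Running something like this on a failing Jenkins node, alongside `cat /sys/fs/cgroup/pids.max` if the runner is in a cgroup, could confirm whether leaked threads from earlier tests are eating the budget.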
Updated by Brett Smith about 1 year ago
For now as a mitigation I have configured Jenkins test nodes to be oneshot nodes; i.e., they will not be reused.
With that change, I have reverted the disk size increase from #22579#note-7 and set it back to 40, since the main (only?) reason to increase the disk was to accommodate more reuse.
Updated by Brett Smith 12 months ago
Peter Amstutz wrote in #note-2:
I think this is fixed? At least mitigated? I'm marking it as fixed, reopen it if I'm wrong.
It seems to have been mitigated by the use-each-test-node-only-once Jenkins configuration change.
Updated by Brett Smith 10 months ago
INFO: Starting build...
Getting image source signatures
Copying blob sha256:67f770da229bf16d0c280f232629b0c1f1243a884df09f6b940a1c7288535a6d
Copying config sha256:a11e762410a6fb4e925d1ea535fecc177d983bdf0dba3261d244fb3c7ee18865
Writing manifest to image destination
Storing signatures
2025/06/03 12:19:13 info unpack layer: sha256:378e3b9fb50c743e1daa7a79dc2cf7c18aa0ac8137a1ca0d51a3b909c80e7d48
INFO: Creating SIF file...
INFO: Build complete: /tmp/check-2651351821/20/by_uuid/zzzzz-4zz18-m8kbthvd6gulypv/image.sif
saved converted image in zzzzz-4zz18-altwx4pjrdn5uvc with PDH 69572388f903b7ce4e32d7ef7477e7f6+187
saved converted image in zzzzz-4zz18-cr48poogcnnptsl with PDH c7e4f9aea6df212bf1ba6c53e8924d61+187
lost race -- abandoning our conversion in zzzzz-4zz18-altwx4pjrdn5uvc (69572388f903b7ce4e32d7ef7477e7f6+187) and using image from zzzzz-4zz18-2vbj2rsk375safh (4492629d0176a1ae5564dc2d2cef2080+187) instead
lost race -- abandoning our conversion in zzzzz-4zz18-egrvy8o5qtsfrtw (6831e71c64243add0c84825cbee38258+187) and using image from zzzzz-4zz18-2vbj2rsk375safh (4492629d0176a1ae5564dc2d2cef2080+187) instead
lost race -- abandoning our conversion in zzzzz-4zz18-cr48poogcnnptsl (c7e4f9aea6df212bf1ba6c53e8924d61+187) and using image from zzzzz-4zz18-2vbj2rsk375safh (4492629d0176a1ae5564dc2d2cef2080+187) instead
saved converted image in zzzzz-4zz18-7x91u00dwrtedr2 with PDH 599b7040c44df76505b11c9c86fc3713+187
saved converted image in zzzzz-4zz18-xmudpy9gwtpbt34 with PDH b249bf80ba3bda77aa1a84370ee2babb+187
saved converted image in zzzzz-4zz18-00knzkmpygnkox5 with PDH 64a8f3a6822cc0b0c3ba2c62d5cedba3+187
lost race -- abandoning our conversion in zzzzz-4zz18-xmudpy9gwtpbt34 (b249bf80ba3bda77aa1a84370ee2babb+187) and using image from zzzzz-4zz18-2vbj2rsk375safh (4492629d0176a1ae5564dc2d2cef2080+187) instead
lost race -- abandoning our conversion in zzzzz-4zz18-7x91u00dwrtedr2 (599b7040c44df76505b11c9c86fc3713+187) and using image from zzzzz-4zz18-2vbj2rsk375safh (4492629d0176a1ae5564dc2d2cef2080+187) instead
lost race -- abandoning our conversion in zzzzz-4zz18-00knzkmpygnkox5 (64a8f3a6822cc0b0c3ba2c62d5cedba3+187) and using image from zzzzz-4zz18-2vbj2rsk375safh (4492629d0176a1ae5564dc2d2cef2080+187) instead
saved converted image in zzzzz-4zz18-0qo8q7occtqwf04 with PDH 4c9424a9629e13bdc637eae43c023108+187
lost race -- abandoning our conversion in zzzzz-4zz18-0qo8q7occtqwf04 (4c9424a9629e13bdc637eae43c023108+187) and using image from zzzzz-4zz18-2vbj2rsk375safh (4492629d0176a1ae5564dc2d2cef2080+187) instead
saved converted image in zzzzz-4zz18-m8kbthvd6gulypv with PDH 24ebe327e7efd36a1746be684d16e314+187
lost race -- abandoning our conversion in zzzzz-4zz18-m8kbthvd6gulypv (24ebe327e7efd36a1746be684d16e314+187) and using image from zzzzz-4zz18-2vbj2rsk375safh (4492629d0176a1ae5564dc2d2cef2080+187) instead
singularity_test.go:246:
    s.testImageCache(c, 10)
singularity_test.go:282:
    c.Check(fusepath, Equals, fusepath0)
... obtained string = ""
... expected string = "/by_id/4492629d0176a1ae5564dc2d2cef2080+187/image.sif"
Traceback (most recent call last):
  File "/home/jenkins/tmp/VENV3DIR/lib/python3.11/site-packages/arvados/_internal/diskcache.py", line 177, in get_from_disk
    content = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot mmap an empty file
Traceback (most recent call last):
  File "/home/jenkins/tmp/VENV3DIR/lib/python3.11/site-packages/arvados/_internal/diskcache.py", line 177, in get_from_disk
    content = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot mmap an empty file
fusermount: entry for /tmp/check-2651351821/39 not found in /etc/mtab
fusermount: entry for /tmp/check-2651351821/38 not found in /etc/mtab
Traceback (most recent call last):
  File "/home/jenkins/tmp/VENV3DIR/lib/python3.11/site-packages/arvados/_internal/diskcache.py", line 177, in get_from_disk
    content = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot mmap an empty file
fusermount: entry for /tmp/check-2651351821/43 not found in /etc/mtab
Traceback (most recent call last):
  File "/home/jenkins/tmp/VENV3DIR/lib/python3.11/site-packages/arvados/_internal/diskcache.py", line 177, in get_from_disk
    content = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot mmap an empty file
fusermount: entry for /tmp/check-2651351821/46 not found in /etc/mtab
OOPS: 114 passed, 3 skipped, 1 FAILED
--- FAIL: TestCrunchExec (96.22s)
FAIL
coverage: 65.3% of statements
FAIL git.arvados.org/arvados.git/lib/crunchrun 96.235s
FAIL
======= lib/crunchrun tests -- FAILED
Updated by Brett Smith 5 months ago
- Status changed from Resolved to New
While developing new FUSE tests, I got it to fail like this:
2025-10-22 10:12:51 arvados.arv-mount[216446] INFO: /home/brett/Curii/arvados/services/fuse/bin/arv-mount 3.2.0.dev20251020195646 started
2025-10-22 10:12:51 arvados.arv-mount[216446] INFO: enable write is False
2025-10-22 10:12:51 arvados.arv-mount[216446] INFO: file cache capped at 1099511627776 bytes or less based on available disk (RLIMIT_NOFILE is (131072, 1048576))
Traceback (most recent call last):
File "/home/brett/.cache/arvados-test/VENV3DIR/lib/python3.11/site-packages/arvados/_internal/diskcache.py", line 177, in get_from_disk
content = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot mmap an empty file
fuse: mountpoint is not empty
fuse: if you are sure this is safe, use the 'nonempty' mount option
2025-10-22 10:12:51 arvados.arv-mount[216446] ERROR: arv-mount: exception during mount: fuse_mount failed
Traceback (most recent call last):
File "/home/brett/.cache/arvados-test/VENV3DIR/lib/python3.11/site-packages/arvados_fuse/command.py", line 694, in _run_standalone
with self:
File "/home/brett/.cache/arvados-test/VENV3DIR/lib/python3.11/site-packages/arvados_fuse/command.py", line 417, in __enter__
llfuse.init(self.operations, str(self.args.mountpoint), self._fuse_options())
File "src/fuse_api.pxi", line 242, in llfuse.init
RuntimeError: fuse_mount failed
So it looks like this problem may be caused by trying to mount onto a non-empty directory, which can happen when tests write into the mountpoint before arv-mount actually completes the mount.
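If that theory is right, a pre-mount check that fails loudly would surface the real cause instead of the generic `RuntimeError: fuse_mount failed` from llfuse. A minimal sketch (the `assert_empty_mountpoint` helper is hypothetical, not existing arv-mount code):

```python
import os
import shutil
import tempfile

def assert_empty_mountpoint(path):
    """Hypothetical pre-mount check: FUSE refuses a non-empty mountpoint
    ("fuse: mountpoint is not empty") unless the 'nonempty' option is
    passed. Listing the stray entries in the error makes the real cause
    obvious at the point of failure."""
    leftovers = os.listdir(path)
    if leftovers:
        raise RuntimeError(
            f"mountpoint {path!r} is not empty before mount: {leftovers!r}")

# Demo: a fresh directory passes; a directory with a stray file fails.
mountpoint = tempfile.mkdtemp()
assert_empty_mountpoint(mountpoint)
open(os.path.join(mountpoint, 'stray'), 'w').close()
try:
    assert_empty_mountpoint(mountpoint)
except RuntimeError as e:
    print(e)
shutil.rmtree(mountpoint)
```

A check like this run just before `llfuse.init()` would also pinpoint which test wrote into the mountpoint early, since the stray entry names are included in the error.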