Bug #18686
Updated by Ward Vandewege about 3 years ago
This is on Tordo (AWS). A compute node has been running for over 2 days:

<pre>
root@ip-10-253-254-49:/home/admin# uptime
 13:05:39 up 2 days, 12 min, 1 user, load average: 0.02, 0.03, 0.00
</pre>

The AWS ID is i-06cce6b3e0448b10d at 10.253.254.49. The a-d-c logs say:

<pre>
tordo:~# journalctl -u arvados-dispatch-cloud.service -n100000|grep i-06cce6b3e0448b10d
Jan 25 12:53:02 tordo.arvadosapi.com arvados-dispatch-cloud[15971]: {"Address":"10.253.254.49","ClusterID":"tordo","IdleBehavior":"run","Instance":"i-06cce6b3e0448b10d","InstanceType":"t3small","PID":15971,"State":"booting","level":"info","msg":"instance appeared in cloud","time":"2022-01-25T12:53:02.659570016Z"}
Jan 25 12:53:38 tordo.arvadosapi.com arvados-dispatch-cloud[15971]: {"Address":"10.253.254.49","ClusterID":"tordo","Command":"sudo docker ps -q","Instance":"i-06cce6b3e0448b10d","InstanceType":"t3small","PID":15971,"level":"info","msg":"boot probe succeeded","stderr":"","stdout":"","time":"2022-01-25T12:53:38.438366587Z"}
Jan 25 12:53:38 tordo.arvadosapi.com arvados-dispatch-cloud[15971]: {"Address":"10.253.254.49","ClusterID":"tordo","Instance":"i-06cce6b3e0448b10d","InstanceType":"t3small","PID":15971,"cmd":"sudo sh -c 'set -e; dstdir=\"/var/lib/arvados/\"; dstfile=\"/var/lib/arvados/crunch-run~89bea761309098144f11941ae52673f2\"; mkdir -p \"$dstdir\"; touch \"$dstfile\"; chmod 0755 \"$dstdir\" \"$dstfile\"; cat \u003e\"$dstfile\"'","hash":"89bea761309098144f11941ae52673f2","level":"info","msg":"installing runner binary on worker","path":"/var/lib/arvados/crunch-run~89bea761309098144f11941ae52673f2","time":"2022-01-25T12:53:38.471727831Z"}
Jan 25 12:53:39 tordo.arvadosapi.com arvados-dispatch-cloud[15971]: {"Address":"10.253.254.49","ClusterID":"tordo","Instance":"i-06cce6b3e0448b10d","InstanceType":"t3small","PID":15971,"ProbeStart":"2022-01-25T12:53:30.678564128Z","level":"info","msg":"instance booted; will try probeRunning","time":"2022-01-25T12:53:39.371129567Z"}
Jan 25 12:53:39 tordo.arvadosapi.com arvados-dispatch-cloud[15971]: {"Address":"10.253.254.49","ClusterID":"tordo","Instance":"i-06cce6b3e0448b10d","InstanceType":"t3small","PID":15971,"ProbeStart":"2022-01-25T12:53:30.678564128Z","RunningContainers":0,"State":"idle","level":"info","msg":"probes succeeded, instance is in service","time":"2022-01-25T12:53:39.429678726Z"}
Jan 25 12:53:39 tordo.arvadosapi.com arvados-dispatch-cloud[15971]: {"Address":"10.253.254.49","ClusterID":"tordo","ContainerUUID":"tordo-dz642-2l2xwswnvbvc8gk","Instance":"i-06cce6b3e0448b10d","InstanceType":"t3small","PID":15971,"level":"info","msg":"crunch-run process started","time":"2022-01-25T12:53:39.471920534Z"}
</pre>

On the node itself, @crunch-run@ and @arv-mount@ are running but nothing more:

<pre>
root 18119 0.0 0.1 12512 3212 pts/0 R+ 13:07 0:00 \_ ps auxwf
admin 631 0.0 0.4 21140 8976 ? Ss Jan25 0:00 /lib/systemd/systemd --user
admin 632 0.0 0.1 104852 2364 ? S Jan25 0:00 \_ (sd-pam)
root 1148 0.1 2.5 1333156 50724 ? Sl Jan25 5:24 /var/lib/arvados/crunch-run~89bea761309098144f11941ae52673f2 -no-detach --detach --stdin-config --runtime-engine=singularity tordo-dz642-2l2xwswnvbvc8gk
root 1161 0.1 2.4 1311492 48800 ? Sl Jan25 4:36 \_ /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1307 0.0 4.3 865924 87296 ? Ssl Jan25 0:37 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --default-ulimit nofile=10000:10000 --dns 10.253.0.2
root 2104 0.0 1.1 32536 22200 ? Ss Jan25 0:00 /usr/share/python3/dist/arvados-docker-cleaner/bin/python /usr/bin/arvados-docker-cleaner
</pre>

<pre>
root@ip-10-253-254-49:/tmp# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
</pre>

<pre>
root@ip-10-253-254-49:/tmp# v
total 4
drwxrwxrwt 4 root root 48 Jan 27 08:25 ./
drwxr-xr-x 18 root root 4096 Jan 25 12:53 ../
drwx--x--x 14 root root 182 Jan 25 12:53 docker-data/
drwxr-xr-x 2 root root 18 Jan 25 12:54 hsperfdata_root/
</pre>

The container (tordo-dz642-2l2xwswnvbvc8gk) is just one part of our standard test suite:

<pre>
{
  "uuid": "tordo-dz642-2l2xwswnvbvc8gk",
  "owner_uuid": "tordo-tpzed-000000000000000",
  "created_at": "2022-01-25T12:53:29.978Z",
  "modified_at": "2022-01-27T12:58:31.556Z",
  "modified_by_client_uuid": "tordo-ozdt8-q6dzdi1lcc03155",
  "modified_by_user_uuid": "tordo-tpzed-000000000000000",
  "state": "Locked",
  "started_at": null,
  "finished_at": null,
  "log": "58d5e0dd2f63cc85d9146130dc10c54c+13282",
  "environment": {
    "HOME": "/var/spool/cwl",
    "TMPDIR": "/tmp"
  },
  "cwd": "/var/spool/cwl",
  "command": [
    "/bin/sh",
    "-c",
    "echo \"HOME=$HOME\" \"TMPDIR=$TMPDIR\" && test \"$HOME\" = /var/spool/cwl -a \"$TMPDIR\" = /tmp"
  ],
  "output_path": "/var/spool/cwl",
  "mounts": {
    "/tmp": {
      "capacity": 1073741824,
      "kind": "tmp"
    },
    "/var/spool/cwl": {
      "capacity": 1073741824,
      "kind": "tmp"
    }
  },
  "runtime_constraints": {
    "API": false,
    "cuda": {
      "device_count": 0,
      "driver_version": "",
      "hardware_capability": ""
    },
    "keep_cache_ram": 268435456,
    "ram": 268435456,
    "vcpus": 1
  },
  "output": null,
  "container_image": "021e994505b006982494a7caf0cedd1d+261",
  "progress": null,
  "priority": 562948310306102004,
  "updated_at": null,
  "exit_code": null,
  "auth_uuid": "tordo-gj3su-rxkhzwtfimcmuij",
  "locked_by_uuid": "tordo-gj3su-000000000000000",
  "scheduling_parameters": {
    "max_run_time": 0,
    "partitions": [ ],
    "preemptible": false
  },
  "runtime_status": { },
  "runtime_user_uuid": "ce8i5-tpzed-yzrv3k3xiq86td0",
  "runtime_auth_scopes": [
    "all"
  ],
  "runtime_token": null,
  "lock_count": 1,
  "gateway_address": null,
  "interactive_session_started": false,
  "output_storage_classes": [
    "default"
  ]
}
</pre>

Versions:

<pre>
root@ip-10-253-254-49:/usr/src# dpkg -l |grep arv
ii arvados-docker-cleaner 2.3.0~dev20210729201354-1 amd64 Arvados Docker cleaner
ii python3-arvados-fuse 2.4.0~dev20211126162134-1 amd64 Arvados FUSE driver
root@ip-10-253-254-49:/usr/src# dpkg -l |grep cru
ii crunch-run 2.4.0~dev20211230190012-1 amd64 Supervise a single Crunch container
</pre>

Clearly we need to rebuild the image to get the latest arvados-docker-cleaner version but the fuse driver is up to date.

Arv-mount sure has a lot of threads:

<pre>
root@ip-10-253-254-49:/usr/src# ps -efL|grep mount
root 1161 1148 1161 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1232 0 34 Jan25 ? 00:04:37 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1239 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1241 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1242 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1243 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1244 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1245 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1247 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1248 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1249 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1250 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1254 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1255 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1256 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1257 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1258 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1259 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1260 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1261 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1262 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1263 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1264 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1265 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1266 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1267 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1268 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1269 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1270 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1271 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1272 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1273 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1274 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
root 1161 1148 1275 0 34 Jan25 ? 00:00:00 /usr/share/python3/dist/python3-arvados-fuse/bin/python /usr/bin/arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --file-cache 268435456 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-2l2xwswnvbvc8gk.4167237599/keep3068660272
</pre>

The arv-mount parent process and two of the (33!) children are stuck on FUTEX_WAIT:

<pre>
# strace -f -p 1161
strace: Process 1161 attached with 34 threads
[pid 1275] read(5, <unfinished ...>
[pid 1273] read(5, <unfinished ...>
[pid 1272] read(5, <unfinished ...>
[pid 1274] read(5, <unfinished ...>
[pid 1271] read(5, <unfinished ...>
[pid 1270] read(5, <unfinished ...>
[pid 1269] read(5, <unfinished ...>
[pid 1268] read(5, <unfinished ...>
[pid 1263] read(5, <unfinished ...>
[pid 1267] read(5, <unfinished ...>
[pid 1266] read(5, <unfinished ...>
[pid 1265] read(5, <unfinished ...>
[pid 1264] read(5, <unfinished ...>
[pid 1262] read(5, <unfinished ...>
[pid 1261] read(5, <unfinished ...>
[pid 1259] read(5, <unfinished ...>
[pid 1260] read(5, <unfinished ...>
[pid 1258] read(5, <unfinished ...>
[pid 1257] read(5, <unfinished ...>
[pid 1256] read(5, <unfinished ...>
[pid 1250] read(5, <unfinished ...>
[pid 1255] read(5, <unfinished ...>
[pid 1254] read(5, <unfinished ...>
[pid 1249] read(5, <unfinished ...>
[pid 1248] read(5, <unfinished ...>
[pid 1247] read(5, <unfinished ...>
[pid 1245] read(5, <unfinished ...>
[pid 1243] read(5, <unfinished ...>
[pid 1244] read(5, <unfinished ...>
[pid 1242] read(5, <unfinished ...>
[pid 1241] futex(0x7ff9080013f0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 1239] futex(0x7ff911b05170, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 1232] select(0, NULL, NULL, NULL, {tv_sec=9, tv_usec=562958} <unfinished ...>
[pid 1161] futex(0x7ff904000d50, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY
</pre>

The parent process is stuck on a futex:

<pre>
# strace -p 1161
strace: Process 1161 attached
futex(0x7ff904000d50, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY^Cstrace: Process 1161 detached
 <detached ...>
</pre>

Only two of the 33 child threads are as well (I checked them all):

<pre>
# strace -p 1239
strace: Process 1239 attached
futex(0x7ff911b05170, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY^Cstrace: Process 1239 detached
 <detached ...>
# strace -p 1241
strace: Process 1241 attached
futex(0x7ff9080013f0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY^Cstrace: Process 1241 detached
 <detached ...>
</pre>
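
Checking 34 threads one at a time is tedious, so here is a rough sketch of how the per-thread state could be tallied from saved @strace -f -p 1161@ output. It is not part of Arvados. The function name @tally_blocked_syscalls@ and the @SAMPLE@ constant are made up for illustration, and the parser assumes every line of interest has the @[pid N] syscall(@ shape seen in the trace above. The sample lines are copied from this report and cut down to five threads.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches the start of a traced call, e.g. "[pid 1275] read(5, <unfinished ...>".
# Lines of the form "[pid N] <... call resumed>" do not match and are skipped.
LINE_RE = re.compile(r"\[pid\s+(\d+)\]\s+(\w+)\(")

# Five lines taken from the trace in this report (hypothetical constant name).
SAMPLE = """\
[pid 1275] read(5, <unfinished ...>
[pid 1241] futex(0x7ff9080013f0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 1239] futex(0x7ff911b05170, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 1232] select(0, NULL, NULL, NULL, {tv_sec=9, tv_usec=562958} <unfinished ...>
[pid 1161] futex(0x7ff904000d50, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY
"""


def tally_blocked_syscalls(strace_output: str) -> Dict[str, List[int]]:
    """Group thread IDs by the last syscall each one was seen entering."""
    last_call: Dict[int, str] = {}
    for match in LINE_RE.finditer(strace_output):
        # A later line for the same thread overwrites the earlier one.
        last_call[int(match.group(1))] = match.group(2)

    by_call: Dict[str, List[int]] = defaultdict(list)
    for tid, syscall in sorted(last_call.items()):
        by_call[syscall].append(tid)
    return dict(by_call)


if __name__ == "__main__":
    for syscall, tids in tally_blocked_syscalls(SAMPLE).items():
        print(f"{syscall}: {tids}")
```

On the five sample lines this groups 1161, 1239 and 1241 under @futex@, 1232 under @select@ and 1275 under @read@, which matches what the manual per-thread check found.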