Bug #7444
Updated by Brett Smith over 9 years ago
We use @docker run --rm@ to ensure that Docker containers are removed after tasks are finished, to prevent compute nodes from filling up with unused volumes. However, "@docker run --rm@ is handled by the Docker client":https://github.com/docker/docker/issues/16575. It simply makes the necessary API calls to remove the container after it exits. Crunch's cancel code kills the Docker client, so if a user cancels a job, the container hangs around, along with its volumes.

We just had a situation where compute nodes on a cluster filled their @/tmp@ partitions because a user was canceling many jobs, leaving behind finished Docker containers and their large tmp volumes.

Make sure that when Crunch cancels a job, the Docker container is removed.

Compute nodes on tb05z were taken out of rotation because the slurmd spool directory was full. It was mostly full of VFS directories for Docker containers that were no longer running.

<pre>
compute0.tb05z# docker ps -a
CONTAINER ID   IMAGE                                                                     COMMAND                CREATED        STATUS                      PORTS   NAMES
4e236764f52c   d33416e64af4370471ed15d19211e84991a8e158626199f4e4747e4310144b83:latest   "stdbuf --output=0 -   18 hours ago   Exited (1) 18 hours ago             elated_wozniak
b904e1f16f1f   1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest   "stdbuf --output=0 -   18 hours ago                                       nostalgic_franklin
59fec9bcda68   998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest   "stdbuf --output=0 -   19 hours ago   Up 19 hours                         goofy_wilson
6fe7c70100ef   998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest   "stdbuf --output=0 -   20 hours ago   Exited (1) 19 hours ago             mad_albattani
d1f80841ca42   998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest   "stdbuf --output=0 -   20 hours ago   Exited (1) 20 hours ago             hopeful_ritchie
a366129c6a1b   998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest   "stdbuf --output=0 -   20 hours ago   Exited (1) 20 hours ago             angry_cori
dbb29b69f7a3   998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest   "stdbuf --output=0 -   20 hours ago   Exited (1) 20 hours ago             elegant_fermat
b83c876b8ecf   998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest   "stdbuf --output=0 -   22 hours ago   Exited (1) 20 hours ago             modest_bell
fff2c5d781ec   1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest   "stdbuf --output=0 -   22 hours ago   Exited (127) 22 hours ago           stoic_feynman
5a4fc1333fef   d33416e64af4370471ed15d19211e84991a8e158626199f4e4747e4310144b83:latest   "stdbuf --output=0 -   23 hours ago   Exited (1) 22 hours ago             hopeful_hodgkin
a69f0ff99682   d33416e64af4370471ed15d19211e84991a8e158626199f4e4747e4310144b83:latest   "stdbuf --output=0 -   42 hours ago   Exited (1) 41 hours ago             admiring_hoover
48bde55948f1   998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest   "stdbuf --output=0 -   46 hours ago   Exited (1) 46 hours ago             stupefied_kowalevski
b62f346c8e85   1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest   "stdbuf --output=0 -   2 days ago     Exited (1) 2 days ago               loving_mcclintock
12f20e821bbd   b85dffb1be2ca7bc757be6ff8ae4873a45214918282ef42cc2cbc2cead63356b:latest   "stdbuf --output=0 -   4 days ago     Exited (1) 3 days ago               loving_bell
7b6ed97e23ae   1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest   "stdbuf --output=0 -   4 days ago     Exited (1) 4 days ago               determined_pasteur
e258841ffcf1   1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest   "stdbuf --output=0 -   6 days ago     Exited (1) 6 days ago               furious_leakey
3109f9488c66   b85dffb1be2ca7bc757be6ff8ae4873a45214918282ef42cc2cbc2cead63356b:latest   "stdbuf --output=0 -   8 days ago     Exited (1) 8 days ago               fervent_thompson
164c4d49e8ce   1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest   "stdbuf --output=0 -   10 days ago    Exited (1) 10 days ago              sad_wozniak
242c764bfe5d   882cc785701a5d3a20d5fa5e244d22beb09e4861b4f1c654867f0ca0c154b029:latest   "stdbuf --output=0 -   12 days ago    Exited (1) 12 days ago              sharp_archimedes
d83d9e200705   b85dffb1be2ca7bc757be6ff8ae4873a45214918282ef42cc2cbc2cead63356b:latest   "stdbuf --output=0 -   13 days ago    Exited (1) 13 days ago              modest_mccarthy
</pre>

Why weren't these containers removed? crunch-job on the corresponding cluster is new enough to use @--rm@, and Docker is new enough to respect it (1.6.0). The fact that these containers exited with status 1 doesn't seem to explain it, either:

<pre>
brinstar % docker version
Client version: 1.6.0
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 4749651
OS/Arch (client): linux/amd64
Server version: 1.6.0
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 4749651
OS/Arch (server): linux/amd64
brinstar % docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
brinstar % docker run --rm=true debian:wheezy /bin/false
brinstar % docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
brinstar %
</pre>
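One way to meet the request that the container be removed when Crunch cancels a job is to have the process that launches the Docker client clean up after it. @--rm@ only works while the client survives, so the cleanup belongs in the supervisor. Below is a minimal sketch in Python. It assumes the supervisor can give each task container a predictable name. @container_name@, @build_run_cmd@, @build_cleanup_cmd@, @run_task@ and the @crunch-<job>-task<N>@ naming scheme are hypothetical illustrations, not crunch-job's actual code. The sketch keeps @--rm@ for the normal path. Once the client exits for any reason, including being killed, it runs @docker rm -f -v@ on the named container. That stops the container if it is still running and removes its volumes.

```python
import subprocess
from typing import Any, Callable, List, Sequence

# NOTE: every name below is a hypothetical illustration, not crunch-job's API.


def container_name(job_id: str, task_index: int) -> str:
    """Predictable container name so the supervisor can find it later."""
    return f"crunch-{job_id}-task{task_index}"


def build_run_cmd(name: str, image: str, argv: Sequence[str]) -> List[str]:
    # --rm still covers the normal case, but it is carried out by the client:
    # if the client process dies, nothing removes the container.
    return ["docker", "run", "--rm", "--name", name, image, *argv]


def build_cleanup_cmd(name: str) -> List[str]:
    # -f stops the container first if it is still running;
    # -v also removes the volumes associated with it.
    return ["docker", "rm", "-f", "-v", name]


def run_task(
    name: str,
    image: str,
    argv: Sequence[str],
    runner: Callable[[List[str]], Any] = subprocess.Popen,
    cleaner: Callable[[List[str]], Any] = subprocess.call,
) -> int:
    """Run one task container and always try to remove it afterwards.

    The cleanup runs in this (supervisor) process, so it still happens when
    the docker client child is killed, e.g. by job cancellation.
    """
    client = runner(build_run_cmd(name, image, argv))
    try:
        # Popen.wait() returns a negative number if the client was killed
        # by a signal (e.g. -9 for SIGKILL).
        return client.wait()
    finally:
        # Fails harmlessly (nonzero exit) if --rm already removed the container.
        cleaner(build_cleanup_cmd(name))
```

After a normal exit, @--rm@ has already removed the container, so the cleanup command simply fails because there is nothing left to remove. This sketch ignores that failure. If the supervisor process itself can be killed outright, this @finally@ block never runs. Leftovers like the exited containers on compute0.tb05z would then still need a separate sweep.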