Bug #12991
closed[crunch2] Propagate memory limit from runtime constraints to docker container
100%
Description
Current behavior: When a container tries to use more memory than it asked for, it competes with system processes, and the kernel OOM-killer sometimes kills system processes instead of the container.
Desired behavior: when a container tries to allocate more memory than specified in runtime_constraints, allocation fails and/or the container is killed. System processes (including crunch-run and slurmd) are not killed.
Explanation: We use the memory and cpu figures in container runtime_constraints to choose an appropriate node to run a container on (even taking kernel/system overhead into account), but we don't tell docker to limit the the container's memory use.
Proposed solution: We have an opportunity to do this in source:services/crunch-run/crunchrun.go L918:
Resources: dockercontainer.Resources{
CgroupParent: runner.setCgroupParent,
},
(dockercontainer.Resources also has Memory and NanoCPUs fields)
The container's memory size (including swap) should be limited to the number of bytes given in runtime_constraints.
Updated by Peter Amstutz almost 7 years ago
For reference, crunch-job (crunch1) logic takes MemTotal from /proc/meminfo and deflates it by 5% to get MEMLIMIT used for the container. It also sets SWAPLIMIT=MEMLIMIT+SWAP
I feel like it was important to set the swap limit along with the memory limit, but I don't quite remember the details now.
The right thing to do is probably to set the memory limit to the actual amount requested by the user. This would discourage behavior such as asking for 4 GB, getting a 7GB node, and actually using 6 GB of memory.
Updated by Tom Clegg almost 7 years ago
Peter Amstutz wrote:
I feel like it was important to set the swap limit along with the memory limit, but I don't quite remember the details now.
ISTR "limit memory to X" is implemented as if swap space is free/unlimited, so (if we don't also limit swap) a 1G-limited container can allocate 2G of memory as long as 1G can be swapped to disk.
We want allocations above 1G to fail.
The right thing to do is probably to set the memory limit to the actual amount requested by the user. This would discourage behavior such as asking for 4 GB, getting a 7GB node, and actually using 6 GB of memory.
Yes, absolutely. Avoiding OOM-kills is important, but this is also about making the container's runtime environment as predictable as possible, so "no OOM-kill today" is a better predictor of "no OOM-kill tomorrow".
Updated by Tom Morris almost 7 years ago
- Target version set to Arvados Future Sprints
- Story points set to 1.0
Updated by Tom Morris almost 7 years ago
- Target version changed from Arvados Future Sprints to 2018-02-14 Sprint
Updated by Tom Clegg almost 7 years ago
12991-docker-memory-limit @ 2bae361432ff8f975803495b4fd0acfae02cd3ef
Updated by Lucas Di Pentima almost 7 years ago
This LGTM, thanks.
Test run successful: https://ci.curoverse.com/job/developer-run-tests/566/
Updated by Anonymous almost 7 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|f309b872cc064141d9fbf43a15067998603e2204.