Bug #22024

closed

Autoscale installer configures Docker to use volume without necessarily waiting for it

Added by Brett Smith over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Deployment
Target version: -
Story points: -

Description

In source:tools/compute-images/scripts/usr-local-bin-ensure-encrypted-partitions-aws-ebs-autoscale.sh:

  1. We stop Docker if it's currently running.
  2. We set up the EBS autoscaler.
  3. We configure Docker to use a directory under the EBS autoscale volume.
  4. We start Docker if it was running previously.

We have seen at least one process where Docker started before the autoscale volume was mounted. In that case Docker ends up using the compute node's root partition for storage, which is likely to fill up and lead to ENOSPC (no space left on device) errors.

Ensure that Docker waits for the intended volume to be mounted before starting. It would be ideal to do this with systemd configuration and let it take care of all the details of ordering and waiting.
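As a rough illustration of the systemd approach suggested above, the sketch below writes a drop-in for docker.service that uses systemd's RequiresMountsFor= directive. This is not the fix that was eventually adopted in this ticket. The function name, the drop-in file name wait-for-volume.conf, and the mount path are all hypothetical, and it assumes the autoscale volume is mounted through something systemd knows about (an fstab entry or a mount unit); if the volume is only mounted by a script calling mount directly, systemd has nothing to order against.

```shell
#!/bin/sh
# Sketch only: write a systemd drop-in so that docker.service is ordered
# after, and requires, the mount that backs its data directory.
# The function name, file name, and paths are hypothetical examples.
write_docker_mount_dropin() {
    mount_path="$1"   # directory Docker should wait for (hypothetical)
    dropin_dir="$2"   # normally /etc/systemd/system/docker.service.d
    mkdir -p "$dropin_dir" || return 1
    cat >"$dropin_dir/wait-for-volume.conf" <<EOF
[Unit]
# Adds Requires= and After= dependencies on the mount units for this path.
RequiresMountsFor=$mount_path
EOF
}
```

On a real node this would be called as root with the actual mount point and /etc/systemd/system/docker.service.d, followed by systemctl daemon-reload so that systemd picks up the new drop-in.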


Actions #2

Updated by Brett Smith over 1 year ago

  • Status changed from New to In Progress

Something funky is going on. The amazon-ebs install script does all the setup, up through creating and mounting the filesystem, synchronously and with set -e, so the ordering should already be in place. That would explain why we've been using this stack for so long without seeing this issue.

Might be a configuration issue. Might be a subtle bug where install.sh is masking an error despite set -e.

Actions #3

Updated by Brett Smith over 1 year ago

One possibility is that the script that calls usr-local-bin-ensure-encrypted-partitions-aws-ebs-autoscale.sh doesn't run with set -e, so the script is erroring out correctly but then node setup continues as if nothing happened.

Actions #4

Updated by Brett Smith over 1 year ago

Brett Smith wrote in #note-3:

One possibility is that the script that calls usr-local-bin-ensure-encrypted-partitions-aws-ebs-autoscale.sh doesn't run with set -e, so the script is erroring out correctly but then node setup continues as if nothing happened.

This seems to be the case. We run the script from InstanceInitCommand. crunch runs this script with plain /bin/sh with no options. See the handling of UserData in ec2InstanceSet.Create in source:lib/cloud/ec2/ec2.go.

Up for debate what the "right" fix for this is, but for now I'm proposing a fix in the user's configured InstanceInitCommand.
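The behavior described in this note can be reproduced in isolation. The sketch below is not the actual InstanceInitCommand or the proposed fix, neither of which is shown in this ticket. It uses a stand-in "setup script" that fails under set -e, then runs it from a caller without set -e and from a caller with set -e. All variable names and messages are made up for the demonstration.

```shell
#!/bin/sh
# Stand-in for the setup script: with set -e it stops at the first failure
# and exits non-zero, so "child-finished" is never printed.
child='set -e; false; echo child-finished'

# Caller WITHOUT set -e (like a plain /bin/sh invocation with no options):
# the child's non-zero exit status is ignored and the next command runs.
lenient=$(sh -c "sh -c '$child'; echo 'caller: continuing'")

# Caller WITH set -e: the caller itself stops when the child fails.
strict=$(sh -c "set -e; sh -c '$child'; echo 'caller: continuing'") || true

echo "lenient caller printed: [$lenient]"
echo "strict caller printed:  [$strict]"
```

The lenient caller still prints "caller: continuing" even though the child failed, which matches node setup carrying on after the setup script errored out. The strict caller prints nothing, because set -e stops it at the failed child.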

Actions #5

Updated by Lucas Di Pentima over 1 year ago

PR LGTM.

Note that there's currently an issue with passenger's package repository that makes the deployment fail.

Actions #6

Updated by Brett Smith over 1 year ago

  • Status changed from In Progress to Resolved

Lucas Di Pentima wrote in #note-5:

Note that there's currently an issue with passenger's package repository that makes the deployment fail.

That is not the problem. See #22033.
