Idea #21996
Closed
Figure out how to replace the EBS autoscaler
Description
aws-ebs-autoscale is archived and suggests a couple of replacements at the top of the page. We should investigate migrating to one of these. The way the autoscaler sets itself up is just not right (remounting /tmp after boot) and it tends to fail badly when something goes wrong.
Updated by Brett Smith 6 months ago
- Target version deleted (Future)
- Description updated (diff)
- Subject changed from investigate if we should continue setting up encrypted partitions ourselves or can use EBS encryption to Figure out how to replace the EBS autoscaler
Updated by Brett Smith 6 months ago
- Related to Bug #20492: EBS autoscaler possibly not deleting volumes on terminate added
Updated by Brett Smith 6 months ago
- Target version set to Development 2025-10-01
- Assigned To set to Lucas Di Pentima
- Category set to Deployment
Updated by Brett Smith 6 months ago
- Target version changed from Development 2025-10-01 to Development 2025-10-15
Updated by Brett Smith 5 months ago
- Target version changed from Development 2025-10-15 to Development 2025-10-29
Updated by Lucas Di Pentima 5 months ago
Status update
I've been reading about the suggested alternatives and none of them fully convinces me, for different reasons. Here's the summary:
AWS Mountpoint for S3
This is an open source client that mounts S3 as a filesystem.
It's not a full-fledged filesystem: it's optimized for read access and single-client sequential writes, and it has several other restrictions that IMO make it unsuitable as a generic container scratch space filesystem.
Amazon EFS
AWS EFS provides an elastic, managed NFS service. It can be created through the API, but I'm hesitant to think that creating one EFS per compute node would be a good usage pattern, and since it doesn't have the same capability that EBS volumes have (being automatically destroyed when their instance is shut down), it looks to me like the best way of using it would be to share one EFS per VPC or subnet and have each compute node use a different per-container directory or similar.
Pros vs EBS:
- Elastic: storage cost is based on how much data is actually stored. With the EBS option, we create an initial 200GB volume on every compute node whether it will be used or not.
- Shared: we could use this capability to inspect a failed compute node's scratch space for debugging purposes when needed, but this also adds some complexity because the EFS instance has to be cleaned up to avoid growing storage costs.
- More costly (a rough worked example follows this list):
- Storage: $0.30/GB-month (EFS standard) vs $0.08/GB-month (EBS gp3)
- Reads: $0.03/GB (EFS) vs $0/GB (EBS using included 3000 IOPS & 125 MB/s throughput)
- Writes: $0.06/GB (EFS) vs $0/GB (EBS using included 3000 IOPS & 125 MB/s throughput)
- Higher latency: as it's a networked filesystem, it's supposed to have higher latency than EBS (not always the case in my basic tests -- see below)
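To make the "More costly" bullet concrete, here is a rough back-of-the-envelope sketch. The 200GB provisioned size matches what we allocate today; the EFS stored/read/written volumes are hypothetical workload assumptions, not measurements:

```python
# Rough monthly cost comparison per compute node (illustrative only).
# Prices are the ones quoted above; the EFS usage figures are assumptions.

EBS_GB_MONTH = 0.08   # gp3 storage, $/GB-month
EFS_GB_MONTH = 0.30   # EFS standard storage, $/GB-month
EFS_READ_GB = 0.03    # EFS reads, $/GB
EFS_WRITE_GB = 0.06   # EFS writes, $/GB

# EBS: we provision 200 GB up front whether it is used or not.
ebs_monthly = 200 * EBS_GB_MONTH                       # $16.00

# EFS: pay only for what is stored, plus per-GB transfer.
# Hypothetical workload: 50 GB average stored, 300 GB read, 150 GB written.
efs_monthly = 50 * EFS_GB_MONTH + 300 * EFS_READ_GB + 150 * EFS_WRITE_GB
# = 15.00 + 9.00 + 9.00 = $33.00

print(f"EBS: ${ebs_monthly:.2f}/month  EFS: ${efs_monthly:.2f}/month")
```

Under these assumptions EFS ends up roughly twice as expensive per node per month, but the balance shifts quickly depending on how much scratch space a workload actually touches.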
Here's an official feature comparison between the three options from AWS.
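As a reference for the "created through the API" point above, here's a minimal sketch of what provisioning one shared EFS filesystem plus a mount target could look like with boto3. The region, subnet and security group IDs are placeholders, and in practice this would more likely live in the Ansible/Terraform tooling mentioned in the summary below:

```python
# Minimal sketch (boto3): create one shared EFS filesystem per VPC/subnet and
# a mount target so compute nodes in that subnet can reach it over NFS.
# Region, subnet and security group IDs are placeholders.
import time

import boto3

efs = boto3.client("efs", region_name="us-east-1")

fs = efs.create_file_system(
    CreationToken="compute-scratch",        # idempotency token
    PerformanceMode="generalPurpose",
    ThroughputMode="elastic",
    Encrypted=True,
    Tags=[{"Key": "Name", "Value": "compute-scratch"}],
)
fs_id = fs["FileSystemId"]

# The filesystem must be "available" before mount targets can be added.
while (
    efs.describe_file_systems(FileSystemId=fs_id)["FileSystems"][0]["LifeCycleState"]
    != "available"
):
    time.sleep(5)

# One mount target per subnet where compute nodes run; the security group
# must allow NFS (TCP 2049) from the compute nodes.
efs.create_mount_target(
    FileSystemId=fs_id,
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroups=["sg-0123456789abcdef0"],
)
```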
Basic performance tests
I launched a c5.large EC2 instance with an extra gp3 EBS volume and a mounted EFS instance. Using the fio tool to run tests following the recommendations from this Oracle doc page, here are the results (a sketch of the invocation is included after the table):
|  |  | EBS throughput (MiB/s) | EFS throughput (MiB/s) | EBS latency (msec) | EFS latency (msec) |
|---|---|---|---|---|---|
| Sequential | Reads | 126 | 310 | 126 | 51 |
| | Writes | 126 | 104 | 126 | 153 |
| Random | Reads | 8.4 | 18.2 | 84.8 | 32.3 |
| | Writes | 3.6 | 8 | 85.5 | 45.6 |
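For completeness, here's a sketch of how a run like this could be scripted and summarized. Since the exact job parameters aren't listed above, the block size, file size, runtime and mount points below are illustrative, and the output parsing assumes fio 3.x JSON output:

```python
# Sketch: run one fio job against a mount point and print throughput/latency.
# Job parameters and mount points are illustrative, not the exact ones used above.
import json
import subprocess

def run_fio(directory: str, rw: str) -> tuple[float, float]:
    """Return (throughput MiB/s, mean latency ms) for a single fio job."""
    out = subprocess.run(
        [
            "fio", "--name=bench", f"--directory={directory}", f"--rw={rw}",
            "--bs=1M", "--size=2G", "--direct=1", "--ioengine=libaio",
            "--runtime=60", "--time_based", "--output-format=json",
        ],
        check=True, capture_output=True, text=True,
    ).stdout
    job = json.loads(out)["jobs"][0]
    side = "read" if "read" in rw else "write"
    bw_mib = job[side]["bw"] / 1024             # fio reports bw in KiB/s
    lat_ms = job[side]["lat_ns"]["mean"] / 1e6  # mean latency, ns -> ms
    return bw_mib, lat_ms

for mount in ("/mnt/ebs", "/mnt/efs"):          # placeholder mount points
    for rw in ("read", "write", "randread", "randwrite"):
        bw, lat = run_fio(mount, rw)
        print(f"{mount} {rw}: {bw:.1f} MiB/s, {lat:.1f} ms")
```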
Summary
IMO, AWS Mountpoint for S3 is not suitable for our use case, and EFS is not designed for per-instance elastic scratch space. We could use it anyway, but it seems to me that costs would be significantly higher and complexity would mostly be shifted from properly handling instance launch-time setup issues (in the EBS-autoscale case) to storage and cost control management (in the EFS case). If we think this is a good compromise, I think we can use EFS to replace EBS-autoscale.
There are additional resources required to use EFS (security groups and EFS mount targets, for example) that we could manage with tools like Ansible/Terraform. We might also need to add capabilities to crunch-run to properly manage the shared storage (for example, to avoid collisions with other running containers) and to clean up when the container finishes; a sketch of that bookkeeping follows below.
In the preemptible instance use case, there's a high chance that unused data will be left around on the EFS instance, and this needs to be handled somehow. EFS lifecycle management can move old data to cheaper storage classes to reduce storage costs, but access to those cheaper classes is still expensive, so ideally only data in active use should be kept.
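To illustrate the kind of bookkeeping this would add: crunch-run itself is written in Go, so this is only a sketch of the logic, with the mount path, naming scheme and retention window as assumptions:

```python
# Sketch of per-container scratch management on a shared EFS mount.
# Paths, naming scheme and retention period are assumptions for illustration.
import shutil
import time
from pathlib import Path

EFS_ROOT = Path("/mnt/efs/scratch")       # shared mount, assumed layout
STALE_AFTER = 7 * 24 * 3600               # drop leftovers older than 7 days

def container_scratch(container_uuid: str) -> Path:
    """Create an isolated scratch directory for one container run."""
    path = EFS_ROOT / container_uuid       # UUIDs avoid collisions between runs
    path.mkdir(parents=True, exist_ok=False)
    return path

def cleanup_container(container_uuid: str) -> None:
    """Remove a container's scratch space when the run finishes."""
    shutil.rmtree(EFS_ROOT / container_uuid, ignore_errors=True)

def cleanup_stale() -> None:
    """Remove scratch left behind by preempted or crashed instances."""
    now = time.time()
    for entry in EFS_ROOT.iterdir():
        if entry.is_dir() and now - entry.stat().st_mtime > STALE_AFTER:
            shutil.rmtree(entry, ignore_errors=True)
```

Using the container UUID as the directory name avoids collisions between concurrent containers, and the age-based sweep covers scratch left behind by preempted instances.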
Personally, I think it would be worth analyzing whether it's better to tackle ebs-autoscale's shortcomings directly: we have been using it at fairly large scale, and even though it's difficult to debug when it fails, such failures are an unusual occurrence.
Updated by Brett Smith 5 months ago
- Status changed from In Progress to Resolved
Lucas Di Pentima wrote in #note-8:
IMO, AWS Mountpoint for S3 is not suitable for our use case, and EFS is not designed for per-instance elastic scratch space. We could use it anyway, but it seems to me that costs would be significantly higher and complexity would mostly be shifted from properly handling instance launch-time setup issues (in the EBS-autoscale case) to storage and cost control management (in the EFS case). If we think this is a good compromise, I think we can use EFS to replace EBS-autoscale.
I'm convinced. We talked about two basic tacks from here: instead of having autoscaling storage, we can have the dispatcher request the desired amount of disk when it creates the node (although this could make the node less reusable); or we can just take full ownership of our fork to investigate and fix this bug.
The next step would be to add disk space reporting to the cluster activity report to help us understand usage patterns better and use that to decide what path to take.
At the same time, we can also improve the way we currently deploy the autoscaler. It's supposed to have a separate install step and start step. If we did the install step during image build, and mapped it to a dedicated partition that we used for Docker etc., that should be relatively easy and improve reliability.