Story #7478
Updated by Tom Morris over 6 years ago
Functional requirements:
* Requests spot instances, waits for those requests to be fulfilled (minutes?) and launches the instances as compute nodes.
* For the initial implementation, just bid the standard price rather than trying to design a fancy bidding strategy. We'll still get the cost benefit as long as the spot price is lower.
* When the spot price exceeds our bid (hopefully rarely or never), we're likely to lose our entire fleet of compute instances and may be unable to start new ones until demand subsides enough for spot prices to fall. For this scenario we'll need configuration knobs that control whether to fall back to on-demand instances, wait for spot instances to become available again, etc.
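To make the last requirement concrete, here is a minimal Python sketch of what such configuration knobs could look like. Every name in it (@SpotPolicy@, @Fallback@, @max_wait_seconds@, @next_action@) is hypothetical and not part of any existing Arvados or libcloud API. It assumes that a request can be fulfilled while the spot price is at or below the bid.

```python
from dataclasses import dataclass
from enum import Enum


class Fallback(Enum):
    WAIT_FOR_SPOT = "wait_for_spot"   # keep waiting until the spot price drops
    USE_ON_DEMAND = "use_on_demand"   # launch on-demand nodes after a grace period


@dataclass
class SpotPolicy:
    # All fields are hypothetical configuration knobs.
    bid_price: float                  # initial implementation: the standard price
    fallback: Fallback = Fallback.WAIT_FOR_SPOT
    max_wait_seconds: int = 600       # how long to wait before falling back


def next_action(policy, spot_price, waited_seconds):
    """Return 'request_spot', 'wait', or 'launch_on_demand'."""
    if spot_price <= policy.bid_price:
        # The bid covers the current spot price, so a spot request can succeed.
        return "request_spot"
    if (policy.fallback is Fallback.USE_ON_DEMAND
            and waited_seconds >= policy.max_wait_seconds):
        # Spot is too expensive and we have waited long enough.
        return "launch_on_demand"
    return "wait"
```

With the default policy the function never returns @launch_on_demand@, which corresponds to the "wait for spot instances to become available again" option. Setting @fallback@ to @USE_ON_DEMAND@ gives the other behaviour once @max_wait_seconds@ has elapsed.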
Implementation details:
* Enhance libcloud to support AWS spot instances.
** There’s an earlier prototype that can be useful: https://github.com/muccg/libcloud-drivers
** The spot API is similar to the on-demand API. See https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RequestSpotInstances.html and https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html
* API server will have a config option which specifies whether spot instances are enabled or not. If they are enabled, child containers will get created with the spot instances scheduling parameter set.
* Spot instances will be their own instance type. Node manager currently treats the libcloud-specified instance type as its only notion of instance type; it needs to manage its own instance types separately from that. Node manager will use the new libcloud support to request spot instances when needed. No arvados-cwl-runner changes required.
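The last two points can be sketched as follows. This is only an illustration under stated assumptions, not the planned implementation: the @InstanceType@ class, the @.spot@ naming suffix, the @preemptible@ scheduling-parameter key and all helper functions are hypothetical, and the prices are made-up placeholders. The only names taken from AWS are the @RequestSpotInstances@ parameters @SpotPrice@, @InstanceCount@ and @LaunchSpecification@ (with @ImageId@ and @InstanceType@).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str                  # node manager's own type name (hypothetical)
    cloud_size: str            # size name handed to the cloud driver
    price: float               # standard hourly price (placeholder value)
    preemptible: bool = False  # True for the spot variant


def with_spot_variants(types):
    """Add a spot variant of each type, sharing the same cloud size."""
    out = []
    for t in types:
        out.append(t)
        out.append(InstanceType(t.name + ".spot", t.cloud_size, t.price, True))
    return out


def child_scheduling_parameters(spot_enabled):
    """What the API server could set on child containers (hypothetical key)."""
    return {"preemptible": True} if spot_enabled else {}


def pick_type(types, scheduling_parameters):
    """Pick the cheapest type matching the container's spot preference.

    Simplification: resource requirements (RAM, cores) are ignored here.
    """
    want_spot = bool(scheduling_parameters.get("preemptible"))
    candidates = [t for t in types if t.preemptible == want_spot]
    return min(candidates, key=lambda t: t.price)


def spot_request_params(t, image_id, count=1):
    """Build RequestSpotInstances parameters, bidding the standard price."""
    if not t.preemptible:
        raise ValueError(f"{t.name} is not a spot instance type")
    return {
        "SpotPrice": f"{t.price:.4f}",
        "InstanceCount": count,
        "LaunchSpecification": {
            "ImageId": image_id,
            "InstanceType": t.cloud_size,
        },
    }
```

The point of the sketch is that the spot and on-demand variants share one @cloud_size@ but are distinct types as far as node manager is concerned, so they can be counted, priced and booted separately.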