Story #7478

Updated by Tom Morris about 7 years ago

Functional requirements: 

 * Bids for spot instances, waits for those requests to be fulfilled (minutes?), and launches the instances as compute nodes.
 ** We don't necessarily need to write our own bidding code. This might be done by coordinating with another service or library.
 * For the initial implementation, just bid the standard non-spot price rather than trying to design a fancy bidding strategy. We'll still get the cost benefit as long as the spot price is lower.
 * When the bid price is exceeded (hopefully rarely/never), we're likely to lose our entire fleet of compute instances and, perhaps, not be able to start any until demand subsides enough to cause the spot prices to go down. In this scenario, we'll need some configuration knobs to control whether to fall back to on-demand instances, wait for spot instances to become available again, etc.
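
A rough sketch of the last two requirements, for discussion only. It assumes a single hypothetical configuration knob (@Fallback@) and made-up function names; none of this exists in Node Manager or libcloud today, and the real knobs may well look different.

```python
from enum import Enum


class Fallback(Enum):
    """Hypothetical config knob: what to do when the spot price exceeds our bid."""
    WAIT_FOR_SPOT = "wait_for_spot"
    USE_ON_DEMAND = "use_on_demand"


def spot_bid(on_demand_price):
    """Initial strategy: bid exactly the standard (non-spot) hourly price."""
    return on_demand_price


def next_action(spot_price, on_demand_price, fallback):
    """Decide how to launch the next compute node.

    Returns "spot" while the market price is within our bid; otherwise
    follows the configured fallback policy ("on_demand" or "wait").
    """
    if spot_price <= spot_bid(on_demand_price):
        return "spot"
    if fallback is Fallback.USE_ON_DEMAND:
        return "on_demand"
    return "wait"
```

Because the bid equals the standard price, a spot node in this sketch never costs more per hour than the on-demand node it replaces. The fallback knob only matters once the market price rises above that bid.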

 The libcloud EC2 driver doesn't appear to support spot instances, so the first step is probably to enhance libcloud.
 An earlier prototype of this is here: https://github.com/muccg/libcloud-drivers
