Feature #14325
Updated by Tom Clegg about 6 years ago
This issue covers the smallest version that can be deployed on a dev cluster.

Requirements:
* One cloud vendor driver (Azure = #14324)
* Bring up nodes and run containers on them
* Ops mechanism for draining a node (e.g., curl command using a management token)
* HTTP status report with current set of containers (queued/running) and VMs (busy/idle) -- see [[Dispatching containers to cloud VMs#Operator view]] "Operator view"
* Structured logs for diagnostics+statistics: cloud API errors, node lifecycle, container lifecycle
* Resource consumption metrics (instances running/allocated, hourly cost)
* Shut down idle nodes automatically
* Handle cloud API quota/rate-limit errors
* Cancel containers that can't be scheduled

Non-requirements:
* Multiple cloud drivers
* Test suite that uses a real cloud provider
* Performance metrics
* Periodic status reports in logs
* Optimize worker VM deployment (for now, we still expect the operator to provide an image with a suitable version of crunch-run)
* Configurable spending limits

Refs:
* [[Dispatching containers to cloud VMs]]
* #13964 spike
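
To illustrate two of the requirements above (the HTTP status report and automatic shutdown of idle nodes), here is a minimal Python sketch. It is not the dispatcher's actual implementation. The @VM@ record, its field names, the report layout, and the @idle_timeout@ parameter are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VM:
    # Hypothetical record; field names are assumptions, not the real data model.
    instance_id: str
    container_uuid: Optional[str]  # None when the VM is idle
    idle_since: Optional[float]    # Unix timestamp; None while the VM is busy


def status_report(queued: List[str], vms: List[VM]) -> dict:
    """Build a JSON-serializable summary of containers and VMs."""
    return {
        "containers": {
            "queued": sorted(queued),
            "running": sorted(vm.container_uuid for vm in vms if vm.container_uuid),
        },
        "vms": {
            "busy": sorted(vm.instance_id for vm in vms if vm.container_uuid),
            "idle": sorted(vm.instance_id for vm in vms if not vm.container_uuid),
        },
    }


def idle_to_shutdown(vms: List[VM], now: float, idle_timeout: float) -> List[str]:
    """Return IDs of VMs that have been idle for at least idle_timeout seconds."""
    return [
        vm.instance_id
        for vm in vms
        if vm.container_uuid is None
        and vm.idle_since is not None
        and now - vm.idle_since >= idle_timeout
    ]


if __name__ == "__main__":
    vms = [
        VM("vm-1", "ctr-a", None),
        VM("vm-2", None, 100.0),
        VM("vm-3", None, 950.0),
    ]
    print(json.dumps(status_report(["ctr-b"], vms), indent=2))
    print(idle_to_shutdown(vms, now=1000.0, idle_timeout=300.0))
```

A real dispatcher would serve the report over HTTP and call the cloud driver to destroy the selected instances; both are omitted here to keep the sketch self-contained.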