Bug #17776

closed

[a-d-c] [ec2] when InsufficientInstanceCapacity is returned, we should throttle node creation.

Added by Ward Vandewege almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 06/10/2021
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Subtasks 1 (0 open, 1 closed)

Task #17784: review 17776-more-throttling (Resolved, Ward Vandewege, 06/10/2021)

Related issues 2 (0 open, 2 closed)

Related to Arvados - Bug #17777: [a-d-c] [ec2] MaxSpotInstanceCountExceeded should throttle creation attempts for preemptible instances (Resolved, Ward Vandewege)
Related to Arvados - Bug #17783: [a-d-c] [ec2] VcpuLimitExceeded should throttle node creation attempts (Resolved, Ward Vandewege)

Actions #1

Updated by Ward Vandewege almost 4 years ago

  • Status changed from New to In Progress
  • Assigned To set to Ward Vandewege
  • Target version changed from To Be Groomed to 2021-06-23 sprint

A very basic approach is at commit 66d3cb88d07eed627903b6db0b1cffb7491d4e34 on branch 17776-more-throttling.

Actions #2

Updated by Ward Vandewege almost 4 years ago

  • Related to Bug #17777: [a-d-c] [ec2] MaxSpotInstanceCountExceeded should throttle creation attempts for preemptible instances added
Actions #3

Updated by Ward Vandewege almost 4 years ago

  • Related to Bug #17783: [a-d-c] [ec2] VcpuLimitExceeded should throttle node creation attempts added
Actions #4

Updated by Tom Clegg almost 4 years ago

For detecting the error:
  • I don't think we want to export IsErrorCapacity.
  • The extra isCodeCapacity func seems needlessly verbose all for the sake of saving a few bytes of an unchanging map. Could just do var isCodeCapacity = map[string]bool{"InsufficientInstanceCapacity": true, ...}
For reporting it back to dispatcher:
  • These errors seem more like quota errors than API request limit errors. We have a different interface for quota errors (IsQuotaError() bool), the Azure driver has an example. That way the dispatcher can shut down idle nodes in an effort to free up capacity.
Actions #5

Updated by Ward Vandewege almost 4 years ago

Tom Clegg wrote:

For detecting the error:
  • I don't think we want to export IsErrorCapacity.
  • The extra isCodeCapacity func seems needlessly verbose all for the sake of saving a few bytes of an unchanging map. Could just do var isCodeCapacity = map[string]bool{"InsufficientInstanceCapacity": true, ...}

Yes, all fixed, thanks.

For reporting it back to dispatcher:
  • These errors seem more like quota errors than API request limit errors. We have a different interface for quota errors (IsQuotaError() bool), the Azure driver has an example. That way the dispatcher can shut down idle nodes in an effort to free up capacity.

Thanks! I've updated the branch accordingly. I've also added a basic test for wrapError in the ec2 driver. See 6bb5a84a53e5810e96e56e41cc751d4ebc054580 on branch 17776-more-throttling.

Tests in developer-run-tests: #2527

Actions #6

Updated by Tom Clegg almost 4 years ago

LGTM, thanks!

Actions #7

Updated by Ward Vandewege almost 4 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
Actions #8

Updated by Ward Vandewege almost 4 years ago

  • Release set to 39