Project

General

Profile

Actions

Bug #17529

closed

[a-d-c] AWS/EC2 driver should return a RateLimitError to dispatcher if the upstream error is RequestLimitExceeded

Added by Tom Clegg over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Start date:
04/14/2021
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

Current code results in error logs like this:

{"InstanceType":"c5large.spot","PID":2231,"error":"RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 778d5b22-90e4-4a48-8f09-6946e8edcb2c","level":"error","msg":"create failed","time":"2021-03-30T13:38:38.925565910Z"}

...but the returned error does not implement the lib/cloud.RateLimitError interface, so the dispatcher doesn't back off.


Subtasks 1 (0 open1 closed)

Task #17545: Review 17529-ec2-rate-limitResolvedTom Clegg04/14/2021

Actions

Related issues 1 (1 open0 closed)

Related to Arvados - Bug #17561: [arvados-dispatch-cloud] inst.SetTags() and inst.Destroy() should respect rate-limiting responses from cloud providerNew

Actions
Actions #1

Updated by Tom Clegg over 3 years ago

  • Assigned To set to Tom Clegg
Actions #2

Updated by Tom Clegg over 3 years ago

  • Status changed from New to In Progress
Actions #3

Updated by Ward Vandewege over 3 years ago

Tom Clegg wrote:

17529-ec2-rate-limit @ ff11506c916cb2cd8abd1905e16c4d4f5ddd4240 -- developer-run-tests: #2412

Nit: in `func newEC2InstanceSet` the last "else" can be dropped and the block outdented

This looks good for the "RunInstances" call; should we also add this for the other function interfaces that ec2Interface exposes?

Otherwise, LGTM, thanks.

Actions #4

Updated by Tom Clegg over 3 years ago

Ward Vandewege wrote:

Nit: in `func newEC2InstanceSet` the last "else" can be dropped and the block outdented

Done with refactor.

This looks good for the "RunInstances" call; should we also add this for the other function interfaces that ec2Interface exposes?

Create and Instances() are currently the only methods whose caller (in a-d-c) will actually notice the don't-retry-until advice. Updated branch so Instances() is covered too.

We could also handle SetTags and Destroy (perhaps even reducing MaxCloudOpsPerSecond on the fly as well as logging something) ... but that would be a somewhat bigger fix, and (I suspect) not quite as important practically speaking.

That said, I noticed SetTags() wasn't even obeying MaxCloudOpsPerSecond in a-d-c, and that was trivial to fix.

17529-ec2-rate-limit @ 47cbcd47789be77f3a1c44ba605853f50c448e8a -- developer-run-tests: #2419

Actions #5

Updated by Ward Vandewege over 3 years ago

Tom Clegg wrote:

Ward Vandewege wrote:

Nit: in `func newEC2InstanceSet` the last "else" can be dropped and the block outdented

Done with refactor.

This looks good for the "RunInstances" call; should we also add this for the other function interfaces that ec2Interface exposes?

Create and Instances() are currently the only methods whose caller (in a-d-c) will actually notice the don't-retry-until advice. Updated branch so Instances() is covered too.

We could also handle SetTags and Destroy (perhaps even reducing MaxCloudOpsPerSecond on the fly as well as logging something) ... but that would be a somewhat bigger fix, and (I suspect) not quite as important practically speaking.

That said, I noticed SetTags() wasn't even obeying MaxCloudOpsPerSecond in a-d-c, and that was trivial to fix.

17529-ec2-rate-limit @ 47cbcd47789be77f3a1c44ba605853f50c448e8a -- developer-run-tests: #2419

Cool, thanks for that additional fix in SetTags(). This LGTM; should we create a follow-on ticket to make a-d-c notice the don't-retry-until behavior for SetTags() and Destroy(), too (and the corresponding changes in the cloud drivers) ?

Actions #6

Updated by Tom Clegg over 3 years ago

  • Related to Bug #17561: [arvados-dispatch-cloud] inst.SetTags() and inst.Destroy() should respect rate-limiting responses from cloud provider added
Actions #7

Updated by Tom Clegg over 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
Actions #8

Updated by Peter Amstutz over 3 years ago

  • Release set to 38
Actions

Also available in: Atom PDF