Bug #23196

closed

Should track instance capacity state separately for preemptible and non-preemptible instance types

Added by Tom Clegg 6 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Dispatchers
Target version:
Story points:
-
Release relationship:
Auto

Description

Currently, if arvados-dispatch-cloud gets a capacity error trying to create a preemptible a1.large instance, it will avoid trying to create preemptible or non-preemptible a1.large instances for one minute.

"At capacity" state should be tracked separately for preemptible and non-preemptible instance types.


Subtasks 1 (0 open, 1 closed)

Task #23197: Review 23196-preemptible-capacity (Resolved, Tom Clegg, 10/14/2025)

Related issues 2 (0 open, 2 closed)

Related to Arvados - Bug #22017: a-d-c needs to handle different quotas for difference instance types (Resolved, Tom Clegg)
Related to Arvados - Bug #23178: a-d-c is treating rate limit errors as capacity issues, should not (Resolved, Tom Clegg)
Actions #1

Updated by Tom Clegg 6 months ago

23196-preemptible-capacity @ 895af9eaf2a9c9c0bb5ccfd5dc2e70846e5ad319 -- developer-run-tests: #4902

(workbench tests failed)

Actions #2

Updated by Tom Clegg 6 months ago

  • Related to Bug #22017: a-d-c needs to handle different quotas for difference instance types added
Actions #3

Updated by Tom Clegg 6 months ago

  • Related to Bug #23178: a-d-c is treating rate limit errors as capacity issues, should not added
Actions #4

Updated by Brett Smith 6 months ago

Tom Clegg wrote in #note-1:

> 23196-preemptible-capacity @ 895af9eaf2a9c9c0bb5ccfd5dc2e70846e5ad319 -- developer-run-tests: #4902

This is fine. I do wonder if it would be nicer for maintainability to add a method to arvados.InstanceType that returns a unique key on all the axes we care about. That way we could extend the logic in the future if needed without tracking down all the places the logic is duplicated here. But I don't feel too strongly about it at this point.

> (workbench tests failed)

Known issue #23180.
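For illustration, the key-method idea suggested in this comment might look something like the sketch below. The `InstanceType` struct here is a trimmed-down stand-in, not the real `arvados.InstanceType`, and the `CapacityKey` method and `capacityKey` type are hypothetical names that do not appear in the branch under review.

```go
package main

import "fmt"

// InstanceType is a minimal stand-in for illustration only; it is
// not the real arvados.InstanceType definition.
type InstanceType struct {
	Name         string
	ProviderType string
	Preemptible  bool
}

// capacityKey (hypothetical) collects every axis on which capacity
// is tracked. Adding an axis later means changing only this type
// and the CapacityKey method below.
type capacityKey struct {
	ProviderType string
	Preemptible  bool
}

// CapacityKey (hypothetical) returns a comparable value that can be
// used directly as a map key by any component that caches capacity state.
func (it InstanceType) CapacityKey() capacityKey {
	return capacityKey{ProviderType: it.ProviderType, Preemptible: it.Preemptible}
}

func main() {
	spot := InstanceType{Name: "a1.large.spot", ProviderType: "a1.large", Preemptible: true}
	onDemand := InstanceType{Name: "a1.large", ProviderType: "a1.large", Preemptible: false}

	// Any cache keyed this way keeps the two variants separate.
	atCapacity := map[capacityKey]bool{}
	atCapacity[spot.CapacityKey()] = true

	fmt.Println(atCapacity[spot.CapacityKey()])     // true
	fmt.Println(atCapacity[onDemand.CapacityKey()]) // false
}
```

With a single method like this, each place that caches capacity state would call it instead of building its own key, which is the maintainability benefit the comment describes.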

Actions #5

Updated by Tom Clegg 6 months ago

  • Status changed from In Progress to Resolved
Actions #6

Updated by Tom Clegg 6 months ago

  • Status changed from Resolved to In Progress
Actions #7

Updated by Tom Clegg 6 months ago

The previous commit fixed the worker pool, but the scheduler also has its own cache, which was still conflating preemptible and non-preemptible types.

23196-preemptible-capacity @ 25d5c32f947fc5115be9d348413af5ccc90f48fe -- developer-run-tests: #4904

Actions #8

Updated by Tom Clegg 6 months ago

  • Subtask #23197 added
Actions #9

Updated by Brett Smith 6 months ago

Tom Clegg wrote in #note-7:

> 23196-preemptible-capacity @ 25d5c32f947fc5115be9d348413af5ccc90f48fe -- developer-run-tests: #4904

Assuming tests pass, LGTM.

Actions #10

Updated by Tom Clegg 6 months ago

  • Status changed from In Progress to Resolved