Bug #14009

closed

container request creation very slow when there are many potentially reusable containers

Added by Joshua Randall over 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Start date: 08/24/2018
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release:
Release relationship: Auto

Description

Container request creation on our system was typically taking 90-120s (N.B. this is after disabling all audit logging, so it has already been sped up dramatically from stock). We found this to be due to the queries that look up reusable containers: the query conditions were pretty much all on columns that have no indices of any kind.

Our interim solution has been to add a hash-based index (to keep index size down) on the `command` column:

arvados_api_production=# create index index_containers_on_command_hash on containers using hash (command);

After adding this index, container request creation was sped up by ~20x (to around 5s each).

However, note that hash indices are not recommended for postgres versions < 10, as they are not WAL-logged and are therefore neither crash-safe nor replicated. An alternative would be to add explicit hash columns to the table (with btree indices on them) that can be used to query the exact-match fields for container reuse.
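
As a rough illustration of the explicit hash column idea, the sketch below stores a digest of `command` next to the full value and looks rows up through an ordinary (btree) index on the digest. Everything specific here is an assumption made for the example: the `command_md5` column name, the choice of MD5, and the use of an in-memory SQLite database as a stand-in for postgres. It is not the schema that was actually adopted.

```python
import hashlib
import sqlite3


def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a text value (used only as a lookup key)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_db() -> sqlite3.Connection:
    """Create a toy containers table with a hash column and an index on it."""
    # SQLite stands in for postgres here; its ordinary indexes are B-trees.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE containers ("
        " uuid TEXT PRIMARY KEY,"
        " command TEXT NOT NULL,"
        " command_md5 TEXT NOT NULL)"  # hypothetical explicit hash column
    )
    conn.execute(
        "CREATE INDEX index_containers_on_command_md5"
        " ON containers (command_md5)"
    )
    return conn


def add_container(conn: sqlite3.Connection, uuid: str, command: str) -> None:
    """Insert a row, filling the hash column from the full command text."""
    conn.execute(
        "INSERT INTO containers (uuid, command, command_md5) VALUES (?, ?, ?)",
        (uuid, command, md5_hex(command)),
    )


def find_reusable(conn: sqlite3.Connection, command: str) -> list:
    """Return uuids of containers whose command matches exactly."""
    # The short digest is the indexable condition; the full-text comparison
    # afterwards rules out hash collisions.
    rows = conn.execute(
        "SELECT uuid FROM containers"
        " WHERE command_md5 = ? AND command = ?"
        " ORDER BY uuid",
        (md5_hex(command), command),
    ).fetchall()
    return [row[0] for row in rows]
```

The same pattern would extend to the other large exact-match fields (mounts and so on), with one digest column per field.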

In addition to making for faster reuse queries, this should also pave the way to enable formulation of client-side container queries that don't need to transmit the large text values (such as mounts, command, etc) to the API server.
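
A minimal sketch of what such a client-side query might look like, assuming the server exposed digest attributes to filter on. The attribute names `command_md5` and `mounts_md5`, the filter layout, and the canonical JSON serialization are all hypothetical; client and server would have to agree on the exact serialization for the digests to match.

```python
import hashlib
import json


def canonical_md5(value) -> str:
    """Hash a JSON-serializable value using a deterministic serialization."""
    # Sorted keys and fixed separators make equal values hash identically.
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(encoded.encode("utf-8")).hexdigest()


def reuse_filters(command, mounts) -> list:
    """Build a filter list that carries digests instead of the full values."""
    # Hypothetical attribute names; this ticket does not confirm such an API.
    return [
        ["command_md5", "=", canonical_md5(command)],
        ["mounts_md5", "=", canonical_md5(mounts)],
    ]
```

With this shape, the request stays a fixed size no matter how large the mounts or command values are.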


Subtasks: 1 (0 open, 1 closed)

Task #14046: Review 14009-container-reuse-index (Resolved, Tom Clegg, 08/24/2018)

Actions #1

Updated by Tom Morris over 6 years ago

  • Target version set to 2018-09-05 Sprint
Actions #2

Updated by Tom Clegg over 6 years ago

  • Assigned To set to Tom Clegg
Actions #3

Updated by Tom Clegg over 6 years ago

  • Status changed from New to In Progress

14009-container-reuse-index @ 6525b509825dbbf1cbe8b30b34080aafc4e5bde3

The test database is so small postgres declines to use the index, but I pasted the "create index" statement and a reuse query from the test suite into a dev cluster with lots of containers (9tee4) to confirm postgres uses it. The query took ~130ms before adding the index, ~1ms after.

Actions #4

Updated by Peter Amstutz over 6 years ago

Tom Clegg wrote:

14009-container-reuse-index @ 6525b509825dbbf1cbe8b30b34080aafc4e5bde3

The test database is so small postgres declines to use the index, but I pasted the "create index" statement and a reuse query from the test suite into a dev cluster with lots of containers (9tee4) to confirm postgres uses it. The query took ~130ms before adding the index, ~1ms after.

This seems like the sort of thing a hash index would be useful for, but Postgres 9.x hash indexes come with a recommendation not to use them, so this seems to be the next best thing. LGTM.

Actions #5

Updated by Tom Clegg over 6 years ago

  • Status changed from In Progress to Resolved
Actions #6

Updated by Tom Morris about 6 years ago

  • Release set to 13