Bug #15902


Very slow query for large workflows

Added by Tom Morris about 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Start date: 01/28/2020
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

A customer reports that the query below takes 32 seconds for a workflow with ~3,000 containers:

duration: 32570.970 ms  execute <unnamed>: SELECT  container_requests."uuid", container_requests."owner_uuid", container_requests."created_at", container_requests."modified_at", container_requests."modified_by_client_uuid", container_requests."modified_by_user_uuid", container_requests."name", container_requests."description", container_requests."properties", container_requests."state", container_requests."requesting_container_uuid", container_requests."container_uuid", container_requests."container_count_max", container_requests."runtime_constraints", container_requests."container_image", container_requests."environment", container_requests."cwd", container_requests."command", container_requests."output_path", container_requests."priority", container_requests."expires_at", container_requests."filters", container_requests."container_count", container_requests."use_existing", container_requests."scheduling_parameters", container_requests."output_uuid", container_requests."log_uuid", container_requests."output_name", container_requests."output_ttl" 
FROM "container_requests"
WHERE (container_requests.owner_uuid NOT IN (SELECT target_uuid FROM materialized_permission_view WHERE trashed = 1) ) AND (container_requests.requesting_container_uuid in ('zzzzz-dz642-r9n91uyk20xowyf',...[~3000 UUIDs omitted]
) ORDER BY container_requests.modified_at desc, container_requests.uuid LIMIT $1 OFFSET $2
< 2019-11-26 10:50:27.162 EST >DETAIL: parameters: $1 = '1000', $2 = '0'

Here's the EXPLAIN ANALYZE for the original query:

Limit  (cost=2385.09..16840.05 rows=1000 width=791) (actual time=32385.658..32385.658 rows=0 loops=1)
  ->  Index Scan using index_container_requests_on_modified_at_uuid on container_requests  (cost=2385.09..1995593.21 rows=137891 width=791) (actual time=32385.656..32385.656 rows=0 loops=1)
        Filter: ((NOT (hashed SubPlan 1)) AND ((requesting_container_uuid)::text = ANY ('{zzzzz-dz642-r9n91uyk20xowyf,...}'::text[])))
        Rows Removed by Filter: 376597
        SubPlan 1
          ->  Index Only Scan using permission_target_trashed on materialized_permission_view  (cost=0.42..2380.40 rows=1709 width=28) (actual time=0.050..0.351 rows=95 loops=1)
                Index Cond: (trashed = 1)
                Heap Fetches: 95
Planning time: 29.160 ms
Execution time: 32385.751 ms

Deleting the index the query planner incorrectly chose, index_container_requests_on_modified_at_uuid, dropped the query time from 30+ seconds to ~50 ms. Here's the resulting plan:

Limit  (cost=83535.57..83538.07 rows=1000 width=780) (actual time=54.419..54.419 rows=0 loops=1)
  ->  Sort  (cost=83535.57..83877.22 rows=136657 width=780) (actual time=54.417..54.417 rows=0 loops=1)
        Sort Key: container_requests.modified_at DESC, container_requests.uuid
        Sort Method: quicksort  Memory: 25kB
        ->  Index Scan using index_container_requests_on_requesting_container_uuid on container_requests  (cost=635.83..76042.82 rows=136657 width=780) (actual time=54.323..54.323 rows=0 loops=1)

Before the index was deleted, a full VACUUM ANALYZE had no effect on performance, but dropping the ORDER BY clause from the query also dramatically improved query times.

One thing that seems strange is that the query orders not only by modified_at but also by uuid, which doesn't really seem to add any value.

We should see if we can adjust the indexes and/or the query so that the query planner can optimize it correctly.
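One possible adjustment, shown only as a sketch (the index name and column choice are assumptions, not something decided in this ticket), is a composite index covering both the filter column and the sort columns, so the planner doesn't have to choose between the filter-friendly index and the sort-friendly index:

```sql
-- Hypothetical index (not part of the Arvados schema): covers the
-- requesting_container_uuid filter and the (modified_at DESC, uuid)
-- ORDER BY in one structure, removing the planner's temptation to walk
-- index_container_requests_on_modified_at_uuid and filter row-by-row.
CREATE INDEX index_cr_on_requesting_uuid_modified_at
    ON container_requests (requesting_container_uuid, modified_at DESC, uuid);
```

Whether this helps with a ~3,000-element IN list would need to be verified with EXPLAIN ANALYZE on a representative dataset.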


Subtasks 1 (0 open, 1 closed)

Task #15966: Review 15902-slow-query (Resolved, Lucas Di Pentima, 01/28/2020)

Related issues 1 (0 open, 1 closed)

Related to Arvados Workbench 2 - Bug #15672: List of subprocesses limited in process view (Resolved, Lucas Di Pentima, 12/16/2019)

#1

Updated by Tom Morris about 5 years ago

  • Assigned To set to Ward Vandewege
#2

Updated by Peter Amstutz almost 5 years ago

  • Target version changed from 2020-01-02 Sprint to 2020-01-15 Sprint
#3

Updated by Peter Amstutz almost 5 years ago

  • Assigned To deleted (Ward Vandewege)
#4

Updated by Peter Amstutz almost 5 years ago

  • Assigned To set to Peter Amstutz
#5

Updated by Peter Amstutz almost 5 years ago

  • Category set to API
#6

Updated by Tom Clegg almost 5 years ago

  • Release set to 22
#7

Updated by Tom Clegg almost 5 years ago

  • Assigned To changed from Peter Amstutz to Tom Clegg
#8

Updated by Tom Clegg almost 5 years ago

  • Status changed from New to In Progress

Deleting the (modified_at desc, uuid) index doesn't seem appealing -- it's likely to make different queries slow.

It may be possible to tune PostgreSQL's statistics / query planner so it chooses the appropriate index.

I suspect the easiest short-term solution/workaround is for the client to specify order=uuid or order=(requesting_container_uuid, uuid), rather than the default order=(modified_at desc, uuid).

I'm guessing the client is workbench or workbench2?
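The workaround suggested in this comment, written out as the SQL it would roughly produce (a sketch only; the selected columns and the single UUID are illustrative, and the real query also carries the trashed-owner subquery):

```sql
-- Sketch of the order=uuid workaround: with ORDER BY uuid instead of the
-- default (modified_at DESC, uuid), the planner can use
-- index_container_requests_on_requesting_container_uuid for the filter
-- and sort the (small) matching set afterwards, as in the fast plan above.
SELECT uuid, container_uuid, state
  FROM container_requests
 WHERE requesting_container_uuid IN ('zzzzz-dz642-r9n91uyk20xowyf' /* , ... */)
 ORDER BY uuid
 LIMIT 1000 OFFSET 0;
```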

#9

Updated by Peter Amstutz almost 5 years ago

  • Target version changed from 2020-01-15 Sprint to 2020-01-29 Sprint
#10

Updated by Tom Clegg almost 5 years ago

  • Related to Bug #15672: List of subprocesses limited in process view added
#11

Updated by Lucas Di Pentima almost 5 years ago

I have checked the wb2 code and haven't found anything making a request with a 'requesting_container_uuid in [some list]' filter. In fact, wb2's current master branch fails to show a big workflow because it makes a request with a filter too big to be used in a GET query.

#13

Updated by Lucas Di Pentima almost 5 years ago

This LGTM, waiting for tests to finish before merging.

#14

Updated by Lucas Di Pentima almost 5 years ago

Re-running wb integration tests just in case:

developer-run-tests-apps-workbench-integration: #1811

#15

Updated by Anonymous almost 5 years ago

  • Status changed from In Progress to Resolved
