Project

General

Profile

Actions

Bug #8289

closed

[API] Avoid making queries unnecessarily slow by adding superfluous "order by" columns

Added by Tom Clegg almost 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Start date:
02/08/2016
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

As an example, when the Python SDK's websocket fallback PollClient starts up, it does an API request with order="id asc". API server translates that to a PostgreSQL query like this:

arvados_production=# SELECT  "logs".* FROM "logs"  WHERE ((logs.event_type in ('create','update','delete')) AND (logs.id > '3464274')) ORDER BY logs.id desc, logs.modified_at desc, logs.uuid LIMIT 1 OFFSET 0;
arvados_production=# explain analyze SELECT  "logs".* FROM "logs"  WHERE ((logs.event_type in ('create','update','delete')) AND (logs.id > '3464274')) ORDER BY logs.id desc, logs.modified_at desc, logs.uuid LIMIT 1 OFFSET 0;
                                                                              QUERY PLAN                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=822081.89..822081.89 rows=1 width=1290) (actual time=4007.367..4007.368 rows=1 loops=1)
   ->  Sort  (cost=822081.89..823218.65 rows=454703 width=1290) (actual time=4007.364..4007.364 rows=1 loops=1)
         Sort Key: id, modified_at, uuid
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Bitmap Heap Scan on logs  (cost=69925.24..819808.37 rows=454703 width=1290) (actual time=531.924..3086.689 rows=869534 loops=1)
               Recheck Cond: ((id > 3464274) AND ((event_type)::text = ANY ('{create,update,delete}'::text[])))
               ->  BitmapAnd  (cost=69925.24..69925.24 rows=454703 width=0) (actual time=530.935..530.935 rows=0 loops=1)
                     ->  Bitmap Index Scan on logs_pkey  (cost=0.00..29223.51 rows=1433428 width=0) (actual time=249.940..249.940 rows=1954308 loops=1)
                           Index Cond: (id > 3464274)
                     ->  Bitmap Index Scan on index_logs_on_event_type  (cost=0.00..40474.13 rows=1341921 width=0) (actual time=270.632..270.632 rows=1735755 loops=1)
                           Index Cond: ((event_type)::text = ANY ('{create,update,delete}'::text[]))
 Total runtime: 4007.417 ms

"id" is a primary key, so the "logs.modified_at desc, logs.uuid" part of the "order by" clause can't possibly make a difference to the results. If we remove them, the query looks like this:

arvados_production=# explain analyze SELECT  "logs".* FROM "logs"  WHERE ((logs.event_type in ('create','update','delete')) AND (logs.id > '3464274')) ORDER BY logs.id desc LIMIT 1 OFFSET 0;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..8.67 rows=1 width=1290) (actual time=0.049..0.050 rows=1 loops=1)
   ->  Index Scan Backward using logs_pkey on logs  (cost=0.00..3943415.80 rows=454703 width=1290) (actual time=0.046..0.046 rows=1 loops=1)
         Index Cond: (id > 3464274)
         Filter: ((event_type)::text = ANY ('{create,update,delete}'::text[]))
 Total runtime: 0.078 ms

Subtasks 1 (0 open1 closed)

Task #8354: review 8289-no-extra-ordersResolvedTom Clegg02/08/2016

Actions

Related issues 1 (0 open1 closed)

Related to Arvados - Bug #9542: [API] review logs queries ClosedTom Clegg07/05/2016

Actions
Actions #1

Updated by Tom Clegg almost 9 years ago

  • Description updated (diff)
  • Category set to API
Actions #2

Updated by Tom Clegg almost 9 years ago

8289-no-extra-orders @ 1502c93

Actions #3

Updated by Tom Clegg almost 9 years ago

  • Assigned To set to Tom Clegg
Actions #4

Updated by Brett Smith almost 9 years ago

  • Target version set to Arvados Future Sprints
Actions #5

Updated by Brett Smith almost 9 years ago

  • Target version changed from Arvados Future Sprints to 2016-02-17 Sprint
Actions #6

Updated by Radhika Chippada almost 9 years ago

Review comments:

I experimented with a few more combinations of order in the query_test and I am not sure if we need more tweaking here.

  • If I gave order: ['id asc', 'modified_at asc’], I am ending up with 'logs.id asc, logs.modified_at asc’. Do we want to get rid off the modified_at in this case to ensure clients do not send meaningless yet costly queries?
  • If I gave order: ['modified_at asc'], I am getting ‘logs.modified_at asc, logs.modified_at desc, logs.uuid’
  • If I used order: ['event_type asc', 'modified_at asc’], I am getting "logs.event_type asc", "logs.modified_at asc", "logs.modified_at desc", "logs.uuid”. In this and the previous case, shouldn’t we get rid off the modified_at before adding the full default list? Or better yet, add only those from the default list that are not specified?
Actions #7

Updated by Tom Clegg almost 9 years ago

Good suggestions, thanks. Updated code and added tests; now at 6f1d385

Actions #8

Updated by Radhika Chippada almost 9 years ago

This is much better. Can we update the comment a bit more. Something like: if id and updated_at are given, updated_at is truncated because after the unique id column ordering, no need to order on updated_at. On the other hand, if log_type and id are given ordering on type and then id is what is happening. And then if I used log_type, id, updated_at, we will use only log_type and id and truncate updated_at since there is no point in using it.

With or without it LGTM. Thanks.

Actions #9

Updated by Tom Clegg almost 9 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:a13c14cfc7fabe4f4da48edd57a086d2d8953a03.

Actions

Also available in: Atom PDF