Bug #5901
[API] Improve performance of large requests in parallel
Status: Open
% Done: 0%
Description
Attached are two files. The first is a simple Python script that uses threads to fetch the same collection object from the API server multiple times simultaneously. Currently, the collection's manifest is 75492690 bytes. The collection UUID is su92l-4zz18-wd2va9q9lnfx6ga
The second is the resulting log file, generated by running:
for n in 2 4 6 8; do python multi.py "$n" || break; done | tee multi.log
In short, it shows that performance takes a noticeable dive as the number of simultaneous requests increases. The eight-thread calls never succeeded; instead they raised a timeout exception. This problem just bit a real user: when parallelizing over many files in this collection, the first batch of parallel tasks all failed because they all tried to fetch the collection simultaneously and timed out waiting for an API server response. We need to improve performance here to make sure this usage pattern doesn't fail.
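The attached multi.py is not reproduced here; the following is only a minimal sketch of what it presumably does, assuming the standard Arvados Python SDK client (arvados.api) and one full collection GET per thread. The thread-count handling and timing output are illustrative assumptions, not the attached script.

#!/usr/bin/env python
# Hypothetical reproducer (sketch only; not the attached multi.py).
# Fetches the same large collection record N times in parallel and
# prints the total elapsed wall-clock time.
import sys
import threading
import time

import arvados

COLLECTION_UUID = 'su92l-4zz18-wd2va9q9lnfx6ga'

def fetch():
    # Each thread uses its own API client and retrieves the full
    # collection record, including the ~75 MB manifest_text.
    api = arvados.api('v1')
    api.collections().get(uuid=COLLECTION_UUID).execute()

def main(num_threads):
    threads = [threading.Thread(target=fetch) for _ in range(num_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('%d threads: %.1f s elapsed' % (num_threads, time.time() - start))

if __name__ == '__main__':
    main(int(sys.argv[1]))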
Files
Updated by Brett Smith over 9 years ago
- Target version changed from Bug Triage to Arvados Future Sprints
Updated by Tom Morris almost 8 years ago
- File multi1.log added
- Description updated (diff)
This has improved by over an order of magnitude(!) since 2015, which is great, but 20 seconds to fetch a 75 MB manifest still seems like an awful lot of time, and a 3-4x stretch factor under an 8x load, when the data should already be cached, also seems out of line (a quick recomputation of these ratios follows the table).
Threads | Elapsed 2015 (s) | Elapsed 2017 (s)
2       | 275.6            | 20.8
2       | 283.0            | 25.0
4       | 285.3            | 28.5
4       | 287.2            | 31.7
4       | 390.5            | 33.9
4       | 396.0            | 53.7
6       | 654.8            | 28.6
6       | 919.5            | 35.4
6       | 923.7            | 37.2
6       | 931.3            | 49.5
6       | 933.5            | 75.7
6       | 934.4            | 86.2
8       | -                | 38.8
8       | -                | 49.3
8       | -                | 56.1
8       | -                | 57.8
8       | -                | 63.3
8       | -                | 67.0
8       | -                | 70.0
8       | -                | 72.8
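As a quick sanity check of the stretch-factor figure, the sketch below recomputes per-thread-count medians and worst cases from the 2017 column above; with these numbers the worst 6- and 8-thread runs come out roughly 3-4x the 2-thread median. The script itself is illustrative only and is not part of the attached files.

# Recompute the 2017 stretch factor from the table above.
# Elapsed times are in seconds, copied from the "Elapsed 2017" column.
from statistics import median

elapsed_2017 = {
    2: [20.8, 25.0],
    4: [28.5, 31.7, 33.9, 53.7],
    6: [28.6, 35.4, 37.2, 49.5, 75.7, 86.2],
    8: [38.8, 49.3, 56.1, 57.8, 63.3, 67.0, 70.0, 72.8],
}

base = median(elapsed_2017[2])  # ~22.9 s median for the 2-thread runs
for n in sorted(elapsed_2017):
    times = elapsed_2017[n]
    print('%d threads: median %.1f s (%.1fx), worst %.1f s (%.1fx)'
          % (n, median(times), median(times) / base,
             max(times), max(times) / base))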
Updated by Ward Vandewege over 3 years ago
- Target version deleted (Arvados Future Sprints)