Bug #8497
Closed: [Data Manager] Small batch size makes it slow to process collections
Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Start date: 02/23/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 0.5
Description
datamanager queries the API server for collections in batches of 50. This is very slow.
Performance data on our system (running with the fix from 8485 as otherwise we can't fetch all the collections):
$ time ./datamanager -dry-run &> /tmp/datamanager-dry-run-50.log
real 72m30.514s
user 6m30.243s
sys 0m43.171s
Changing one line in datamanager.go from 'BatchSize: 50' to 'BatchSize: 1000' results in:
$ time ./datamanager -dry-run &> /tmp/datamanager-dry-run-1000.log
real 12m57.729s
user 5m16.569s
sys 0m28.488s
I'd suggest raising the BatchSize as much as possible (or making it a configuration parameter).
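To illustrate why the batch size matters so much here: the number of list requests grows inversely with the batch size, so each request's fixed overhead is paid far more often at 50 than at 1000. The sketch below is only an illustration; the roundTrips helper and the collection count are hypothetical and not taken from datamanager or from the system measured above.

```go
package main

import "fmt"

// roundTrips returns how many list requests are needed to page through
// total items when each request returns at most batchSize items.
// Hypothetical helper for illustration; not part of datamanager.
func roundTrips(total, batchSize int) int {
	if total <= 0 || batchSize <= 0 {
		return 0
	}
	// Integer ceiling division: a partial final batch still costs one request.
	return (total + batchSize - 1) / batchSize
}

func main() {
	// Assumed collection count, chosen only to make the comparison concrete.
	const total = 1000000
	for _, batchSize := range []int{50, 1000} {
		fmt.Printf("batch size %d: %d requests\n", batchSize, roundTrips(total, batchSize))
	}
}
```

With the assumed count, a batch size of 50 needs 20 times as many requests as a batch size of 1000.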
Updated by Joshua Randall almost 9 years ago
- Status changed from New to Feedback
- Assigned To set to Joshua Randall
- % Done changed from 0 to 100
Updated by Joshua Randall almost 9 years ago
- Assigned To deleted (Joshua Randall)
Updated by Brett Smith almost 9 years ago
- Subject changed from datamanager is slow to process collections to [Data Manager] Small batch size makes it slow to process collections
Updated by Brett Smith almost 9 years ago
- Target version set to 2016-03-16 sprint
Updated by Radhika Chippada almost 9 years ago
PR #41
Rather than hardcoding the batch size of 1000, please add an argument "collection-batch-size" with a default value of 1000.
Updated by Joshua Randall almost 9 years ago
Radhika, I've now made the batch size a command-line option (-collection-batch-size), as requested.
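For readers unfamiliar with Go's standard flag package, an option like this could be declared roughly as follows. This is a minimal sketch, not the code from the applied changeset: only the flag name and the default of 1000 come from this ticket, while the parseBatchSize helper, the help text, and the positivity check are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseBatchSize reads -collection-batch-size from args.
// Hypothetical helper; the flag name and default come from the ticket,
// everything else is assumed for illustration.
func parseBatchSize(args []string) (int, error) {
	fs := flag.NewFlagSet("datamanager", flag.ContinueOnError)
	batchSize := fs.Int("collection-batch-size", 1000,
		"maximum number of collections to request from the API server per call")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	// Assumed validation: a zero or negative batch size cannot make progress.
	if *batchSize <= 0 {
		return 0, fmt.Errorf("collection-batch-size must be positive, got %d", *batchSize)
	}
	return *batchSize, nil
}

func main() {
	batchSize, err := parseBatchSize(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("requesting collections in batches of %d\n", batchSize)
}
```

Under these assumptions, running the program with -collection-batch-size 250 would override the default, and omitting the flag would leave it at 1000.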
Updated by Radhika Chippada almost 9 years ago
- Status changed from Feedback to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:6303d577e0513eae1254a9c73648c24b9451ed10.