Story #5781
open
[API] [DRAFT] Provide API methods for manipulating and combining collections
Added by Radhika Chippada over 9 years ago.
Updated over 3 years ago.
Description
Currently workbench provides “create a new collection by combining selected collections” and “create a new collection by combining selected collection files” functionality. Workbench gets the required collection manifest_text(s), generates the combined manifest_text, invokes save new collection with the combined manifest_text, gets the newly saved collection from the server, and finally displays it to the user. However, this implementation is not scalable. When very large collections are combined, or several files from a very large collection are combined, the workbench combine operation fails with timeout errors while exchanging these large manifest texts with the api server. There are a few bugs reported about this: #4943 and #5614.
A potential solution:
- Provide a “create by combining” api method that performs the steps currently performed by workbench. This method will:
- Take the selections (list of files selected from a collection, or list of collections selected)
- Generate the manifest_text by combining these selections (using the ruby sdk)
- Save a new collection with the combined manifest text and the owner_uuid provided
- Return the newly created collection to the client
- Hence, this solution requires the exchange of only one manifest text between the api server and workbench (that of the newly created collection), and should therefore offer much better performance.
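The "generate the manifest_text by combining these selections" step could look roughly like the sketch below. It is written in Python for illustration only (the proposal itself names the ruby sdk), and `select_files` and `combine_manifests` are hypothetical names, not existing SDK or API methods. It assumes the usual manifest layout of one stream per line: a stream name, then block locators, then `position:size:filename` file tokens. It also simplifies: all block locators of a stream are kept so that file offsets stay valid, and file name collisions between the combined collections are not resolved.

```python
import re

# A file token looks like "<position>:<size>:<name>"; block locators never match this.
FILE_TOKEN = re.compile(r"^\d+:\d+:(.+)$")


def select_files(manifest_text, wanted_paths):
    """Keep only the file tokens whose "<stream>/<name>" path is in wanted_paths."""
    kept_lines = []
    for line in manifest_text.splitlines():
        if not line:
            continue
        stream, *tokens = line.split(" ")
        locators = [t for t in tokens if not FILE_TOKEN.match(t)]
        files = [t for t in tokens if FILE_TOKEN.match(t)]
        kept = [t for t in files
                if f"{stream}/{FILE_TOKEN.match(t).group(1)}" in wanted_paths]
        if kept:
            # Every locator is retained so the position offsets remain correct.
            kept_lines.append(" ".join([stream, *locators, *kept]))
    return "".join(kept_line + "\n" for kept_line in kept_lines)


def combine_manifests(selections):
    """selections: list of (manifest_text, wanted_paths) pairs.

    wanted_paths is None when the whole collection was selected.
    Each manifest_text is assumed to end with a newline.
    """
    parts = []
    for manifest_text, wanted_paths in selections:
        if wanted_paths is None:
            parts.append(manifest_text)
        else:
            parts.append(select_files(manifest_text, wanted_paths))
    return "".join(parts)
```

With logic like this running on the api server, the client would only send the selections (collection identifiers and file paths) and receive the single saved collection back.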
Files
- Subject changed from [API] Provide an api method to combine collections to [API] [DRAFT] Provide API methods for manipulating and combining collections
- Story points set to 2.0
- Target version changed from Arvados Future Sprints to 2015-05-20 sprint
- Assigned To set to Radhika Chippada
- Status changed from New to In Progress
On IRC:
tom 4:42 definitely look up the flamegraph gem and try putting &pp=flamegraph at the end of a (dev) workbench url to see what you get
tom 4:43 commit 288d22d8a7ff1f9a441d2b8058382e807873d7d5 message has some notes too
tom 4:44 I think it would be very useful to inspect a single too-slow example request and make a table of how many seconds are spent between various checkpoints.
tom 4:45 if you're running with RAILS_ENV=development, you should be able to use the flamegraph feature
tom 4:47 There are so many things we could optimize... but we should be able to figure out what the maximum possible benefit is in each area
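One way to build the "seconds spent between checkpoints" table suggested above is a small timer that records a timestamp at each named checkpoint. The sketch below is a generic Python illustration, not Workbench code; the `Checkpoints` class and the labels in the usage note are hypothetical.

```python
import time


class Checkpoints:
    """Record named checkpoints and report the seconds spent between them."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._marks = [("start", clock())]

    def mark(self, label):
        self._marks.append((label, self._clock()))

    def table(self):
        # One row per interval: ("previous -> current", elapsed seconds).
        rows = []
        for (prev_label, prev_t), (label, t) in zip(self._marks, self._marks[1:]):
            rows.append((f"{prev_label} -> {label}", t - prev_t))
        return rows
```

Calling `mark("manifests fetched")`, `mark("manifest combined")`, and `mark("collection saved")` around the corresponding steps of one slow request, then reading `table()`, would show which step dominates.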
And the commit log mentioned above:
commit 288d22d8a7ff1f9a441d2b8058382e807873d7d5
Author: Tom Clegg <tom@curoverse.com>
Date: Tue Jan 13 10:18:16 2015 -0500
3021: Add web-inspectable profiling mode.
* Run Workbench with environment variable ENABLE_PROFILING=yes. Timing
figures should appear at the top left of each page. Click to get
more detail.
* Visit {workbench-uri}?pp=flamegraph to see a profiling graph instead
of the requested page itself.
* More: https://github.com/MiniProfiler/rack-mini-profiler
- File metrics-show-qr1hi-4zz18-tcnxylwkxg0nfhi.png added
Pointed my workbench in dev to production and accessed:
collections/qr1hi-4zz18-tcnxylwkxg0nfhi?pp=full-backtrace
The attached metrics-show PNG shows the profiling metrics for displaying this collection (with ?pp=full-backtrace appended to the URL).
- File deleted (metrics-show-qr1hi-4zz18-tcnxylwkxg0nfhi.png)
- Target version changed from 2015-05-20 sprint to Arvados Future Sprints
- Assigned To deleted (Radhika Chippada)
- Target version changed from Arvados Future Sprints to 2017-03-29 sprint
- Target version changed from 2017-03-29 sprint to Arvados Future Sprints
- Target version deleted (Arvados Future Sprints)