Bug #6602
closed
[Workbench] Pipeline components tab preloads all tasks; times out for jobs with many tasks
100%
Description
The bug
app/views/pipeline_instances/_show_components_running.html.erb includes the line tasks = JobTask.filter([['job_uuid', 'in', job_uuids]]).results. For jobs that create many tasks, this takes a long time to execute. Because it runs automatically as part of showing a pipeline instance, in the worst case it can take so long that a browser or front-end proxy gives up waiting for Workbench to render the pipeline instance page.
It's important that having many tasks not prevent the page from loading. There are lots of possible solutions; the engineering team can specify one together.
The fix
We fetch these tasks to display "node-slot time," as described by Tom in comments below. This is not a very useful metric for users. Stop displaying it, and instead display "node allocation time." This matches less closely to compute resources used, but more closely to the real costs of running the job (since Crunch reserves entire nodes, and most environments bill on a node-hour basis).
Given a job record job_rec, the formula to compute node reservation time in seconds is approximately:
(job_rec[:runtime_constraints]["min_nodes"] || 1) * (job_rec[:finished_at] - job_rec[:started_at])
(The right-hand parenthesized expression needs to provide the number of seconds the job ran. You may need to do a little transformation on finished_at and started_at, or on the result of the subtraction, to get that.)
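A minimal Ruby sketch of the formula above, for illustration only. The helper name node_allocation_seconds is hypothetical (it is not Workbench code), and it assumes job_rec is a Hash whose timestamps are either Time objects or ISO 8601 strings; the real record type in Workbench may need a different conversion.

```ruby
require 'time'

# Hypothetical helper, not part of Workbench: node allocation time in seconds.
# Assumes job_rec is a Hash with :runtime_constraints, :started_at and
# :finished_at, where the timestamps are Time objects or ISO 8601 strings.
def node_allocation_seconds(job_rec)
  started = job_rec[:started_at]
  finished = job_rec[:finished_at]
  # A job that has not started or finished has no allocation time yet.
  return nil if started.nil? || finished.nil?

  started = Time.parse(started) if started.is_a?(String)
  finished = Time.parse(finished) if finished.is_a?(String)

  min_nodes = (job_rec[:runtime_constraints] || {})["min_nodes"] || 1
  # Subtracting two Time objects yields elapsed seconds as a Float.
  min_nodes * (finished - started)
end

job_rec = {
  runtime_constraints: { "min_nodes" => 2 },
  started_at: "2015-08-01T10:00:00Z",
  finished_at: "2015-08-01T11:30:00Z",
}
puts node_allocation_seconds(job_rec)
```

Nothing here touches job tasks: the computation needs only fields that are already on the job record.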
Update the Workbench view to make sure this time is accurately described as the amount of time that nodes were allocated to run the job.
To be very clear, a functional requirement of this story is that Workbench must not fetch any job tasks to render the pipeline components tab.
Updated by Brett Smith over 9 years ago
- Target version changed from 2015-08-19 sprint to 2015-08-05 sprint
Updated by Tom Clegg over 9 years ago
Calculating "CPU time" (actually node-slot time) seems to be the sole purpose of fetching all tasks.
IMO we should consider whether "node-slot time" is truly an interesting number before we dedicate ourselves to computing it faster/asynchronously.
Other numbers which may be more interesting and require different approaches:
- Node reservation time (easy to calculate without looking at tasks -- just start, finish, and the first few lines of the log)
- CPU time (can only be calculated by reading crunchstat logs)
If we decide one or more of these is important, we should consider precomputing them on the API side and adding them to the Job records. This would be really fast on the Workbench side, and would also facilitate aggregate reporting for accounting purposes.
Updated by Tom Clegg over 9 years ago
(Summary of engineering discussion + some more detail)
Current metrics
The current "scaling factor" metric tells us how much task concurrency is being added by Arvados, regardless of its effect on CPU usage or total run time. For example, with 16-core compute nodes, and assuming bwa -tN keeps N cores busy:
- bwa -t1 on 1 node with max_tasks_per_node=1 → scaling factor 1 (expect 100% core usage)
- bwa -t16 on 1 node with max_tasks_per_node=1 → scaling factor 1 (expect 1600% core usage)
- bwa -t16 on 2 nodes with max_tasks_per_node=1 → scaling factor 2 [1] (expect 3200% core usage)
- bwa -t16 on 1 node with max_tasks_per_node=16 → scaling factor 16 [1] (expect 1600% core usage)
- bwa -t16 on 2 nodes with max_tasks_per_node=16 → scaling factor 32 [1] (expect 3200% core usage)
[1] Less in practice: tasks won't all finish at the same time, so some slots will be empty part of the time.
The metric labeled "CPU time" actually indicates how much "slot time" was allocated to tasks. Generally it's hard to conclude much from this because slot A being idle typically makes slot B run faster.
The term "scaling factor" is vague (until we document it). The term "CPU time" is just wrong (like a stopped clock -- if your job happens to average 100% CPU across all tasks, then yes, the figure given as CPU time will be correct).
Other (possibly) desirable metrics
Node time is useful as a proxy for "compute cost". Regardless of what your CPUs/slots were doing, your job was monopolizing those nodes because that's how Crunch1 allocates resources.
Actual CPU usage is useful for finding optimization opportunities: "I had 4x 16-core nodes but I only used 1200% CPU core time, not 6400% -- my job is not CPU-bound."
Easiest solution
It's easy to compute "amount of slot time allocated", which is similar to (but larger than) the number we compute expensively right now:
nodes × max_tasks_per_node × (job_end_time - job_start_time)
This is larger than the current figure because it doesn't take idle slots into account. We could call this "maximum scaling factor"?
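As a rough illustration of the slot-time formula, here is a small Ruby sketch. The helper names max_slot_seconds and max_scaling_factor are hypothetical, and the inputs are assumed to be already known (the thread notes further down that max_tasks_per_node is not always easy to obtain).

```ruby
# Hypothetical helper: upper bound on slot time, in seconds.
# nodes and max_tasks_per_node are Integers; the times are Time objects.
def max_slot_seconds(nodes, max_tasks_per_node, job_start_time, job_end_time)
  nodes * max_tasks_per_node * (job_end_time - job_start_time)
end

# Hypothetical helper: slot time divided by wall-clock time, which
# reduces to nodes * max_tasks_per_node.
def max_scaling_factor(nodes, max_tasks_per_node)
  nodes * max_tasks_per_node
end

start = Time.utc(2015, 8, 1, 10, 0, 0)
finish = Time.utc(2015, 8, 1, 11, 0, 0)

puts max_slot_seconds(2, 16, start, finish)
puts max_scaling_factor(2, 16)
```

With 2 nodes and max_tasks_per_node=16 the maximum scaling factor is 32, matching the last bwa example under "Current metrics"; the actual figure is lower whenever slots sit idle.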
Documentation?
IMO we should document how to use the (current/proposed) metrics. For example:
- If your actual scaling factor is much less than your maximum scaling factor (i.e., you have idle slots) you might have wildly different task durations, or not enough tasks to fill all of the slots. Try reducing max_slots_per_node and see if your job finishes faster.
- If your actual CPU usage is much lower than nodes×cores_per_node, try increasing max_slots_per_node and (if the total number of tasks is much smaller than max_slots_per_node×nodes) split the work into a larger number of tasks. This is common when your tasks don't make good use of multiple CPU cores by themselves.
Crunch2
This problem/implementation will look different in Crunch2.
- A job won't necessarily tie up whole numbers of nodes. It will get some RAM, some disk, some CPU cores -- and only on one node.
- Pipeline_instances→jobs→job_tasks will be jobs→jobs→jobs.
Updated by Brett Smith over 9 years ago
- Story points set to 0.5
Will go with the simple solution + simple documentation for now. There's definitely room for improvement here but it's less of a priority.
Updated by Radhika Chippada over 9 years ago
- Assigned To set to Radhika Chippada
Updated by Brett Smith over 9 years ago
(09:15:38 AM) radhika: to do the simple solution, nodes × max_tasks_per_node × (job_end_time - job_start_time), how can workbench get this info about max_nodes and max_tasks_per_node?
(09:23:27 AM) Me: radhika: That information is in each job's runtime_constraints hash property. See the bottom of http://doc.arvados.org/api/schema/Job.html
(09:24:26 AM) Me: If max_nodes isn't defined, fall back on min_nodes; or if that's not defined, 1.
(09:24:48 AM) Me: max_tasks_per_node is a little trickier, because the default is "the number of cores on the compute node," and I'm not sure that information is so easy to get.
(09:25:02 AM) Me: Theoretically it could even be different for different compute nodes running the same job.
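The node-count fallback described in the chat above could be sketched in Ruby as follows. The helper name node_count is hypothetical, and the sketch assumes runtime_constraints is a plain Hash with string keys. It deliberately does not try to resolve a default for max_tasks_per_node, since the core count of the compute node is not available from the job record.

```ruby
# Hypothetical helper: number of nodes to use in the time formulas.
# Falls back from max_nodes to min_nodes, and finally to 1.
def node_count(runtime_constraints)
  rc = runtime_constraints || {}
  rc["max_nodes"] || rc["min_nodes"] || 1
end

puts node_count({ "max_nodes" => 4, "min_nodes" => 2 })
puts node_count({ "min_nodes" => 2 })
puts node_count({})
```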
Updated by Radhika Chippada over 9 years ago
- Status changed from New to In Progress
Updated by Radhika Chippada over 9 years ago
From IRC:
tom: OK. How about: This pipeline took 10.8 hours to run and used 4.52 days of ...
tom: I think "node allocation time" is more to the point.
Updated by Peter Amstutz over 9 years ago
Instead of using render_runtime_compact, please use render_runtime(duration, false). The second parameter of render_runtime() chooses between writing out full words and a compact form ("1 hour 30 minutes" vs. "1h 30m").
Based on IRC conversation, render_runtime() should be tweaked to round off to two significant places. So if the runtime is "1 day 4 hours 29 minutes 12 seconds" it should display as just "1d 4h".
Updated by Radhika Chippada over 9 years ago
Regarding:
Based on IRC conversation, render_runtime() should be tweaked to round off to two significant places. So if the runtime is "1 day 4 hours 29 minutes 12 seconds" it should display as just "1d 4h"
This is an incorrect representation when the runtime is, say, 1 day 4 hours 59 minutes. In that case, the hours would need to be rounded to 4.98 hours. Since we seem to be trying to avoid decimals, I would instead go with the display format 1d4h59m (using the existing round_to_min parameter of the render_runtime method).
I think this is another item where we ended up bikeshedding too much and wasted too much precious time!!
Updated by Radhika Chippada over 9 years ago
5058d1cbdbde801a2cf7e303d83e1a626015afdd
- Removed the render_runtime_compact method
- Used render_runtime method with use_words = false and round_to_min = true. This also needed a couple small updates to the render_runtime method implementation.
- There was an extra space after d in "1d 24h32m". Removed that extra space character so that it now displays "1d24h32m"
- If days = 0, hours = 0, minutes = 0 and number of seconds < 30 and hence rounded off to zero minutes, the display was "0 seconds" with round_to_min. Updated the method to use similar logic as "use_words" where it does not round off seconds when all other values are zeroes.
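For readers who want to see the rounding behaviour described in this list, here is a self-contained Ruby sketch. It is a hypothetical stand-in named compact_runtime, not Workbench's render_runtime, and the "12s" form used for sub-minute durations is an assumption made for the example.

```ruby
# Hypothetical stand-in for illustration; this is not Workbench's
# render_runtime. Formats a duration in seconds compactly, rounded to
# the nearest minute, with no spaces between components.
def compact_runtime(seconds)
  seconds = seconds.round
  # Below one minute, keep the seconds instead of rounding them away.
  return "#{seconds}s" if seconds < 60

  total_minutes = (seconds / 60.0).round
  days, rem = total_minutes.divmod(24 * 60)
  hours, minutes = rem.divmod(60)

  parts = []
  parts << "#{days}d" if days > 0
  parts << "#{hours}h" if hours > 0
  parts << "#{minutes}m" if minutes > 0
  parts.join
end

# 1 day 4 hours 59 minutes 12 seconds
puts compact_runtime(86_400 + 4 * 3600 + 59 * 60 + 12)
puts compact_runtime(12)
```

The first call prints 1d4h59m, the format proposed earlier in the thread, and the second keeps the seconds because every larger component is zero.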
Thanks.
Updated by Peter Amstutz over 9 years ago
Thank you, I appreciate the change.
The rest of it looks good to me.
Updated by Radhika Chippada over 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:7d5d40c55d2a38b12e810f3b9d3e168ee434cbd2.