Bug #6602
Status: closed
[Workbench] Pipeline components tab preloads all tasks; times out for jobs with many tasks
100%
Description
The bug
app/views/pipeline_instances/_show_components_running.html.erb includes the line:
tasks = JobTask.filter([['job_uuid', 'in', job_uuids]]).results
For jobs that create many tasks, this takes a long time to execute. Because it runs automatically as part of showing a pipeline instance, in the worst case it can take so long that a browser or front-end proxy gives up waiting for Workbench to render the pipeline instance page.
It's important that having many tasks not prevent the page from loading. There are lots of possible solutions; the engineering team can specify one together.
The fix
We fetch these tasks to display "node-slot time," as described by Tom in comments below. This is not a very useful metric for users. Stop displaying it, and instead display "node allocation time." This corresponds less closely to the compute resources used, but more closely to the real cost of running the job (since Crunch reserves entire nodes, and most environments bill on a node-hour basis).
Given a job record job_rec, the formula to compute node reservation time in seconds is approximately:
(job_rec[:runtime_constraints]["min_nodes"] || 1) * (job_rec[:finished_at] - job_rec[:started_at])
(The right-hand parenthesized expression needs to yield the number of seconds the job ran. You may need to transform finished_at and started_at, or the result of the subtraction, to get that.)
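As a rough illustration of the formula, here is a minimal Ruby sketch. The helper name node_allocation_seconds is hypothetical, and the sketch assumes that started_at and finished_at arrive either as Time objects or as parseable timestamp strings, and that a job without both timestamps has no allocation time to report. The actual Workbench code may represent these fields differently.

```ruby
require 'time'

# Hypothetical helper (not part of Workbench): approximate node allocation
# time in seconds for a job record, using only fields on the job itself.
# No job tasks are fetched.
def node_allocation_seconds(job_rec)
  started  = job_rec[:started_at]
  finished = job_rec[:finished_at]
  # Assumption: a job missing either timestamp has nothing to report yet.
  return nil if started.nil? || finished.nil?

  # Assumption: timestamps may be strings; convert them to Time objects.
  started  = Time.parse(started)  if started.is_a?(String)
  finished = Time.parse(finished) if finished.is_a?(String)

  # Default to one node when min_nodes is not specified.
  constraints = job_rec[:runtime_constraints] || {}
  min_nodes = (constraints["min_nodes"] || 1).to_i

  # Subtracting two Time objects yields elapsed seconds as a Float.
  min_nodes * (finished - started)
end

example_job = {
  runtime_constraints: { "min_nodes" => 3 },
  started_at:  "2015-07-14T10:00:00Z",
  finished_at: "2015-07-14T10:30:00Z"
}
node_allocation_seconds(example_job)  # => 5400.0 (3 nodes * 1800 seconds)
```

Since this uses only the job record, rendering the value does not depend on how many tasks the job created.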
Update the Workbench view to make sure this time is accurately described as the amount of time that nodes were allocated to run the job.
To be very clear, a functional requirement of this story is that Workbench must not fetch any job tasks to render the pipeline components tab.