Feature #4456: [Workbench] Provide more feedback about when a queued job is likely to start. - Arvados

Added by Brett Smith over 10 years ago. Updated almost 4 years ago.

Status: New
Priority: Normal
Assigned To: Tom Clegg
Category: Workbench
Target version:
Start date: 11/06/2014
Due date:
% Done:
Estimated time: (Total: 0.00 h)
Story points: 1.0

Description

From discussion in IRC: users often feel that the time at which their jobs start is effectively random. It would help to provide more visibility into what the cluster is doing.

Currently, if you look at the page for a queued job, it says, "There are N jobs in the queue ahead of this one." This is a good start, but it's still not sufficient information for users to understand what's happening. It would also help them to know whether or not there are nodes free right now, and whether or not a node will boot to run their job (see #4446).

We should display information about this on the Dashboard. For example, "Your next job in the queue is <link>. There are N jobs in the queue ahead of it. <text about node freeness>"

Subtasks 1 (1 open — 0 closed)

Related issues 3 (0 open — 3 closed)

Updated by Brett Smith over 10 years ago

Subject changed from [Workbench] Provide more visibility about state of jobs to [Workbench] Provide more visibility about state of queued jobs

Updated by Nancy Ouyang over 10 years ago

Well, just surfacing "there are N jobs ahead" on the Workbench homepage would be a great start; I didn't think to look inside the pipeline or the job.

My specific issue is that I'm trying to learn Arvados by running short pipelines, and I just want to know whether I should expect my pipeline to take longer than a minute to start running, in which case I'll go do something else. After hitting run I'll go to the homepage and stare expectantly at the pipeline (the page does a good job of conveying that it is actively tracking the job and that I don't need to hit refresh).

So my question is actually "Why is my pipeline not running?" I try to estimate it from "How long did the previous ones take to start?", and I can't, because it seems random to me.

Other possible solutions: A help topic, "Why isn't my job running" that explains the possible reasons.

At the core, being able to play around and get results quickly will make it more pleasant to learn Arvados. Sandboxes, Arv run, an Arvados-like local Docker setup, and immediately running pipelines would all address this.

Updated by Ward Vandewege over 10 years ago

Target version changed from Bug Triage to Arvados Future Sprints

Updated by Tom Clegg over 10 years ago

Story points set to 1.0

Updated by Tom Clegg over 10 years ago

Tracker changed from Bug to Feature
Subject changed from [Workbench] Provide more visibility about state of queued jobs to [Workbench] Provide more feedback about when a queued job is likely to start.

Updated by Tom Clegg about 10 years ago

Target version changed from Arvados Future Sprints to 2015-05-20 sprint

Updated by Tom Clegg almost 10 years ago

Assigned To set to Tom Clegg

Updated by Tom Clegg almost 10 years ago

Some system states that could be translated to a start-time prediction:

- All worker nodes are busy. Nodemanager will probably do something about this in 10(?) seconds.
- Some worker nodes are bootstrapping or idle, but they'll be consumed by jobs ahead of yours. Nodemanager will probably do something about this in 10(?) seconds.
- After accounting for jobs ahead of yours in the queue, the nodes needed by your job are bootstrapping. Of the workers needed, the most recently started is X seconds old; bootstrapping is usually done in Y seconds according to Workbench config (or according to the API? A configured or computed ETA could be offered in the nodes#index API response).
- There are enough idle nodes to run your job now. Your job will probably start in 10(?) seconds.
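
To make the four states above concrete, here is a rough Python sketch of how they might be distinguished. Everything in it is an assumption made for illustration, not existing Arvados code: the `Node` class, the state labels, the `predict_start` function, and the default values for `poll_seconds` (the "10(?) seconds" above) and `boot_seconds` (the "Y seconds" above). It also assumes jobs ahead in the queue take idle nodes first and then the oldest bootstrapping nodes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    state: str                # hypothetical labels: "busy", "idle", "booting"
    age_seconds: float = 0.0  # seconds since the node started bootstrapping


def predict_start(nodes: List[Node], nodes_ahead: int, nodes_needed: int,
                  poll_seconds: float = 10.0,
                  boot_seconds: float = 300.0) -> Tuple[str, Optional[float]]:
    """Return (state label, ETA in seconds, or None if no ETA can be given)."""
    idle = sum(1 for n in nodes if n.state == "idle")
    # Oldest first: assume jobs ahead claim the nodes that will be ready soonest.
    booting = sorted((n.age_seconds for n in nodes if n.state == "booting"),
                     reverse=True)

    idle_left = max(0, idle - nodes_ahead)     # idle nodes left for this job
    still_ahead = max(0, nodes_ahead - idle)   # demand that spills onto booting nodes
    booting_left = booting[still_ahead:]

    if idle_left >= nodes_needed:
        return ("ready", poll_seconds)

    from_booting = nodes_needed - idle_left
    if len(booting_left) >= from_booting:
        # The most recently started node we depend on determines the ETA (X vs. Y).
        youngest_age = booting_left[from_booting - 1]
        return ("booting", max(0.0, boot_seconds - youngest_age))

    if idle == 0 and not booting:
        return ("all_busy", None)
    # Free nodes exist but are taken by jobs ahead (or are simply too few).
    return ("free_nodes_taken", None)
```

For the two states without an ETA, the UI could fall back to a message along the lines of "waiting for Nodemanager", since the only known quantity there is how soon Nodemanager is expected to react.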

Updated by Tom Clegg almost 10 years ago

Possible ways/places to present this information:

Pipeline instance #show → Components
- If component state is Queued, show ETA / summary of system state as it relates to this job ("worker nodes spinning up, ETA 2m")
Dashboard → Active pipelines
- (?) Show nearest ETA of any queued job (if any)
Dashboard → Compute and job status
- (?) Show table of nodes (name, state, cores, ram, scratch)
- Replace "submitted" column with ETA

If nodemanager isn't running:

If there are enough idle nodes to run all jobs in the queue up to & including this one, ETA is 10(?) seconds.
Hard to say much otherwise. Number of jobs ahead of yours?
(How does Workbench know nodemanager isn't running?)
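
The no-Nodemanager case can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, its parameters, and the 10-second `dispatch_seconds` default (taken from the "10(?) seconds" guess above) are hypothetical, and the queue is modeled simply as a list of node counts per job.

```python
from typing import List, Optional, Tuple


def eta_without_nodemanager(idle_nodes: int, queue_node_counts: List[int],
                            position: int,
                            dispatch_seconds: float = 10.0
                            ) -> Tuple[Optional[float], int]:
    """Return (ETA in seconds or None, number of jobs ahead).

    queue_node_counts[i] is the number of nodes job i in the queue needs;
    position is the zero-based index of the user's job in that queue.
    """
    # Nodes needed by every job up to and including this one.
    needed = sum(queue_node_counts[:position + 1])
    eta = dispatch_seconds if idle_nodes >= needed else None
    # When no ETA can be given, the caller can still report the jobs ahead.
    return eta, position
```

With 3 idle nodes and a queue needing `[1, 2, 4]` nodes, the second job gets an ETA of 10 seconds, while the third gets no ETA and only "2 jobs ahead".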
