
Bug #5845

closed

Pipeline has failed but no jobs are marked as failed

Added by Abram Connelly over 9 years ago. Updated over 9 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date: 04/28/2015
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-5hkbkuwvsve9lsk#Components

The pipeline has failed, but one job reports "complete" while the other reports "Not ready". No job reports "failed", even though the pipeline as a whole has failed.


Files

fail_not_fail.png (110 KB) Abram Connelly, 04/28/2015 10:28 PM

Related issues 1 (1 open, 0 closed)

Is duplicate of Arvados - Bug #5906: [API] crunch-dispatch should mark a job failed when its repository cannot be fetched | New | 05/05/2015

#1

Updated by Bryan Cosca over 9 years ago

From the pipeline instance logs:

Error creating job for component RefreshReport: Repository not found: '$USER'
Job submission was: {"job":{"script":"run-command","script_parameters":{"command":["$(job.srcdir)/crunch_scripts/get-evidence-refresh-shim","$(file $(GET_EVIDENCE_JSON))","$(file $(GETEV_LATEST))"],"OUT_DATA_DIR":"01a1bf596b269e487d220053f8f29724+249","GET_EVIDENCE_JSON":"$(OUT_DATA_DIR)/out-data/get-evidence.json","GETEV_LATEST":"2511736ccd170e3be28b7d10077ea8e5+74/getev-latest.json.gz"},"script_version":"get-evidence-refresh","repository":"$USER","runtime_constraints":{"docker_image":"arvados/jobs","arvados_sdk_version":"38e27663cf656f0c9c443a2715f249afe39a8bfb","min_nodes":1},"owner_uuid":"su92l-tpzed-6cw59akrlzqb2sl","submit_id":"instance su92l-d1hrv-5hkbkuwvsve9lsk rand d7xy2p313grr","state":"Queued"},"find_or_create":true}

#2

Updated by Abram Connelly over 9 years ago

The bug is not that the job failed, but that Workbench is not displaying the failure.

The job failed because I misspecified the repository. It looked for a repository named by the literal string '$USER', found none, and so never started.
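A check like the one below could have caught this mistake before submission. It is a minimal sketch that assumes the submission is available as JSON in the shape shown in the log above. The function `unexpanded_repository` and its regular expression are hypothetical and are not part of Arvados. The check looks only at the `repository` field, so Arvados `$(...)` parameter substitutions such as `$(job.srcdir)` elsewhere in the submission are not flagged.

```python
import json
import re
from typing import Optional

# Hypothetical helper, not an Arvados API: matches shell-style variables
# such as $USER or ${USER}, but not run-command substitutions like $(...).
UNEXPANDED_VAR = re.compile(r"\$(?:[A-Za-z_][A-Za-z0-9_]*|\{[A-Za-z_][A-Za-z0-9_]*\})")


def unexpanded_repository(submission: dict) -> Optional[str]:
    """Return the repository name if it still contains an unexpanded variable, else None."""
    repo = submission.get("job", {}).get("repository", "")
    return repo if UNEXPANDED_VAR.search(repo) else None


if __name__ == "__main__":
    # Trimmed from the job submission in the log above (other fields omitted).
    submitted = json.loads(
        '{"job": {"script": "run-command", "repository": "$USER"}, "find_or_create": true}'
    )
    print(unexpanded_repository(submitted))  # prints $USER
```

A client could refuse to submit, or warn, whenever this returns a value. That surfaces the typo before the API server reports "Repository not found".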

Once the failure was noticed, Arvados (correctly) cleaned up the job, stopped the pipeline, and did not continue. When viewing the pipeline, though, the whole pipeline is marked as 'failed', the first job of the two-job pipeline is (correctly) marked as successful, and the second job is marked as 'not ready'.

Marking the second job as 'not ready' is, in my opinion, a bug. From my perspective, the second job failed and should be marked as such.
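The display rule requested here can be sketched as follows. This is only a minimal illustration of the reporter's expected behaviour, not Workbench code. The `Component` view model, its fields, the name `component-1`, and the state strings are hypothetical assumptions. Note that the ticket was closed in favour of #5906, which proposes fixing this on the crunch-dispatch side instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """Hypothetical view model for one pipeline component (field names are assumptions)."""
    name: str
    job_state: Optional[str] = None         # e.g. "Complete"; None if no job exists
    submission_error: Optional[str] = None  # error logged when job creation failed


def display_state(c: Component) -> str:
    # A component whose job could not even be created caused the failure,
    # so it should be shown as failed rather than "Not ready".
    if c.submission_error is not None:
        return "Failed"
    # Otherwise, show the state of the job that actually ran.
    if c.job_state is not None:
        return c.job_state
    # No job and no error: the component never became runnable.
    return "Not ready"


if __name__ == "__main__":
    # Mirrors this report: the first component completed, and the second
    # (RefreshReport) failed at submission time.
    pipeline = [
        Component("component-1", job_state="Complete"),
        Component("RefreshReport", submission_error="Repository not found: '$USER'"),
    ]
    for c in pipeline:
        print(c.name, display_state(c))
```

Under this rule, components that are downstream of a failure and never had a job still show "Not ready". Only the component whose submission error stopped the pipeline is shown as "Failed".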

#3

Updated by Brett Smith over 9 years ago

  • Status changed from New to Closed
  • Target version deleted (Bug Triage)

Closing as a duplicate of #5906 (since I already wrote the specification there).
