Feature #4579
[Documentation] Run-command docs should remind user how & why to exit non-zero on failure.
Status: Open
Description
Some jobs encounter errors that should be fatal, yet the jobs still report success.
For example, qr1hi-8i9sb-mtxaffgfw6athnp:
grep: //c09a19ea17f72c8da97f8cb64a9b333b+743: No such file or directory
or qr1hi-8i9sb-gn0jmhwp88j3a8z:
ls: cannot access /keep//keep/c09a19ea17f72c8da97f8cb64a9b333b+743/*.vcf: No such file or directory
These jobs should report failure.
Updated by Tim Pierce about 10 years ago
- Subject changed from Crunch is able to detect unique errors within scripts? to [Crunch] failed jobs are incorrectly reported as succeeding
- Description updated
- Category set to Crunch
Updated by Tom Clegg about 10 years ago
- Tracker changed from Feature to Bug
If you use run-command, the only way to indicate success or failure is the exit status. In both of these cases it looks like the script exits 0, so run-command sets success=true on the task, and Crunch sets state=Complete. Crunch's part of this looks correct.
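To make the exit-status contract concrete, here is a minimal sketch. It assumes bash is installed, and `task_success` is a hypothetical stand-in for any wrapper that judges a task only by its exit status; it is not run-command's actual code. A script whose grep fails on a missing file, but whose last command succeeds, still exits 0 and is therefore counted as a success.

```python
import subprocess


def task_success(script: str) -> bool:
    """Hypothetical wrapper: the task succeeds exactly when the script exits 0."""
    result = subprocess.run(["bash", "-c", script], capture_output=True, text=True)
    return result.returncode == 0


# grep fails because the (hypothetical) input file is missing, but the last
# command is echo, so bash exits 0 and the task is reported as successful.
script = 'grep chr1 /nonexistent/input.vcf; echo "finished"'
print(task_success(script))

# With "set -e", bash stops at the failing grep and exits non-zero instead.
print(task_success("set -e; " + script))
```

The grep error message still goes to stderr (captured here), which is why such errors show up in job logs even though the job is marked complete.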
The script itself, however, incorrectly exits 0 after encountering errors. Fixing this could be as simple (or not simple) as using "set -e" and "set -o pipefail" in all the right places.
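A minimal sketch of why both options matter, assuming bash is available; the input path is hypothetical and stands in for the bad Keep paths in the logs above. "set -e" alone does not catch a failure in the middle of a pipeline, because a pipeline's status is normally that of its last command; "set -o pipefail" makes the pipeline report the failing grep, and "set -e" then stops the script.

```python
import subprocess


def exit_status(script: str) -> int:
    """Run `script` with bash and return the exit status bash reports."""
    return subprocess.run(["bash", "-c", script], capture_output=True).returncode


# Hypothetical missing input, piped into another command, then more work.
cmd = "grep chr1 /nonexistent/input.vcf | wc -l; echo done"

plain = exit_status(cmd)                                 # 0: the grep error is ignored
errexit_only = exit_status("set -e; " + cmd)             # 0: pipeline status comes from wc, not grep
strict = exit_status("set -e; set -o pipefail; " + cmd)  # non-zero: grep's failure stops the script

print(plain, errexit_only, strict)
```

Only the last variant gives run-command a non-zero exit status to act on.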
Aside 1: The run-command documentation could certainly be more forthcoming with advice about how to write scripts for it to use. (Currently exit codes are only mentioned in the context of the "ignore exit code" feature, which incidentally should probably be adjusted to explain what a terrible, terrible idea it is to use that feature.)
Aside 2: When you're at the point of giving run-command a shell script which in turn builds and runs another shell script, you're doing it wrong. At some point our docs failed you by steering you toward using run-command for this instead of writing a Python program that calls one_task_per_input_file
...
Updated by Tom Clegg about 10 years ago
- Subject changed from [Crunch] failed jobs are incorrectly reported as succeeding to [Documentation] Run-command docs should remind user how & why to exit non-zero on failure.
- Category changed from Crunch to Documentation
Updated by Tom Clegg about 10 years ago
- Tracker changed from Bug to Feature
- Status changed from Feedback to New
- Story points set to 0.5
Updated by Tom Clegg about 10 years ago
- Target version changed from Bug Triage to Arvados Future Sprints
Updated by Ward Vandewege over 3 years ago
- Target version deleted (Arvados Future Sprints)