Bug #3384
[Crunch] Termination of jobs due to 'Connection timed out'?
Status: Closed
Start date: 07/28/2014
Due date:
% Done: 0%
Estimated time:
Story points: -
Description
Pipeline instance qr1hi-8i9sb-n1yv047kymyjtxs failed, although it had previously run successfully. The output log collection 2482f18b2f601d248bb4fe93e296b862+87 contains these lines:
2014-07-28_14:02:15 qr1hi-8i9sb-n1yv047kymyjtxs 10767 50 stderr socket.error: [Errno 110] Connection timed out
2014-07-28_14:02:15 qr1hi-8i9sb-n1yv047kymyjtxs 10767 50 stderr srun: error: compute0: task 0: Exited with exit code 1
followed by subsequent job cancellations:
2014-07-28_14:02:16 qr1hi-8i9sb-n1yv047kymyjtxs 10767 54 stderr srun: sending Ctrl-C to job 3133.57
2014-07-28_14:02:16 qr1hi-8i9sb-n1yv047kymyjtxs 10767 54 stderr crunchstat: caught signal:interrupt
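In a long log, the first non-zero exit and the line that preceded it from the same task usually point to the root cause (here, the socket timeout in task 50 rather than the later Ctrl-C cancellations). The Python 3 sketch below scans for that pair. It assumes each line has the layout timestamp, job UUID, crunch-job PID, task sequence number, stream, message; that layout is inferred from the excerpt above, not from Crunch documentation, and the function names are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple

# Assumed field layout, inferred from the log excerpt in this issue:
# <timestamp> <job uuid> <crunch-job pid> <task seq> <stream> <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<job>\S+) (?P<pid>\d+) (?P<task>\d+) (?P<stream>\S+) (?P<msg>.*)$"
)
EXIT_CODE = re.compile(r"Exited with exit code (\d+)")


class LogEntry(NamedTuple):
    ts: str
    job: str
    pid: int
    task: int
    stream: str
    msg: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into fields; return None if it does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["ts"], m["job"], int(m["pid"]), int(m["task"]), m["stream"], m["msg"])


def first_failure(lines: Iterable[str]) -> Optional[Tuple[LogEntry, Optional[LogEntry]]]:
    """Return the first non-zero exit report and the entry logged just before it by the same task."""
    previous = {}  # task seq -> most recent entry seen for that task
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        code = EXIT_CODE.search(entry.msg)
        if code is not None and int(code.group(1)) != 0:
            return entry, previous.get(entry.task)
        previous[entry.task] = entry
    return None
```

Applied to the lines quoted in this issue, this would report task 50 with the `socket.error: [Errno 110] Connection timed out` line as the preceding message.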
Updated by Tom Clegg over 10 years ago
Possible solution (or at least helpful improvement):
[Crunch] API communication fail should result in recording temporary task failure, not permanent.
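One way to act on this suggestion is to classify the exception that killed a task before deciding how to record the failure. The Python 3 sketch below treats socket timeouts and a small, hypothetical set of network errno values (including `ETIMEDOUT`, which is 110 on Linux, as in the log above) as temporary, and everything else as permanent. The function names and the "temporary"/"permanent" labels are illustrative assumptions, not Crunch's actual API.

```python
import errno
import socket

# Hypothetical policy: errno values treated as transient network trouble.
# errno.ETIMEDOUT is 110 on Linux, matching "[Errno 110] Connection timed out".
TRANSIENT_ERRNOS = frozenset({
    errno.ETIMEDOUT,
    errno.ECONNREFUSED,
    errno.ECONNRESET,
    errno.EHOSTUNREACH,
    errno.ENETUNREACH,
})


def is_temporary_failure(exc: BaseException) -> bool:
    """True if exc looks like a transient API communication failure."""
    if isinstance(exc, socket.timeout):
        return True
    # In Python 3, socket.error is an alias of OSError.
    return isinstance(exc, OSError) and exc.errno in TRANSIENT_ERRNOS


def classify_task_failure(exc: BaseException) -> str:
    """Map an exception to a hypothetical 'temporary' or 'permanent' label."""
    return "temporary" if is_temporary_failure(exc) else "permanent"
```

An explicit allow-list keeps unknown errors permanent, so genuine bugs still fail fast; a scheduler acting on the "temporary" label would also need a retry limit so a persistently unreachable API server cannot cause endless re-runs.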
Updated by Tim Pierce over 10 years ago
- Target version changed from Bug Triage to 2014-10-08 sprint
Updated by Tim Pierce over 10 years ago
- Subject changed from Termination of jobs due to 'Connection timed out'? to [Crunch] Termination of jobs due to 'Connection timed out'?
- Category set to Crunch
- Project changed from 35 to Arvados
- Assigned To set to Tim Pierce
Updated by Tim Pierce over 10 years ago
- Status changed from New to Closed
Could not reproduce; re-running this pipeline at https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-ftse9e4sz35fot7 yielded success.