Bug #4494
closed
[Crunch] Do more error-checking and show more diagnostic info when installing a Docker image
Added by Tim Pierce about 10 years ago.
Updated about 2 years ago.
Description
Job qr1hi-8i9sb-vx84guvzp3xvwgz failed with this diagnostic output:
11/10/2014 5:05:18 PM crunch Version 83a9390a05bbffc2e4ea95dd693af3ab3547fa12 is commit 83a9390a05bbffc2e4ea95dd693af3ab3547fa12
11/10/2014 5:05:18 PM crunch Run install script on all workers
11/10/2014 5:05:18 PM crunch Install script exited 1
11/10/2014 5:05:18 PM crunch Installing Docker image from 0b1b526683d86c41696eea9353ab5807+4242 exited 1 at /usr/local/arvados/src/sdk/cli/bin/crunch-job line 603
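The log above reports only "exited 1", with no hint of what the install script printed. As a rough illustration of the extra diagnostics this ticket asks for, here is a minimal Python sketch that runs a command and, on failure, reports the exit code together with the tail of stderr. This is not the crunch-job code (which is a separate script at the path shown above); `run_with_diagnostics`, `label`, and `tail_lines` are hypothetical names chosen for the example.

```python
import subprocess
import sys


def run_with_diagnostics(cmd, label, tail_lines=20):
    """Run cmd and return (ok, message).

    On failure the message includes the exit code and the last
    `tail_lines` lines of stderr, so the log shows why it failed.
    """
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    if proc.returncode == 0:
        return True, f"{label} succeeded"
    tail = proc.stderr.strip().splitlines()[-tail_lines:]
    detail = "\n".join(f"  stderr: {line}" for line in tail)
    if not detail:
        detail = "  (no stderr output)"
    return False, f"{label} exited {proc.returncode}\n{detail}"


if __name__ == "__main__":
    # Hypothetical stand-in for the real install step: a child process
    # that writes an error message and exits non-zero.
    fake_install = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('image not found'); sys.exit(1)",
    ]
    ok, message = run_with_diagnostics(fake_install, "Installing Docker image")
    print(message)
```

Capturing stderr instead of discarding it is the main design point: the exit code alone cannot tell a missing image apart from a node-level failure.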
- Description updated (diff)
- Category set to Crunch
More examples: qr1hi-8i9sb-kryvvban6b7hj74 qr1hi-8i9sb-wdv358fgjhh8fsa
This is starting to be a small annoyance in Arvados :( It's not a super big deal because I can just re-run the job, but I feel that new users would get discouraged by this bug.
- Subject changed from [Crunch] job fails to install Docker image to [Crunch] Do more error-checking and show more diagnostic info when installing a Docker image
- Story points set to 1.0
- Target version changed from Bug Triage to Arvados Future Sprints
This is starting to be bothersome. I've been trying to rerun this same job and I keep getting the same error:
2ecb1a2a2b3574fb5c7fc0b1262c6c8c+83/qr1hi-8i9sb-lhf4q05ykxx6yzi.log.txt
2c7ad84b2506214b58f19e98f48d0731+83/qr1hi-8i9sb-fvcnteaq0z7xngb.log.txt
7163ba9c982dedb701a8b8c269bcd7ff+83/qr1hi-8i9sb-uqh0plj8644g9mp.log.txt
- Target version changed from Arvados Future Sprints to 2014-11-19 sprint
These jobs are failing for a variety of reasons. Some look like direct compute node failures; others may not be:
2014-11-17_18:25:22.70437 qr1hi-8i9sb-5cb9i2keadpevcq ! srun: error: Unable to resolve "compute28": Unknown host
2014-11-17_18:25:22.77063 qr1hi-8i9sb-eipb3kuyac7absp ! srun: error: Unable to create job step: Memory required by task is not available
2014-11-17_16:15:55.51721 qr1hi-8i9sb-wl1oh0bbw072acd ! srun: error: Task launch for 8473.0 failed on node compute9: User not found on host
2014-11-17_16:25:56.09087 qr1hi-8i9sb-vfrtoez7tgmybv0 ! srun: error: Task launch for 8476.0 failed on node compute9: User not found on host
2014-11-12_21:00:42.45803 qr1hi-8i9sb-kryvvban6b7hj74 ! srun: error: Task launch for 8344.0 failed on node compute3: User not found on host
2014-11-12_21:00:42.84486 qr1hi-8i9sb-wdv358fgjhh8fsa ! srun: error: Task launch for 8348.0 failed on node compute55: User not found on host
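To make the mix of causes easier to see, the srun errors above can be grouped by message. The following is a minimal Python sketch under stated assumptions: the line layout (timestamp, job UUID, `! srun: error:`, message) is inferred from the six lines quoted in this comment, and the category names (`dns`, `memory`, `user_missing`, `other`) are hypothetical labels, not anything Arvados or SLURM defines.

```python
import re
from collections import defaultdict

# Layout inferred from the log lines quoted in this ticket.
SRUN_ERROR = re.compile(
    r"^(?P<ts>\S+) (?P<job>\S+) ! srun: error: (?P<msg>.*)$"
)

# Hypothetical category labels, keyed on the message text seen above.
CATEGORIES = [
    ("dns", re.compile(r"Unable to resolve")),
    ("memory", re.compile(r"Memory required by task is not available")),
    ("user_missing", re.compile(r"User not found on host")),
]

SAMPLE = """\
2014-11-17_18:25:22.70437 qr1hi-8i9sb-5cb9i2keadpevcq ! srun: error: Unable to resolve "compute28": Unknown host
2014-11-17_18:25:22.77063 qr1hi-8i9sb-eipb3kuyac7absp ! srun: error: Unable to create job step: Memory required by task is not available
2014-11-17_16:15:55.51721 qr1hi-8i9sb-wl1oh0bbw072acd ! srun: error: Task launch for 8473.0 failed on node compute9: User not found on host
2014-11-17_16:25:56.09087 qr1hi-8i9sb-vfrtoez7tgmybv0 ! srun: error: Task launch for 8476.0 failed on node compute9: User not found on host
2014-11-12_21:00:42.45803 qr1hi-8i9sb-kryvvban6b7hj74 ! srun: error: Task launch for 8344.0 failed on node compute3: User not found on host
2014-11-12_21:00:42.84486 qr1hi-8i9sb-wdv358fgjhh8fsa ! srun: error: Task launch for 8348.0 failed on node compute55: User not found on host
"""


def classify(lines):
    """Group job UUIDs by the first category whose pattern matches."""
    groups = defaultdict(list)
    for line in lines:
        match = SRUN_ERROR.match(line.strip())
        if not match:
            continue  # not an srun error line
        msg = match.group("msg")
        category = next(
            (name for name, pattern in CATEGORIES if pattern.search(msg)),
            "other",
        )
        groups[category].append(match.group("job"))
    return dict(groups)


if __name__ == "__main__":
    for category, jobs in classify(SAMPLE.splitlines()).items():
        print(category, len(jobs), " ".join(jobs))
```

On the six lines quoted here, four of the failures fall under "User not found on host", which points at node setup rather than at the jobs themselves.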
- Target version changed from 2014-11-19 sprint to Arvados Future Sprints
- Target version deleted (Arvados Future Sprints)
- Status changed from New to Closed