Bug #5515
Job failure due to 'arv-put' 'ConnectionError'?
Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date: 03/19/2015
Due date:
% Done: 0%
Estimated time:
Story points: -
Description
Pipeline instance tb05z-d1hrv-ojcxrzlohzyir4r fails with what looks like a 'ConnectionError' from 'arv-put':
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr Traceback (most recent call last):
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/bin/arv-put", line 4, in <module>
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr main()
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/commands/put.py", line 470, in main
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr path, max_manifest_depth=args.max_manifest_depth)
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/commands/put.py", line 329, in write_directory_tree
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr path, stream_name, max_manifest_depth)
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/collection.py", line 216, in write_directory_tree
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr self.do_queued_work()
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/collection.py", line 144, in do_queued_work
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr self._work_file()
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/collection.py", line 157, in _work_file
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr self.write(buf)
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/collection.py", line 471, in write
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr return super(ResumableCollectionWriter, self).write(data)
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/collection.py", line 227, in write
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr self.flush_data()
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/commands/put.py", line 305, in flush_data
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr super(ArvPutCollectionWriter, self).flush_data()
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/collection.py", line 264, in flush_data
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr copies=self.replication))
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/retry.py", line 157, in num_retries_setter
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr return orig_func(self, *args, **kwargs)
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job-work/.arvados.venv/local/lib/python2.7/site-packages/arvados/keep.py", line 808, in put
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr data_hash, copies, thread_limiter.done()), service_errors, label="service")
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr arvados.errors.KeepWriteError: failed to write d9fcbec13e21983498de7e8a489d89c1 (wanted 2 copies but wrote 1): service http://[keep1.tb05z.arvadosapi.com]:25107/ raised ConnectionError (('Connection aborted.', timeout('timed out',)))
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr Traceback (most recent call last):
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job/src/crunch_scripts/arv-dax", line 154, in <module>
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr outcollection = upload( outdir )
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/tmp/crunch-job/src/crunch_scripts/arv-dax", line 27, in upload
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr pdh = sp.check_output( ["arv-put", "--no-progress", "--portable-data-hash", source_dir ] )
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr File "/usr/lib/python2.7/subprocess.py", line 544, in check_output
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr raise CalledProcessError(retcode, cmd, output=output)
2015-03-19_14:57:15 tb05z-8i9sb-vz3vjgv5l05c7w9 31965 25 stderr subprocess.CalledProcessError: Command '['arv-put', '--no-progress', '--portable-data-hash', '/tmp/crunch-job-task-work/compute0.13/output']' returned non-zero exit status 1
Updated by Tom Clegg almost 10 years ago
This would have been interpreted as a temporary failure by crunch-job if:
- arv-put caught KeepWriteError and exited 111 (see #5468), and
- arv-dax caught CalledProcessError and exited with the same exit code as the called process, and
- the shell script calling arv-dax exited $? after an error
- crunch-job looked for some magic strings like "arvados.errors.Keep..." and treated them like the existing magic "srun: ..." error strings, counting a subsequent failure as transient. (#5524)
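For illustration, the exit-code propagation and the string matching described above could look roughly like the sketch below. This is not the actual arv-dax or crunch-job code (crunch-job is a separate program). The names run_propagating_exit_code, looks_transient, main, EX_TEMPFAIL and TRANSIENT_MARKERS are hypothetical, and the marker list only contains the two example strings mentioned in this comment.

```python
import subprocess

# Exit code proposed in this ticket for temporary failures (see #5468).
EX_TEMPFAIL = 111

# Hypothetical marker list: only the two example prefixes named in the
# comment above. A real list would be chosen by crunch-job.
TRANSIENT_MARKERS = ("arvados.errors.Keep", "srun: ")


def run_propagating_exit_code(cmd):
    """Run cmd and return (exit_code, stdout) instead of raising.

    A wrapper like arv-dax could pass exit_code to sys.exit() so that
    a child's 111 reaches the caller unchanged.
    """
    try:
        return 0, subprocess.check_output(cmd)
    except subprocess.CalledProcessError as e:
        # e.returncode is the child's exit status; e.output is its stdout.
        return e.returncode, e.output


def looks_transient(stderr_text):
    """Return True if any stderr line contains a known transient marker."""
    return any(
        marker in line
        for line in stderr_text.splitlines()
        for marker in TRANSIENT_MARKERS
    )


def main(argv):
    """Hypothetical entry point: run argv and return its exit code."""
    code, _ = run_propagating_exit_code(argv)
    return code
```

A wrapper script would end with sys.exit(main(sys.argv[1:])). The calling shell script would still need to exit $? after the wrapper fails, as noted in the list above.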
Updated by Peter Amstutz almost 10 years ago
- Target version changed from Bug Triage to Arvados Future Sprints
Updated by Tom Morris almost 6 years ago
- Status changed from Feedback to Closed
- Target version deleted (Arvados Future Sprints)
Closing as obsolete.