Bug #5499
closed[SDKs] arv-copy fails to create collection after copying many blocks
Description
arv-copy --src qr1hi --dst su92l --dst-git-repo bcosc --project-uuid su92l-j7d0g-s10d0ru68kdzgve qr1hi-p5p6p-hu2ep9hu6sxzkq9
2015-03-18 15:40:31 arvados.keep8467 DEBUG: Request: PUT https://[keep.su92l.arvadosapi.com]:443/5d2eebcadb35bfeda0019137f6c249f4
2015-03-18 15:40:42 arvados.keep8467 DEBUG: KeepWriterThread <KeepWriterThread(Thread-355, started 140497865701120)> succeeded 5d2eebcadb35bfeda0019137f6c249f4+67108864 https://[keep.su92l.arvadosapi.com]:443/
59b5517e7b00f858e8952512a25a9fd4+20681: 22565M / 22567M 100.0% 2015-03-18 15:40:42 arvados.keep8467 DEBUG: 937921d6f8d6fb3a9a165c74800e42fc: ['http://[keep0.qr1hi.arvadosapi.com]:25107/', 'http://[keep3.qr1hi.arvadosapi.com]:25107/', 'http://[keep1.qr1hi.arvadosapi.com]:25107/', 'http://[keep2.qr1hi.arvadosapi.com]:25107/']
2015-03-18 15:40:42 arvados.keep8467 DEBUG: Request: GET http://[keep0.qr1hi.arvadosapi.com]:25107/937921d6f8d6fb3a9a165c74800e42fc+2208052+A5584a64d231318ca47b9ec2d9192800fdd970eb3@551c026d
2015-03-18 15:40:42 arvados.keep8467 INFO: 200 response: 2208052 bytes in 99.4808673859 msec (21.168 MiB/sec)
2015-03-18 15:40:42 arvados.keep8467 DEBUG: 937921d6f8d6fb3a9a165c74800e42fc: ['https://[keep.su92l.arvadosapi.com]:443/']
2015-03-18 15:40:42 arvados.keep8467 DEBUG: KeepWriterThread <KeepWriterThread(Thread-356, started 140497865701120)> proceeding 937921d6f8d6fb3a9a165c74800e42fc+2208052 https://[keep.su92l.arvadosapi.com]:443/
2015-03-18 15:40:42 arvados.keep8467 DEBUG: Request: PUT https://[keep.su92l.arvadosapi.com]:443/937921d6f8d6fb3a9a165c74800e42fc
2015-03-18 15:40:43 arvados.keep8467 DEBUG: KeepWriterThread <KeepWriterThread(Thread-356, started 140497865701120)> succeeded 937921d6f8d6fb3a9a165c74800e42fc+2208052 https://[keep.su92l.arvadosapi.com]:443/
59b5517e7b00f858e8952512a25a9fd4+20681: 22567M / 22567M 100.0%
2015-03-18 15:40:43 arvados.keep8467 DEBUG: 9639e538f0279dcb74bee1a79fc60706: ['https://[keep.su92l.arvadosapi.com]:443/']
2015-03-18 15:40:43 arvados.keep8467 DEBUG: KeepWriterThread <KeepWriterThread(Thread-357, started 140497865701120)> proceeding 9639e538f0279dcb74bee1a79fc60706+41336 https://[keep.su92l.arvadosapi.com]:443/
2015-03-18 15:40:43 arvados.keep8467 DEBUG: Request: PUT https://[keep.su92l.arvadosapi.com]:443/9639e538f0279dcb74bee1a79fc60706
2015-03-18 15:40:43 arvados.keep8467 DEBUG: KeepWriterThread <KeepWriterThread(Thread-357, started 140497865701120)> succeeded 9639e538f0279dcb74bee1a79fc60706+41336 https://[keep.su92l.arvadosapi.com]:443/
^C^CTraceback (most recent call last):
File "/usr/local/bin/arv-copy", line 4, in <module>
main()
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 123, in main
src_arv, dst_arv, args)
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 264, in copy_pipeline_template
pt = copy_collections(pt, src, dst, args)
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 315, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 315, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 315, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 315, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 315, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 315, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 315, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 315, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 315, in copy_collections
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 315, in <dictcomp>
return {v: copy_collections(obj[v], src, dst, args) for v in obj}
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 311, in copy_collections
obj = arvados.util.portable_data_hash_pattern.sub(copy_collection_fn, obj)
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 301, in copy_collection_fn
dst_col = copy_collection(src_id, src, dst, args)
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 567, in copy_collection
return create_collection_from(c, src, dst, args)
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 400, in create_collection_from
dst_collection = dst.collections().create(body=c, ensure_unique_name=True).execute(num_retries=args.retries)
File "/usr/local/lib/python2.7/dist-packages/oauth2client/util.py", line 137, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/http.py", line 722, in execute
body=self.body, headers=self.headers)
File "/usr/local/lib/python2.7/dist-packages/arvados/api.py", line 33, in intercept_http_request
return self.orig_http_request(uri, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/httplib2/_init__.py", line 1593, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1335, in request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python2.7/dist-packages/httplib2/_init__.py", line 1258, in _conn_request
conn.request(method, request_uri, body, headers)
File "/usr/lib/python2.7/httplib.py", line 962, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 996, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 958, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 818, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 794, in send
self.sock.sendall(data)
File "/usr/lib/python2.7/ssl.py", line 229, in sendall
v = self.send(data[count:])
File "/usr/lib/python2.7/ssl.py", line 198, in send
v = self._sslobj.write(data)
Updated by Brett Smith almost 11 years ago
- Subject changed from arv-copy issue to [SDKs] arv-copy issue
- Category set to SDKs
- Target version set to Bug Triage
Bryan,
Can you describe the issue in more detail? From the transcript you pasted, it looks like you hit Ctrl-C and the process died. I'm guessing it appeared to be hung? It was apparently trying to send network traffic when you stopped it, so there are a lot of potential explanations there.
Updated by Bryan Cosca almost 11 years ago
I believe I tried this command twice. The first time I waited after the line before I Ctrl+c from about ~2 hours while nothing happened. Then I tried again and waited 45 minutes to see if anything changed. I believe I showed this to Peter when he was on science support and I think he fixed it.
Updated by Brett Smith almost 11 years ago
The create request apparently did not make it to the API server at all. In the logs, the nearest collection-related requests are at 14:19 and 17:17 UTC—or, assuming the timestamps in this ticket are in US Eastern, 17:41 and 20:24 UTC. Probably want to start the investigation client-side.
Updated by Brett Smith almost 11 years ago
Digging a little more, I found where the arv-copy process tried to find the collection on the destination cluster. That happened about an hour before it was ready to write (same timestamps on the client and server).
During that hour, the client was copying individual blocks from the source Keep server to the destination. Current theory: something wedged the API client while it was sitting idle during that time. I'm running some tests to try to reproduce this in isolation.
Updated by Brett Smith almost 11 years ago
- Status changed from New to In Progress
- Assigned To set to Brett Smith
- Target version changed from Bug Triage to 2015-04-29 sprint
Updated by Brett Smith almost 11 years ago
- Subject changed from [SDKs] arv-copy issue to [SDKs] arv-copy fails to create collection after copying many blocks
- Status changed from In Progress to New
- Assigned To deleted (
Brett Smith) - Target version changed from 2015-04-29 sprint to Bug Triage
I was unable to reproduce the problem with this test script:
import logging
import time
import arvados
def stall_and_query(arv, minutes_wait):
logging.info("waiting %d minutes", minutes_wait)
time.sleep(60 * minutes_wait)
logging.info("sending query")
loglist = arv.logs().list(
limit=1, order='created_at desc').execute(num_retries=3)
logging.info("latest log is %s", loglist['items'][0]['id'])
logging.basicConfig(level=logging.INFO,
format='%(asctime)s %(levelname)s: %(message)s',
datefmt='%Y-%m-%d %H:%M:%S')
arv = arvados.api('v1')
stall_and_query(arv, 0)
for minutes_wait in xrange(30, 121, 10):
stall_and_query(arv, minutes_wait)
On both my workstation and shell.su92l, this kept making requests throughout the time period. The issue is not as simple as "The API client is useless after idling for a long time." (Which is good, otherwise we'd have a problem with Crunch scripts.)
Updated by Brett Smith almost 11 years ago
Is it possible that this was caused by the GCE 10-minute firewall rule, and so duplicates that ticket?
Updated by Brett Smith over 10 years ago
- Status changed from New to Duplicate
- Target version deleted (
Bug Triage)
Ward agrees with the GCP firewall theory, so we're going to figure that was the issue and close this.