Bug #5426
closed
[Workbench] Large downloads through workbench fail
Added by Peter Amstutz almost 10 years ago.
Updated over 7 years ago.
Estimated time:
(Total: 0.00 h)
Description
This download fails at right around 1 GiB. Note that it fails at two different byte positions across the three attempts, but at the same position on the last two:
--2015-03-10 09:03:52-- https://workbench.qr1hi.arvadosapi.com/collections/download/qr1hi-4zz18-b1uuzkf11kg3huv/3yfrrbhnsh4t1qyr8catlfa5q8uy2m7wscuvdrm4d485hqgy9u/lobstr_v3.0.2_hg19_ref/lobSTR_ref.fasta
Reusing existing connection to workbench.qr1hi.arvadosapi.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘lobstr_v3.0.2_hg19_ref/lobSTR_ref.fasta’
lobstr_v3.0.2_hg19_ [ <=> ] 1.01G 2.98MB/s in 6m 17s
Last-modified header missing -- time-stamps turned off.
2015-03-10 09:10:10 (2.74 MB/s) - Read error at byte 1083196276 (The request is invalid.).Retrying.
--2015-03-10 09:10:11-- (try: 2) https://workbench.qr1hi.arvadosapi.com/collections/download/qr1hi-4zz18-b1uuzkf11kg3huv/3yfrrbhnsh4t1qyr8catlfa5q8uy2m7wscuvdrm4d485hqgy9u/lobstr_v3.0.2_hg19_ref/lobSTR_ref.fasta
Connecting to workbench.qr1hi.arvadosapi.com (workbench.qr1hi.arvadosapi.com)|54.88.31.97|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘lobstr_v3.0.2_hg19_ref/lobSTR_ref.fasta’
lobstr_v3.0.2_hg19_ [ <=> ] 1.01G 3.05MB/s in 6m 34s
2015-03-10 09:16:47 (2.63 MB/s) - Read error at byte 1084238716 (The request is invalid.).Retrying.
--2015-03-10 09:16:49-- (try: 3) https://workbench.qr1hi.arvadosapi.com/collections/download/qr1hi-4zz18-b1uuzkf11kg3huv/3yfrrbhnsh4t1qyr8catlfa5q8uy2m7wscuvdrm4d485hqgy9u/lobstr_v3.0.2_hg19_ref/lobSTR_ref.fasta
Connecting to workbench.qr1hi.arvadosapi.com (workbench.qr1hi.arvadosapi.com)|54.88.31.97|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘lobstr_v3.0.2_hg19_ref/lobSTR_ref.fasta’
lobstr_v3.0.2_hg19_ [ <=> ] 1.01G 1.98MB/s in 7m 59s
2015-03-10 09:24:50 (2.16 MB/s) - Read error at byte 1084238716 (The request is invalid.).Retrying.
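To make "right around 1 GiB" concrete, here is a small arithmetic sketch using the byte offsets reported in the three wget "Read error" lines above. The helper name `overshoot_mib` is made up for illustration; only the offsets come from the log.

```python
# Byte offsets copied from the three wget "Read error at byte ..." lines.
GIB = 2 ** 30  # 1 GiB = 1,073,741,824 bytes
MIB = 2 ** 20  # 1 MiB = 1,048,576 bytes

failure_offsets = [1083196276, 1084238716, 1084238716]


def overshoot_mib(offset):
    """Return how many MiB past the 1 GiB mark the given offset lies."""
    return (offset - GIB) / MIB


for attempt, offset in enumerate(failure_offsets, start=1):
    print(f"try {attempt}: {offset} bytes = 1 GiB + {overshoot_mib(offset):.2f} MiB")
```

All three failures land roughly 9 to 10 MiB past the 1 GiB mark, and tries 2 and 3 stop at the identical byte.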
- Description updated (diff)
- Description updated (diff)
- Subject changed from [Keep] Large downloads through workbench fail to [Workbench] Large downloads through workbench fail
- Category set to Workbench
Thoughts
- Could this be a proxy issue? (Try bypassing nginx and downloading from Workbench directly, from inside the firewall?)
- Anything in Workbench logs?
- Anything in nginx logs?
- Confirmed there's no problem retrieving the entire file with other tools?
- Category deleted (Workbench)
The bug appears to be in our code. Workbench does a fork (IO.popen) to call arv-get and streams the files. Nginx says in the logs:
2015/03/10 13:21:07 [error] 5544#0: *395704 upstream prematurely closed connection while reading upstream, client: 74.118.24.162, server: workbench.qr1hi.arvadosapi.com, request: "GET /collections/download/qr1hi-4zz18-b1uuzkf11kg3huv/3yfrrbhnsh4t1qyr8catlfa5q8uy2m7wscuvdrm4d485hqgy9u/lobstr_v3.0.2_hg19_ref/lobSTR_ref.fasta HTTP/1.1", upstream: "http://127.0.0.1:9000/collections/download/qr1hi-4zz18-b1uuzkf11kg3huv/3yfrrbhnsh4t1qyr8catlfa5q8uy2m7wscuvdrm4d485hqgy9u/lobstr_v3.0.2_hg19_ref/lobSTR_ref.fasta", host: "workbench.qr1hi.arvadosapi.com", referrer: "https://workbench.qr1hi.arvadosapi.com/collections/download/qr1hi-4zz18-b1uuzkf11kg3huv/3yfrrbhnsh4t1qyr8catlfa5q8uy2m7wscuvdrm4d485hqgy9u/"
There is nothing in the nginx error log for the process running on port 9000.
So it looks like either the IO.popen pipe or arv-get itself dies without logging anything in the webserver logs. This happens reliably at sizes just over 1 GiB.
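For illustration only, here is a minimal sketch of the pattern described above (fork a child, stream its stdout in chunks) with the one thing that seems to be missing: checking the child's exit status and surfacing its stderr when it dies. This is a Python analogue, not the actual Workbench code (which is Ruby and uses IO.popen). The function name `stream_command` and the chunk size are assumptions, and the child command in the usage lines is a stand-in rather than a real arv-get invocation.

```python
import subprocess
import sys
import tempfile


def stream_command(cmd, chunk_size=64 * 1024):
    """Yield the child's stdout in chunks; raise if it exits non-zero.

    stderr goes to a temporary file so that a chatty child cannot block
    on a full pipe while we are busy reading stdout.
    """
    with tempfile.TemporaryFile() as err:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=err)
        try:
            while True:
                chunk = proc.stdout.read(chunk_size)
                if not chunk:  # EOF: the child closed its stdout
                    break
                yield chunk
        finally:
            proc.stdout.close()
            status = proc.wait()
        if status != 0:
            # A premature death is reported instead of silently ending the stream.
            err.seek(0)
            detail = err.read().decode(errors="replace").strip()
            raise RuntimeError(f"{cmd[0]} exited with status {status}: {detail}")


if __name__ == "__main__":
    # Stand-in child process; a real caller would run its download command here.
    child = [sys.executable, "-c", "print('hello from the child')"]
    for piece in stream_command(child):
        sys.stdout.buffer.write(piece)
```

With a check like this, a child that dies partway through would leave a message in the application log rather than only an "upstream prematurely closed connection" entry in nginx.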
- Category set to Workbench
- Target version changed from Bug Triage to 2015-04-01 sprint
- Assigned To set to Peter Amstutz
- Status changed from New to In Progress
- Target version changed from 2015-04-01 sprint to 2015-04-29 sprint
- Status changed from In Progress to Resolved
- Status changed from Resolved to In Progress
- Target version changed from 2015-04-29 sprint to Bug Triage
I'm re-opening this bug.
The collection mentioned above downloads fine now that we have proxy_buffering disabled. That's roughly 1.8 GiB.
However, this collection (qr1hi-4zz18-w0t3gbd4u8n5o9h) has a 16.4 GiB FASTA file in it, and the download terminates after roughly 1 GiB when done through the browser. With arv keep get on a shell node, we get all 16.4 GiB without issues.
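Since the responses in the wget output report "Length: unspecified", the client has no advertised length to compare against. One way to catch a truncated download is to count the bytes received and compare them with a size known from elsewhere. The sketch below assumes the expected size is supplied by the caller; `copy_and_check` is a hypothetical helper, not part of any Arvados tool.

```python
import io


def copy_and_check(src, dst, expected_size, chunk_size=1 << 20):
    """Copy src to dst in chunks and raise if the byte count is wrong."""
    received = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # end of stream, whether complete or cut short
            break
        dst.write(chunk)
        received += len(chunk)
    if received != expected_size:
        raise IOError(f"truncated: got {received} of {expected_size} bytes")
    return received


if __name__ == "__main__":
    # In-memory stand-ins for the HTTP response body and the output file.
    body = io.BytesIO(b"ACGT" * 256)
    out = io.BytesIO()
    print(copy_and_check(body, out, expected_size=1024))
```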
- Assigned To deleted (Peter Amstutz)
- Target version changed from Bug Triage to Deferred
I think we're very likely to deal with this via #5824.
- Status changed from In Progress to Resolved
Now using keep-web and planning to remove workbench download entirely.