Feature #8707
Arvados job: download data from remote site into Keep
Status:
In Progress
Priority:
Normal
Assigned To:
Category:
Third party integration
Target version:
Start date:
03/15/2016
Due date:
% Done:
100%
Estimated time:
(Total: 0.00 h)
Story points:
1.0
Description
Updated by Tom Clegg almost 9 years ago
- Category set to Third party integration
- Assigned To set to Tom Clegg
Updated by Tom Clegg over 8 years ago
8707-download @ db7bd2a8f4981c079ced6c09646ac297790326ae
- failure (the download completed with the right size but the wrong md5sum): https://crvr.se/su92l-8i9sb-ful8qhzowkshfoq
- success: https://crvr.se/su92l-8i9sb-aizw0cupzxafowf
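The failing run shows why a size check alone is not enough: the byte count matched but the content did not. The following is a minimal sketch of a post-download check that compares both size and MD5. The names verify_download and ChecksumMismatch are hypothetical and are not taken from the 8707-download branch.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when downloaded data does not match the expected size or MD5."""


def verify_download(data_chunks, expected_size, expected_md5):
    """Hash the chunks and compare size and MD5 against expected values.

    Returns the hex MD5 digest on success; raises ChecksumMismatch otherwise.
    """
    md5 = hashlib.md5()
    size = 0
    for chunk in data_chunks:
        md5.update(chunk)
        size += len(chunk)
    got_md5 = md5.hexdigest()
    if size != expected_size:
        raise ChecksumMismatch("size %d != expected %d" % (size, expected_size))
    # The size can match while the content differs, so the MD5 check is still needed.
    if got_md5 != expected_md5:
        raise ChecksumMismatch("md5 %s != expected %s" % (got_md5, expected_md5))
    return got_md5
```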
Updated by Brett Smith over 8 years ago
Reviewing db7bd2a. This is good to merge. These are all just "idiomatic Python" nits that you can take or leave as you like.
cStringIO provides the same API as StringIO with better performance. You can switch to it with a one-line change to your import: import cStringIO as StringIO.
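The cStringIO module exists only in Python 2. The following sketch shows a version-tolerant import that falls back to io.BytesIO on Python 3, assuming the buffer holds bytes. Unlike the one-line import suggested in the review, it imports the class rather than the module, so calls are written StringIO() instead of StringIO.StringIO().

```python
try:
    # Python 2: the C implementation, faster than the pure-Python StringIO module.
    from cStringIO import StringIO
except ImportError:
    # Python 3: cStringIO no longer exists; io.BytesIO is the byte-buffer equivalent.
    from io import BytesIO as StringIO

buf = StringIO()
buf.write(b"chunk one, ")
buf.write(b"chunk two")
print(buf.getvalue())
```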
It seems a little odd that you open the URL and then check its scheme. Maybe move the scheme check before the open? You might also consider saving the result of urlparse.urlparse() and reusing it, but that's really small potatoes.
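The following is a sketch of the suggested ordering in Python 3: parse the URL once, reject unsupported schemes before any network I/O, then reuse the parsed result. The allow-list, the function names, and the filename derivation are hypothetical illustrations, not the branch's actual code.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical allow-list; the schemes the original script supports are not shown in this issue.
ALLOWED_SCHEMES = ("http", "https")


def parse_and_check(url):
    """Parse the URL once and reject unsupported schemes before any I/O."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError("unsupported URL scheme: %r" % parsed.scheme)
    return parsed


def open_remote(url):
    """Fail fast on a bad scheme, then open the URL (hypothetical helper)."""
    parsed = parse_and_check(url)  # no connection has been attempted yet
    # Reuse the parsed result instead of calling urlparse() again,
    # e.g. to derive an output filename from the path.
    filename = parsed.path.rsplit("/", 1)[-1] or "download"
    return filename, urlopen(parsed.geturl())
```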
Your download loop can be written a little DRYer as:
    with open(outpath, 'w') as outfile:
        for chunk in iter(lambda: httpresp.read(BUFFER_SIZE), ''):
            outfile.write(chunk)
            got_md5.update(chunk)
        got_size = outfile.tell()
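For reference, here is the same loop ported to Python 3 as a self-contained function. The function name save_stream and the BUFFER_SIZE value are assumptions for illustration. In Python 3 the file must be opened in binary mode and the sentinel must be b"", because read() on a binary response returns bytes. An '' sentinel would never compare equal to b"", so the loop would never end.

```python
import hashlib

BUFFER_SIZE = 2 ** 20  # assumed chunk size; the branch's actual value isn't shown


def save_stream(httpresp, outpath, buffer_size=BUFFER_SIZE):
    """Copy a file-like binary response to outpath, returning (size, md5 hex digest)."""
    got_md5 = hashlib.md5()
    # 'wb' because the response yields bytes in Python 3.
    with open(outpath, "wb") as outfile:
        # iter(callable, sentinel) keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: httpresp.read(buffer_size), b""):
            outfile.write(chunk)
            got_md5.update(chunk)
        # tell() must be called before the with-block closes the file.
        got_size = outfile.tell()
    return got_size, got_md5.hexdigest()
```

Any file-like object with a read(n) method works as httpresp, so the function can be tested with io.BytesIO instead of a live HTTP response.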
Thanks.
Updated by Tom Clegg over 8 years ago
All of that sounds better, thanks. I was torn between the two uglies (while-True-if-cond-break, and duplicating the read()); the iter solution is just what I was wishing for.
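Below is a small side-by-side sketch of the while-True/break loop and the two-argument iter() form. The function names are hypothetical. iter(callable, sentinel) calls callable with no arguments until it returns a value equal to sentinel. functools.partial(f.read, n) is a common alternative to the lambda used in the review.

```python
import functools
import io


def chunks_while_true(f, n):
    """The 'while True / if not chunk: break' form."""
    out = []
    while True:
        chunk = f.read(n)
        if not chunk:
            break
        out.append(chunk)
    return out


def chunks_iter(f, n):
    """The two-argument iter() form: call f.read(n) until it returns b''."""
    # functools.partial binds n so iter() gets a zero-argument callable.
    return list(iter(functools.partial(f.read, n), b""))


data = b"abcdefghij"
print(chunks_while_true(io.BytesIO(data), 4))
print(chunks_iter(io.BytesIO(data), 4))
```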
Updated by Brett Smith over 8 years ago