Bug #17118
closed
Possible arv-put hang when writing collection
100%
Description
A user reported that arv-put would upload a directory of files and then sometimes hang before writing the collection. However, the checkpoint file had been written, so canceling the process and re-running arv-put created the collection without waiting for a re-upload.
Inspect the code and see whether there are any places that seem vulnerable to a deadlock.
Here's the follow-up (https://support.curii.com/rt/Ticket/Display.html?id=119)
I would like to report a possible bug/improvement for the arv-put
command. We ran into some issues when using arv-put where it would die
silently without giving any output whatsoever. We have now traced it
to the fact that the arv-put cmd essentially runs out of memory (or
uses a huge amount of memory).
The setup:
1. A folder containing a number of files (< 1500) with a total folder
size of 145GB. This entire folder is to be uploaded into Arvados.
2. We run it via Gitlab as a Runner on a Virtual Machine with 16GB of RAM.
3. The arv-put cmd we use:
arv-put --no-follow-links --no-resume --exclude 'Thumbnail_Images/*' --exclude done.txt --project-uuid arkau-j7d0g-6a3em925c3yvx9q --name Overnight1 /isilon/nrd_hca/Overnight1/
Output:
1. The script silently dies, no error message, no other output.
We have done extensive testing and checking and initially, the arv-put
cmd just died silently without giving any error message whatsoever.
After some digging, it turns out that arv-put cmd essentially eats up
all the memory on the machine and is then killed. We tried to change
it so that arv-put can only use 1 thread but the outcome is the same.
See the attached images for the output from 'top' when trying to
upload the 145GB folder. We have plans in the future to upload folders
with around 750GB of data and if arv-put cannot handle this or needs a
huge amount of memory to do this, we will need to reconsider our
workflows.
We have a couple of questions:
1. What is the relationship between the size of the folder to be
uploaded and the amount of memory arv-put will use?
2. Is there a way to estimate how much memory would be needed for a
certain folder/size of data?
3. Is there a way to make arv-put fail gracefully in cases like this?
4. If known, what is the reason that arv-put uses so much memory?
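
Regarding the silent exit described above (and question 3): a possible stopgap, independent of any fix in arv-put itself, is to launch the command from a small wrapper that reports how the process ended. The sketch below is only an illustration and is not part of arv-put or the Arvados SDK; the helper names `describe_exit` and `run_and_report` are made up for this example. It relies on the fact that, on POSIX systems, Python's `subprocess` reports a child killed by a signal as a negative return code. A SIGKILL is consistent with the kernel OOM killer but does not prove it, so the message is worded as a hint.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable message."""
    if returncode == 0:
        return "exited successfully"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        sig = signal.Signals(-returncode)
        # SIGKILL is what the OOM killer sends, but other senders are possible.
        hint = " (possibly the kernel OOM killer)" if sig == signal.SIGKILL else ""
        return f"killed by {sig.name}{hint}"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd, print how it ended to stderr, and return its return code."""
    result = subprocess.run(cmd)
    print(f"{cmd[0]}: {describe_exit(result.returncode)}", file=sys.stderr)
    return result.returncode
```

In the CI job this could be called as `run_and_report(["arv-put", "--no-resume", "/isilon/nrd_hca/Overnight1/"])`, with the remaining options from the command above added to the list. If arv-put is started from a shell script instead, the same condition typically shows up as exit status 137 (128 + 9).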