Idea #7780
[FUSE] Write a FUSE performance pipeline
Status: Closed
Description
Write a pipeline that copies data (5–10 GB) from the FUSE mount to a local temporary file. Test each of the following methods in sequence in the pipeline:
- shell 'cp' from mount
- Python shutil.copy from mount
- arv-get
- A Python loop that copies from a CollectionReader() to an open file
Write a message after each copy is done so it's possible to collect timing stats up to that point. Then do a checksum on the file and verify that it's not corrupt.
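A minimal sketch of what that pipeline body could look like in Python. The mount path, collection locator, and file name are placeholders, and the CollectionReader usage is a best guess at the SDK API, not the actual implementation:

import hashlib
import shutil
import subprocess
import time

import arvados

MOUNT_FILE = "/mnt/keep/zzzzz-4zz18-xxxxxxxxxxxxxxx/data.bin"  # hypothetical
LOCATOR = "zzzzz-4zz18-xxxxxxxxxxxxxxx"                        # hypothetical
DEST = "/tmp/copy-test.bin"

def timed(label, fn):
    # Print a message after each copy so timing stats can be collected.
    start = time.time()
    fn()
    print("%s: %.3fs" % (label, time.time() - start))

timed("cp", lambda: subprocess.check_call(["cp", MOUNT_FILE, DEST]))
timed("shutil", lambda: shutil.copy(MOUNT_FILE, DEST))
timed("arv-get", lambda: subprocess.check_call(
    ["arv-get", LOCATOR + "/data.bin", DEST]))

def collection_copy():
    reader = arvados.CollectionReader(LOCATOR)
    with reader.open("data.bin") as src, open(DEST, "wb") as dst:
        while True:
            chunk = src.read(1 << 20)
            if not chunk:
                break
            dst.write(chunk)

timed("collection", collection_copy)

# Each method overwrites DEST; checksum the last copy to catch corruption.
md5 = hashlib.md5()
with open(DEST, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        md5.update(chunk)
print("md5: %s" % md5.hexdigest())  # compare against the expected value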
Updated by Brett Smith over 10 years ago
- Description updated (diff)
- Category set to FUSE
Updated by Peter Amstutz over 10 years ago
1st run:
- arv-get: 54.547s
- collection: 39.7725520134s
- cp: 39.635s
- shutil: 37.7655129433s
Updated by Peter Amstutz over 10 years ago
2nd run:
- arv-get: 1m41.320s
- collection: 38.4127969742s
- cp: 40.311s
- shutil: 37.6932270527s

3rd & 4th run:
- arv-get: 40.937s
- arv-get: 41.551s
Updated by Peter Amstutz over 10 years ago
Seems to be averaging 45 MiB/s
The first method run (arv-get) seems to be affected by cache-freshness issues; running it last gives better numbers.
CPU utilization:
- arv-get: 10.5500s user, 10.1400s sys
- collection: 12.5100s user, 10.5600s sys
- cp: 0.0200s user, 1.7600s sys
- shutil: 2.5400s user, 2.3600s sys
I think this just means arv-mount's CPU utilization isn't being charged to the copy commands like it is to arv-get and CollectionReader.
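One way to test that theory would be to sample arv-mount's own CPU time from /proc around a single copy. A rough sketch; the pgrep-based pid lookup and the mount path are assumptions about the test setup:

import os
import subprocess

pid = int(subprocess.check_output(["pgrep", "-f", "arv-mount"]).split()[0])

def cpu_seconds(pid):
    # Fields 14 and 15 of /proc/<pid>/stat are utime and stime in clock
    # ticks (this naive split assumes the comm field has no spaces).
    fields = open("/proc/%d/stat" % pid).read().split()
    return (int(fields[13]) + int(fields[14])) / os.sysconf("SC_CLK_TCK")

before = cpu_seconds(pid)
subprocess.check_call(["cp", "/mnt/keep/placeholder/data.bin", "/tmp/out"])  # hypothetical path
after = cpu_seconds(pid)
print("arv-mount CPU charged during cp: %.2fs" % (after - before))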
Updated by Peter Amstutz over 10 years ago
strace -c arv-get 0b2b39e4da4b51d40dcfc12f47af4e04+1249/lobSTR_ref.fasta .

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31.94    1.363753           2    808788           clock_gettime
 26.46    1.129736          10    115231           recvfrom
 23.16    0.988879         275      3601           write
  9.10    0.388444           2    230682           poll
Not sure why it's calling clock_gettime() 800k times. Also, recvfrom() is only reading 16k at a time, which seems too small for the data transfer rate we're trying to achieve.
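That 16384 matches libcurl's default receive buffer size. If the transfer goes through pycurl, one knob worth trying is CURLOPT_BUFFERSIZE; a standalone sketch with a placeholder URL (note that libcurl capped this option at 16 KiB until 7.53.0, so on older versions it may be a no-op):

import pycurl

with open("/tmp/block", "wb") as out:
    c = pycurl.Curl()
    c.setopt(pycurl.URL, "https://keep.example.com/locator")  # hypothetical
    c.setopt(pycurl.WRITEDATA, out)
    c.setopt(pycurl.BUFFERSIZE, 512 * 1024)  # request 512 KiB reads
    c.perform()
    c.close()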
Updated by Ward Vandewege over 10 years ago
Peter Amstutz wrote:
[...]
Not sure why it's calling clock_gettime 800k times.
Known issue, you need to set an env var. See
http://serverfault.com/questions/452833/how-to-resolve-high-cpu-excessive-stat-etc-localtime-and-clock-gettimeclo
Updated by Ward Vandewege over 10 years ago
Ward Vandewege wrote:
Peter Amstutz wrote:
[...]
Not sure why it's calling clock_gettime 800k times.
Known issue, you need to set an env var. See
http://serverfault.com/questions/452833/how-to-resolve-high-cpu-excessive-stat-etc-localtime-and-clock-gettimeclo
Hmm, maybe I spoke too soon. We set TZ to avoid the excessive stat'ing of /etc/localtime in some places on our clusters. But it doesn't seem to help with all the clock_gettime calls.
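For concreteness, a sketch of applying that TZ setting to an arv-get run; as noted, it avoids the repeated stat()s of /etc/localtime but does not remove the clock_gettime() calls themselves:

import os
import subprocess

# Setting TZ explicitly stops glibc from stat()ing /etc/localtime on
# every time lookup; it does not affect clock_gettime() usage.
env = dict(os.environ, TZ=":/etc/localtime")
subprocess.check_call(
    ["arv-get", "0b2b39e4da4b51d40dcfc12f47af4e04+1249/lobSTR_ref.fasta", "."],
    env=env)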
Updated by Peter Amstutz over 10 years ago
The inner receiver loop (which is presumably all libcurl or pycurl) looks like this:
poll([{fd=11, events=POLLIN}], 1, 1000) = 1 ([{fd=11, revents=POLLIN}])
clock_gettime(CLOCK_MONOTONIC, {4137158, 906951004}) = 0
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
recvfrom(11, "TTTTCAGGGTTTCACTCTGTCACCCAGGCTGG"..., 16384, 0, NULL, NULL) = 16384
clock_gettime(CLOCK_MONOTONIC, {4137158, 907006204}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907018404}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907030204}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907042004}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907053504}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907065504}) = 0
poll([{fd=11, events=POLLIN}], 1, 1000) = 1 ([{fd=11, revents=POLLIN}])
clock_gettime(CLOCK_MONOTONIC, {4137158, 907091504}) = 0
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
recvfrom(11, "TTCTCAACATTTCGCCTAGCAACTCCCCCAGG"..., 16384, 0, NULL, NULL) = 16384
clock_gettime(CLOCK_MONOTONIC, {4137158, 907144704}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907157004}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907168804}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907192504}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907204004}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907215904}) = 0
poll([{fd=11, events=POLLIN}], 1, 1000) = 1 ([{fd=11, revents=POLLIN}])
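As a sanity check, the per-read pattern in this loop is consistent with the strace -c totals above:

# Cross-check against the strace -c summary: each 16 KiB recvfrom() is
# bracketed by ~7 clock_gettime() calls, and the total bytes moved match
# a ~40s transfer at the observed 45 MiB/s.
recvfrom_calls = 115231
clock_gettime_calls = 808788
bytes_moved = recvfrom_calls * 16384
print(bytes_moved / 1e9)                     # ~1.89 GB transferred
print(clock_gettime_calls / recvfrom_calls)  # ~7.0 clock_gettime per read
print(bytes_moved / (45 * 1024 ** 2))        # ~40.0 seconds at 45 MiB/s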
Updated by Nico César over 10 years ago
https://workbench.c97qk.arvadosapi.com/pipeline_instances/c97qk-d1hrv-ary5ydq9enxrs6s#
Created a 2nd Keep server and testing now.
Updated by Nico César over 10 years ago
I ran this job and it works:
https://workbench.c97qk.arvadosapi.com/pipeline_instances/c97qk-d1hrv-ary5ydq9enxrs6s#
c97qk (Azure blob store, 2 keepstore servers):
- throwaway: 149.83417511s
- arv-get: 106.32137084s
- CollectionReader: 104.115087986s
- cp: 105.585179806s
- shutil: 96.5133240223s
LGTM, for what it's worth.
As a wishlist item, I'd like to add more data to the final table so we can compare over time:
About the run itself:
- bytes sent
- CPU (us/sy/id/wa/st, if possible)
- peak memory usage (whatever is the easiest way to measure this)
- I/O activity (whatever is the easiest way to measure this)
- latency to the rest of the components (e.g. an average of 5 pings to the API server, Keep servers, etc.)
About the environment:
- Keep servers/services available
- versions of the Arvados components
- /proc/cpuinfo and lshw output, or something similar, to give us a clue which cloud and instance size we're dealing with (ideally the API server should provide all these details, including the size of all the components)
- Keep block size
There's more to add, but I'll be happy if we just start with that and collect the info over time.
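A sketch of gathering a slice of that metadata next to the timing results; the field names and the ping target are illustrative assumptions, not an agreed-on schema:

import json
import platform
import subprocess
import time

def run_metadata():
    meta = {
        "timestamp": time.time(),
        "node": platform.node(),
        "python": platform.python_version(),
    }
    # CPU model as a rough hint about which cloud/instance size this is.
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("model name"):
                meta["cpu"] = line.split(":", 1)[1].strip()
                break
    # Average of 5 pings to the API server; hostname is a placeholder.
    ping = subprocess.run(
        ["ping", "-c", "5", "-q", "api.example.arvadosapi.com"],
        capture_output=True, text=True)
    if ping.returncode == 0:
        meta["api_ping"] = ping.stdout.splitlines()[-1]  # rtt min/avg/max line
    return meta

print(json.dumps(run_metadata(), indent=2))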
Updated by Peter Amstutz over 10 years ago
"throwaway" task
4xphq 55.6503009796, 56.6268150806, 58.0175321102
c97qk 106.220340014, 85.2707979679, 106.330720901, 113.038635015
qr1hi 38.397521019 101s
Updated by Peter Amstutz over 10 years ago
4xphq:
- warmup: 56.9406909943s
- arv-get: 56.6816000938s

qr1hi:
- warmup: 434.597276211s
- arv-get: 40.413373947s