Story #3640
Updated by Tom Clegg over 10 years ago
Background:
arv-mount has a block cache, which improves performance when the same blocks are read multiple times. However:
* Currently a new arv-mount process is started for each Crunch task execution. This means tasks don't share a cache, even if they're running at the same time.
* In the common case where multiple Crunch tasks run at the same time and use the same data, each arv-mount process retrieves and caches its own copy of the same data blocks.
Proposed improvement:
* Use large swap on worker nodes (preferably SSD). (We already do this for other reasons.)
* Set up a large tmpfs on worker nodes and use it as Crunch job scratch space. (This already gets cleared at the beginning of a job to avoid leakage between jobs/users.)
* Use a directory in that tmpfs as an arv-mount cache. This makes it feasible to use a large cache size, and makes it easy to share the cache between multiple arv-mount processes.
Implementation notes:
* Rely on unix permissions for cache privacy. (Perhaps warn if permissions look wrong.)
* Use flock() to avoid races and duplicated effort. (If arv-mount 1 is writing a block to the cache, then arv-mount 2 should wait for arv-mount 1 to finish and then read from the cache, rather than fetch its own copy.)
* Do not clean up the cache dir at start/exit (the idea is to share it with past/future arv-mount processes), but perhaps offer a @--clear@ mode?
* Measuring and limiting the cache size is an open question.
* Delete and replace a cache entry upon finding that it is corrupt or truncated.
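
The notes above could be sketched roughly as follows. This is only an illustration, not arv-mount code: @check_cache_dir@, @get_block@, @cache_usage@ and the @fetch@ callable are hypothetical names, and the sketch assumes each cache entry is a file named after the block's MD5 hash. It takes an exclusive flock() for both reading and writing, which is simpler than a shared/exclusive scheme but serializes readers of the same block. A bad entry is truncated and rewritten in place while the lock is held, rather than unlinked, so that another process waiting on the same file does not end up holding a lock on a deleted inode.

```python
import fcntl
import hashlib
import os
import stat
import warnings


def check_cache_dir(cache_dir):
    """Return True if only the owner can access cache_dir; warn otherwise."""
    mode = stat.S_IMODE(os.stat(cache_dir).st_mode)
    if mode & 0o077:
        warnings.warn("cache dir %s is accessible to group/others (mode %o)"
                      % (cache_dir, mode))
        return False
    return True


def get_block(cache_dir, md5sum, fetch):
    """Return the data for block md5sum, using cache_dir as a shared cache.

    fetch is a hypothetical callable(md5sum) -> bytes that retrieves the
    block from the backing store; it is an assumption of this sketch.
    """
    path = os.path.join(cache_dir, md5sum)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as f:
        # Blocks while another process holds the lock, e.g. while it is
        # still writing this entry. Closing the file releases the lock.
        fcntl.flock(f, fcntl.LOCK_EX)
        data = f.read()
        if hashlib.md5(data).hexdigest() == md5sum:
            return data  # valid cache hit
        # Entry is new, truncated, or corrupt: fetch and replace it.
        data = fetch(md5sum)
        f.seek(0)
        f.truncate()
        f.write(data)
        f.flush()
        return data


def cache_usage(cache_dir):
    """Return the total size in bytes of regular files in cache_dir."""
    return sum(entry.stat().st_size
               for entry in os.scandir(cache_dir)
               if entry.is_file())
```

With this shape, a second arv-mount process asking for a block that the first is still writing simply waits on the lock and then finds a valid entry. Limiting the cache size (for example by evicting the least recently used entries once @cache_usage@ exceeds a threshold) is left out here, since the eviction policy is still an open question.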