Story #3640
[SDKs] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache.
Status:
New
Priority:
Normal
Assigned To:
-
Category:
Keep
Target version:
-
Start date:
Due date:
% Done:
0%
Estimated time:
Story points:
2.0
Description
Background:
arv-mount has a block cache, which improves performance when the same blocks are read multiple times. However:
- Currently a new arv-mount process is started for each Crunch task execution. This means tasks don't share a cache, even if they're running at the same time.
- In the common case where multiple crunch tasks run at the same time and use the same data, we have multiple arv-mount processes each retrieving and caching its own copy of the same data blocks.
Proposal:
- Use large swap on worker nodes (preferably SSD). (We already do this for other reasons.)
- Set up a large tmpfs on worker nodes and use it as crunch job scratch space. (This already gets cleared at the beginning of a job to avoid leakage between jobs/users.)
- Use a directory in that tmpfs as an arv-mount cache. This makes it feasible to use a large cache size, and makes it easy to share the cache between multiple arv-mount processes.
- Rely on unix permissions for cache privacy. (Warn if the cache dir's mode & 0007 != 0, but go ahead anyway: there will be cases where that would be useful and not dangerous.)
- Use flock() to avoid races and duplicated effort. (If arv-mount 1 is writing a block to the cache, then arv-mount 2 should wait for arv-mount 1 to finish, then read from the cache, rather than fetch its own copy. See the sketch after this list.)
- Do not clean up the cache dir at start/exit, at least by default (the general idea is to share with past/future arv-mount procs). An optional --cache-clear-atexit flag would be nice to have.
- Measuring/limiting cache size could be interesting.
- Delete & replace upon finding a corrupt/truncated cache entry
- The default Keep mount on shell nodes should use a filesystem cache, assuming there is an appropriate filesystem for it (i.e., something faster than network: tmpfs, SSD, or at least a disk with async/barriers=0).
- crunch-job should create a per-job temp dir on each node during the "install" phase, and point all arv-mount processes to it.
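A minimal sketch of how the permission warning and flock() coordination above might look in Python. The function names, the one-file-per-locator cache layout, the hash+size locator parsing, and the fetch_from_keep callable are illustrative assumptions, not existing SDK API:

import fcntl
import os
import stat
import warnings

def warn_if_world_accessible(cache_dir):
    # Warn if the cache dir's mode & 0007 != 0, but go ahead anyway.
    mode = stat.S_IMODE(os.stat(cache_dir).st_mode)
    if mode & 0o007:
        warnings.warn("cache dir %s is world-accessible (mode %o)" % (cache_dir, mode))

def get_block(cache_dir, locator, fetch_from_keep):
    # Return one block, using a cache directory shared by several arv-mount
    # processes: one file per block, named after the locator.
    # fetch_from_keep(locator) retrieves the block from a Keep server.
    expected_size = int(locator.split("+")[1])  # assumes a hash+size locator
    path = os.path.join(cache_dir, locator)
    with open(path, "a+b") as f:
        # If another arv-mount is writing this block, wait for it to finish
        # and read its copy rather than fetching our own.  (A production
        # version could take LOCK_SH first and only upgrade on a miss.)
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            f.seek(0)
            data = f.read()
            if len(data) == expected_size:
                return data
            # Missing or truncated entry (a checksum test would also catch
            # corruption): delete & replace it while still holding the lock.
            data = fetch_from_keep(locator)
            f.seek(0)
            f.truncate()
            f.write(data)
            f.flush()
            return data
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)

Holding the exclusive flock() across the fetch is what makes a second process wait for the first writer instead of duplicating the download; the --cache-clear-atexit behavior proposed above would simply remove these files in an exit handler.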
Updated by Tom Clegg over 10 years ago
- Description updated (diff)
- Category set to Keep
Updated by Tom Clegg over 10 years ago
- Target version set to Arvados Future Sprints
Updated by Tom Clegg over 10 years ago
- Subject changed from [FUSE] Add runtime option to use a filesystem directory block cache as an alternative to RAM cache. to [FUSE] Add runtime option to arv-mount to use a filesystem directory block cache as an alternative to RAM cache.
Updated by Tom Clegg over 10 years ago
- Subject changed from [FUSE] Add runtime option to arv-mount to use a filesystem directory block cache as an alternative to RAM cache. to [FUSE] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache.
Updated by Tom Clegg over 10 years ago
- Subject changed from [FUSE] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache. to [SDKs] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache.
Updated by Ward Vandewege over 3 years ago
- Target version deleted (Arvados Future Sprints)
Updated by Peter Amstutz almost 3 years ago
- Has duplicate Feature #18842: Local disk keep cache for Python SDK/arv-mount added
Updated by Peter Amstutz over 2 years ago
Implementation brainstorm.
Build this feature around mmap().
- When fetching a block, first check the memory cache, then check the disk cache, then fetch it from Keep. (A rough sketch follows this list.)
- When fetching a block from keep, keep it in memory and start asynchronously writing it out to disk
- We want to be able to serve reads immediately without waiting for the disk cache machinery
- Once it has been written to disk it can be ejected from the memory cache
- When we find a block in the disk cache, open it and use mmap(); this gives us something that behaves like a memory buffer
- Separately keep track of open file descriptors and close ones that haven't been used recently
- Separately keep track of space used by blocks on disk and delete least recently used ones
- existing code for reassembling files from blocks mostly doesn't have to change
- avoid making a read() syscall in the happy case (no page fault)
- able to leverage the kernel's filesystem cache to balance between user process memory & cache memory
- the file that's just been written and then re-opened might still be in the filesystem cache, which may avoid blocking on disk activity
- can have a much larger default cache, so users don't have to think about the Arvados cache
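A rough Python sketch of the design above. The DiskBlockCache class, its method names, and the fetch_from_keep callable are hypothetical; the LRU bookkeeping, background writer thread, and error handling are simplified, and cross-process coordination (the flock() scheme from the description) is omitted:

import collections
import mmap
import os
import threading

class DiskBlockCache:
    def __init__(self, cache_dir, max_disk_bytes, max_open_files=256):
        self.cache_dir = cache_dir
        self.max_disk_bytes = max_disk_bytes
        self.max_open_files = max_open_files
        self._ram = {}                           # blocks not yet written to disk
        self._open = collections.OrderedDict()   # locator -> (file, mmap), LRU order
        self._sizes = collections.OrderedDict()  # locator -> bytes on disk, LRU order
        self._lock = threading.Lock()

    def _path(self, locator):
        return os.path.join(self.cache_dir, locator)

    def get(self, locator, fetch_from_keep):
        # Memory cache first, then disk cache, then fetch from Keep.
        with self._lock:
            if locator in self._ram:
                return self._ram[locator]
            if locator in self._open:
                self._open.move_to_end(locator)
                return self._open[locator][1]
        if os.path.exists(self._path(locator)):
            return self._map(locator)
        data = fetch_from_keep(locator)
        with self._lock:
            self._ram[locator] = data
        # Serve reads immediately; write the block out to disk in the
        # background, then eject it from the memory cache.
        threading.Thread(target=self._write_out, args=(locator, data)).start()
        return data

    def _write_out(self, locator, data):
        tmp = self._path(locator) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.rename(tmp, self._path(locator))      # atomic: readers never see a partial block
        with self._lock:
            self._ram.pop(locator, None)         # on disk now, safe to drop from RAM
            self._sizes[locator] = len(data)
            self._evict_disk()

    def _map(self, locator):
        # mmap() the cached file: it behaves like a memory buffer, and reads
        # are served by the kernel page cache.
        f = open(self._path(locator), "rb")
        m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
        with self._lock:
            self._open[locator] = (f, m)
            self._sizes[locator] = len(m)
            self._sizes.move_to_end(locator)
            self._evict_fds()
        return m

    def _evict_fds(self):
        # Close file descriptors (and mappings) that haven't been used recently.
        while len(self._open) > self.max_open_files:
            _, (f, m) = self._open.popitem(last=False)
            m.close()
            f.close()

    def _evict_disk(self):
        # Delete the least recently used blocks when over the disk budget.
        while self._sizes and sum(self._sizes.values()) > self.max_disk_bytes:
            locator, _ = self._sizes.popitem(last=False)
            old = self._open.pop(locator, None)
            if old:
                old[1].close()
                old[0].close()
            try:
                os.remove(self._path(locator))
            except OSError:
                pass

Because cached files are read through mmap(), the happy path is a plain memory access served from the kernel page cache with no read() syscall, and a block that was just written out and re-mapped is usually still resident, so it rarely blocks on disk.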