Feature #13062

open

[SDK] Reduce collection class memory footprint

Added by Peter Amstutz almost 7 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -
Release:
Release relationship: Auto

Description

Reduce the memory footprint of the Collection class so that arv-mount and arvados-cwl-runner need less memory and can run on smaller, cheaper nodes.

General approach: instead of parsing the manifest once and creating Python objects for every directory and file, reparse the relevant part of the manifest and create Python objects on demand.

Possible strategy:

  • Initial manifest parsing creates an index that maps each directory path to one or more manifest streams (by offset or by using memoryview) which describe the contents of that directory.
  • When the contents of a Collection or Subcollection are needed, look up the stream(s) associated with the directory in the index and parse them.
  • Consider doing something similar at the individual file level and only load "segments" on demand (this may come at the cost of higher overhead if it turns out the client is going to visit most of the files in a given directory anyway).
  • Make it possible for a caching strategy to evict loaded collection contents / file segments.
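
The first two steps above could be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: it assumes a simplified manifest in which each line is one stream of the form `stream_name locator... position:size:filename...`, it ignores filename escaping and block locators, and the names `build_index` and `load_directory` are hypothetical.

```python
import re

# A file token looks like "position:size:name"; other tokens after the
# stream name are assumed to be block locators and are skipped here.
FILE_TOKEN = re.compile(r"^(\d+):(\d+):(.+)$")


def build_index(manifest_text):
    """Map each stream (directory) name to the (start, end) offsets of its lines.

    Only the stream name of each line is read; nothing else is parsed.
    """
    index = {}
    start = 0
    length = len(manifest_text)
    while start < length:
        end = manifest_text.find("\n", start)
        if end == -1:
            end = length
        if end > start:  # skip empty lines
            space = manifest_text.find(" ", start, end)
            name_end = space if space != -1 else end
            name = manifest_text[start:name_end]
            index.setdefault(name, []).append((start, end))
        start = end + 1
    return index


def load_directory(manifest_text, index, stream_name):
    """Parse only the streams of one directory, on demand.

    Returns {filename: [(position, size), ...]} for that directory.
    """
    files = {}
    for start, end in index.get(stream_name, ()):
        tokens = manifest_text[start:end].split(" ")
        for token in tokens[1:]:
            match = FILE_TOKEN.match(token)
            if match:
                segment = (int(match.group(1)), int(match.group(2)))
                files.setdefault(match.group(3), []).append(segment)
    return files
```

With this layout the index holds only one small tuple per manifest line, and the per-file objects for a directory exist only after `load_directory` has been called for it.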

Challenges:

  • Can't evict anything from the cache that has been returned to the (Python SDK) user unless we can determine that it is no longer being held (this may require a reference counting scheme).
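
One possible way to approach this challenge, sketched here with Python weak references rather than an explicit reference counting scheme: keep strong references only for a small set of recently used entries, and track everything that is still alive through a `weakref.WeakValueDictionary`. An entry dropped from the recent set stays reachable through the cache for as long as the caller holds it, and is reloaded only after the caller lets go. The `LoadedDirectory` and `DirectoryCache` classes and the `loader` callback are hypothetical names used for illustration only.

```python
import weakref
from collections import OrderedDict


class LoadedDirectory:
    """Hypothetical parsed directory contents (must be weak-referenceable)."""

    def __init__(self, path, files):
        self.path = path
        self.files = files


class DirectoryCache:
    def __init__(self, loader, max_entries=2):
        self._loader = loader            # callable: path -> LoadedDirectory
        self._max_entries = max_entries
        self._recent = OrderedDict()     # strong references, in LRU order
        # Weak references to every entry that is still alive anywhere,
        # including entries the caller holds after eviction.
        self._live = weakref.WeakValueDictionary()

    def get(self, path):
        obj = self._live.get(path)
        if obj is None:
            obj = self._loader(path)     # reparse on demand
            self._live[path] = obj
        self._recent[path] = obj
        self._recent.move_to_end(path)
        while len(self._recent) > self._max_entries:
            # Drop only the cache's strong reference; the object is freed
            # once nobody else holds it.
            self._recent.popitem(last=False)
        return obj
```

This avoids handing the caller two different objects for the same directory, but it relies on the interpreter's garbage collection to decide when memory is actually released.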
Actions #1

Updated by Peter Amstutz almost 7 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz almost 7 years ago

  • Description updated (diff)
  • Status changed from In Progress to New
Actions #3

Updated by Peter Amstutz over 3 years ago

  • Target version deleted (To Be Groomed)
Actions #4

Updated by Lucas Di Pentima almost 2 years ago

  • Release set to 60