Anonymous, 04/17/2013 02:45 PM
| 1 | 1 | Anonymous | h1. Glossary of Arvados Terms |
|---|---|---|---|
| 2 | |||
| 3 | *collection* - a snapshot of a set of data files (file names and contents). It is analogous to a directory tree, with the additional property that it reliably identifies the data itself, not just the symbolic names. |
||
| 4 | |||
| 5 | *content address* - a cryptographic digest of a data blob that can be used both to determine where the data is stored and to verify that the data retrieved is intact. |
||
| 6 | |||
| 7 | *data manager* - a component that assists Keep servers in enforcing site policies and monitors the state of the storage facility as a whole; we expect that the Data Manager will also broker backup and archival services and interfaces to other external data stores. |
||
| 8 | |||
| 9 | *job dispatcher* - a component that takes jobs from the queue and starts them as compute nodes become available. |
||
| 10 | |||
| 11 | *job script* - a program written and structured so that it can run as a MapReduce computation across distributed compute nodes. |
||
| 12 | |||
| 13 | *job task* - a portion of a Job that can be computed asynchronously and independently of other Job Tasks; could also be called a “MapReduce task” or “MapReduce step”. |
||
| 14 | |||
| 15 | *job* - an execution of a MapReduce program (Job Script). Has specific inputs, outputs, start time, finish time, etc. |
||
| 16 | |||
| 17 | *Keep* - a content-addressable distributed file system. |
||
| 18 | |||
| 19 | *manifest* - a plain-text encoding of a collection. Lists filenames, sizes, and data block locators (hashes). Reference: http://en.wikipedia.org/wiki/Manifest_file. |
||
| 20 | |||
| 21 | *MapReduce Engine* - supervises a job from start to finish: sets up the execution environment, queues and executes tasks, monitors task and node failures, and reports statistics. |
||
| 22 | |||
| 23 | *pipeline component* - a portion of a pipeline template (or instance) that describes criteria for selecting or submitting a job; can also refer to a specific Job (e.g., in a completed pipeline instance). |
||
| 24 | |||
| 25 | *pipeline instance* - the act or record of applying a pipeline template to a specific set of inputs; generally, a pipeline instance refers to the UUIDs of jobs that have been run to satisfy the pipeline components. |
||
| 26 | |||
| 27 | *pipeline manager* - a component that looks up (and submits, as needed) jobs to satisfy pipeline components, monitors the jobs as they run, and handles dependencies (e.g., waits for job A to complete, then uses its output as job B’s input). |
||
| 28 | |||
| 29 | *pipeline template* - a pattern that describes the relationships among the component Jobs: for example, the template specifies that job A's output is job B's input; a pipeline template is analogous to a Makefile. |
||
| 30 | |||
| 31 | *provenance* - the origin of data and the details of how a data set was computed (including processes and source data). |
||
| 32 | |||
| 33 | *Workbench* - the browser-based visual tools through which users work with Arvados capabilities; Workbench mirrors many of the functions available through the command line interface (CLI). |
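
The entries for *content address*, *manifest*, and *collection* describe ideas that fit together, and a small sketch may make the relationship easier to see. The Python below is an illustration only, not Arvados or Keep code: the choice of SHA-256, the one-line-per-file manifest layout, the modulo placement rule, and every function name are assumptions made for this example.

```python
import hashlib


def content_address(blob):
    """Return a hex digest that identifies a blob by its contents.

    SHA-256 is an assumption; the glossary only says "cryptographic digest".
    """
    return hashlib.sha256(blob).hexdigest()


def verify(blob, address):
    """Check that retrieved data still matches the address it was stored under."""
    return content_address(blob) == address


def pick_server(address, servers):
    """Hypothetical placement rule: derive a storage location from the digest."""
    return servers[int(address, 16) % len(servers)]


def build_manifest(files):
    """Encode a set of files as plain text: one 'name size locator' line per file.

    This layout is invented for illustration and is not the real manifest format.
    Sorting by name makes the text independent of insertion order.
    """
    lines = [
        "{} {} {}".format(name, len(data), content_address(data))
        for name, data in sorted(files.items())
    ]
    return "\n".join(lines) + "\n"


def collection_id(manifest_text):
    """Identify the whole collection by hashing its manifest text."""
    return content_address(manifest_text.encode("utf-8"))


if __name__ == "__main__":
    files = {"b.txt": b"second file", "a.txt": b"first file"}
    manifest = build_manifest(files)
    print(manifest, end="")
    print("collection:", collection_id(manifest))
    print("stored on:", pick_server(content_address(files["a.txt"]), ["keep0", "keep1"]))
```

Under these assumptions, changing a single byte in any file changes that file's locator, which changes the manifest text and therefore the collection identifier. That is the sense in which a collection identifies the data itself and not just its symbolic names.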