Crunch » History » Version 15
Ward Vandewege, 11/06/2017 07:44 PM
h1. Crunch - container orchestration

Arvados has a robust container orchestration system called 'Crunch', which executes CWL workflows while maintaining provenance and reproducibility.

h2. Design Goals

Notable design goals and features include:

* Make use of multiple cores and nodes to produce results faster
* Integrate with [[Keep]] and git repositories to maintain provenance
* Use off-the-shelf software tools in distributed computations
* Efficient over a wide range of problem sizes
* Maximum flexibility of programming language choice
* Maximum flexibility of execution environment
* Tools for building reusable pipelines
* Lower entry barrier for users

h2. Benefits of Crunch

Although some of the workflow and provenance features in Arvados could theoretically be implemented using Hadoop MapReduce, Crunch offers distinct benefits:

* *Provenance and Reproducibility* - Like Keep, the Arvados distributed file system, Crunch is designed to automate tracking the origin of result data, reproducing complex pipelines, and comparing pipelines to one another.

* *Performance* - Most genomics problems are embarrassingly parallel and can benefit from horizontal scaling. In the cloud, Crunch can deliver cost-effective performance for genomics-related analyses by automatically adjusting the available compute resources to the workload.

* *Standardization* - "Common Workflow Language (CWL)":http://commonwl.org is a widely adopted workflow description standard in bioinformatics, and it is the native workflow language in Crunch.