Ward Vandewege, 11/06/2017 07:44 PM
h1. Crunch - container orchestration

Arvados has a robust container orchestration system called 'Crunch', which executes CWL workflows while maintaining provenance and reproducibility.

h2. Design Goals

Notable design goals and features include:

* Make use of multiple cores and nodes to produce results faster
* Integrate with [[Keep]] and git repositories to maintain provenance
* Use off-the-shelf software tools in distributed computations
* Efficient over a wide range of problem sizes
* Maximum flexibility of programming language choice
* Maximum flexibility of execution environment
* Tools for building reusable pipelines
* Lower entry barrier for users

h2. Benefits of Crunch

Although some of the workflow and provenance features in Arvados could theoretically be implemented using Hadoop MapReduce, there are distinct benefits to Crunch:

* *Provenance and Reproducibility* - Like Keep, the Arvados distributed file system, Crunch is designed to automate tracking the origin of result data, reproducing complex pipelines, and comparing pipelines to one another.

* *Performance* - Most genomics problems are embarrassingly parallel and can benefit from horizontal scaling. In the cloud, Crunch can deliver cost-effective performance for genomics-related analyses by automatically adjusting the available compute resources to the workload.

* *Standardization* - "Common Workflow Language (CWL)":http://commonwl.org is the workflow description standard in bioinformatics. It is the native workflow language in Crunch.