API HistoricalForcasting data for CR » History » Version 2
Nico César, 06/18/2020 08:44 AM
h1. API Historical/Forecasting data for CR

Goal: create a pipeline forecaster and a visualization for historical data. It should expose APIs that can be used in the ContainerRequest visualization and that could also be used to provide extra information about the currently running CR.

Glossary:

* Checkpoint: a generic name that currently corresponds to a step name followed by a "family". The family exists to cluster all executions with similar characteristics (including scattered steps that follow the pattern name_2, name_3, ..., name_229) across the runs that the token has access to.

* Family: a common name like "gatk" or "haplotypecaller" can be used as a step name. In practice, 2 executions would create 2 different populations (in terms of checkpoints) depending on the parameters of the CommandLineTool.

* Datapoint: a concrete piece of data that can be plotted as historical data. Currently we bundle together the container request and its associated container to get a unified view of the times involved. This should not be confused with forecast data, since the two can be used separately.

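The checkpoint naming described in the glossary can be sketched as follows. This is only an illustration: it assumes that scattered steps differ from the base step solely by a trailing "_<number>" suffix and that the family label is already known. The helper names @checkpoint_name@ and @group_by_checkpoint@ are hypothetical and not part of any existing API.

```python
import re
from collections import defaultdict

# Assumption: a scattered step is the base step name plus "_<number>".
# A step that is genuinely named "foo_2" would be merged into "foo" too.
_SCATTER_SUFFIX = re.compile(r"_\d+$")


def checkpoint_name(step_name, family):
    """Return "<base step name>@<family>" for a (possibly scattered) step."""
    base = _SCATTER_SUFFIX.sub("", step_name)
    return f"{base}@{family}"


def group_by_checkpoint(executions):
    """Cluster durations by checkpoint.

    executions: iterable of (step_name, family, duration_seconds) tuples.
    Returns a dict mapping checkpoint name -> list of durations.
    """
    groups = defaultdict(list)
    for step_name, family, duration in executions:
        groups[checkpoint_name(step_name, family)].append(duration)
    return dict(groups)
```

With this sketch, "createsglf", "createsglf_8" and "createsglf_57" in the same family all land in the checkpoint "createsglf@family22".
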
h2. API

GET /container-request/aaaaa-xvhdp-123456789abc/checkpoints

Output:

<pre>
{
  "checkpoints": [
    {
      "name": "merge-tilelib@family22",
      "dependencies": [
        "createsglf"
      ],
      "time_average": 8254.534873,
      "time_count": 1,
      "time_min": 8254.534873,
      "time_min_comment": "duration:merge-tilelib#su92l-dz642-cc7799yfwi5jmd9",
      "time_max": 8254.534873,
      "time_max_comment": "duration:merge-tilelib#su92l-dz642-cc7799yfwi5jmd9"
    },
    {
      "name": "createsglf@family22",
      "dependencies": [],
      "time_average": 4741.290203,
      "time_count": 58,
      "time_min": 82.138309,
      "time_min_comment": "duration:createsglf_57#su92l-dz642-3u3g4bq1yh4pqje",
      "time_max": 5818.898387,
      "time_max_comment": "duration:createsglf_8#su92l-dz642-8d094xhqciin5m2"
    },
    ...
  ],
  "time_average": <average time for the CR family>,
</pre>


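As a rough illustration of how the per-checkpoint fields above relate to each other, the sketch below derives them from a set of observed durations. It assumes the durations are keyed by "step#container_uuid", mirroring the @time_min_comment@ and @time_max_comment@ values in the sample output. The function name @summarize@ is hypothetical, and this is not necessarily how the server computes the statistics.

```python
def summarize(durations):
    """Build the time_* fields of one checkpoint.

    durations: non-empty dict mapping "step#container_uuid" -> seconds.
    """
    if not durations:
        raise ValueError("need at least one duration")
    # Keys of the fastest and slowest executions, reused in the comments.
    fastest = min(durations, key=durations.get)
    slowest = max(durations, key=durations.get)
    values = list(durations.values())
    return {
        "time_average": sum(values) / len(values),
        "time_count": len(values),
        "time_min": durations[fastest],
        "time_min_comment": f"duration:{fastest}",
        "time_max": durations[slowest],
        "time_max_comment": f"duration:{slowest}",
    }
```

A checkpoint with a single execution, like "merge-tilelib@family22" above, ends up with identical average, minimum and maximum values.
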
GET /container-request/aaaaa-xvhdp-123456789abc/datapoints

Output:

<pre>
[
  {
    "step_name": "createsglf",
    "start_1": "2020-01-15 19:49:34.213 +0000",
    "end_1": "2020-01-15 21:19:39.001 +0000",
    "start_2": "2020-01-15 19:54:44.864 +0000",
    "end_2": "2020-01-15 21:19:39.001 +0000",
    "reuse": false,
    "status": "completed",
    "legend": "<p>createsglf</p><p>Container Request: <a href=\"https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-zfc3ffxk3slmkzv\">su92l-xvhdp-zfc3ffxk3slmkzv</a></p><p>Container duration: 1h24m54.137122s\n</p>"
  },
  {
    "step_name": "createsglf_2",
    "start_1": "2020-01-15 19:49:34.288 +0000",
    "end_1": "2020-01-15 21:29:11.399 +0000",
    "start_2": "2020-01-15 19:54:51.275 +0000",
    "end_2": "2020-01-15 21:29:11.399 +0000",
    "reuse": false,
    "status": "completed",
    "legend": "<p>createsglf_2</p><p>Container Request: <a href=\"https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-py99va9hnvuxzp5\">su92l-xvhdp-py99va9hnvuxzp5</a></p><p>Container duration: 1h34m20.123849s\n</p>"
  },
  ....
</pre>

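A client could turn the timestamp pairs of a datapoint into durations along these lines. The sketch assumes that @start_1@/@end_1@ describe the container request and @start_2@/@end_2@ the container itself. The second half of that assumption is consistent with the first sample, where @end_2 - start_2@ comes to 1h24m54.137s and matches the container duration in the legend. The helper names are hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the sample output, e.g.
# "2020-01-15 19:49:34.213 +0000".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f %z"


def parse_time(value):
    """Parse one datapoint timestamp into a timezone-aware datetime."""
    return datetime.strptime(value, TIME_FORMAT)


def span_seconds(datapoint, suffix):
    """Seconds between start_<suffix> and end_<suffix> of a datapoint."""
    start = parse_time(datapoint[f"start_{suffix}"])
    end = parse_time(datapoint[f"end_{suffix}"])
    return (end - start).total_seconds()


def wait_seconds(datapoint):
    """Seconds between start_1 and start_2.

    Assumption: this gap is the time the request spent waiting before its
    container started running.
    """
    delta = parse_time(datapoint["start_2"]) - parse_time(datapoint["start_1"])
    return delta.total_seconds()
```

Calling @span_seconds(datapoint, 2)@ on the first sample gives the container run time in seconds, and @wait_seconds(datapoint)@ gives the gap between the two start times.
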
GET /container-request/aaaaa-xvhdp-123456789abc/workflow-dot

Output:

<pre>
digraph cwlgraph {
  rankdir=LR;
  graph [compound=true];

  subgraph cluster_0 {
    label="#createcgf-wf.cwl";
    node [style=filled];
    shape=box
    style="filled";
    color="#dddddd";
    "#createcgf-wf.cwl" [ label = "#createcgf-wf.cwl", style = invis ];
    ....
</pre>


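For comparison, the @dependencies@ lists returned by the checkpoints endpoint already contain enough information to draw a simplified step graph. The sketch below emits plain DOT edges from them. It is not the generator behind @workflow-dot@ (whose output uses clusters, as shown above), and the function name @checkpoints_to_dot@ is hypothetical.

```python
def checkpoints_to_dot(checkpoints):
    """Render checkpoint dependencies as a left-to-right DOT digraph.

    checkpoints: list of dicts with "name" ("step@family") and an optional
    "dependencies" list of step names, as in the checkpoints output.
    """
    lines = ["digraph cwlgraph {", "  rankdir=LR;"]
    for checkpoint in checkpoints:
        # Drop the "@family" part so node names match the dependency names.
        node = checkpoint["name"].split("@", 1)[0]
        lines.append(f'  "{node}";')
        for dependency in checkpoint.get("dependencies", []):
            lines.append(f'  "{dependency}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)
```

For the two sample checkpoints this produces one edge, from "createsglf" to "merge-tilelib".
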
h2. Frontend

The DOT file can be rendered with https://domparfitt.com/graphviz-react/ (we have already tested it with some big files).

h2. Schema and queries on the postgres DB

TODO: Outline the transformation from the current local leveldb cache to some per-user caching table.
TODO: List the queries to INSERT and SELECT the data for a particular checkpoint.


h2. Permissions

One concern is permissions. We'll behave like everything else in Arvados: if the token doesn't have access to a CR, the response is a 404. This also applies to "summarized data", such as the historical times and prices of the CRs.
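
The rule above can be sketched as a single guard shared by every endpoint, including the summarized ones. The function name @respond@, the set of readable UUIDs, and the shape of the error body are all hypothetical. In practice the readable set would come from Arvados' own permission check for the token.

```python
def respond(readable_uuids, uuid, payload_for):
    """Return (status, body) for a request about one container request.

    readable_uuids: set of CR UUIDs the token can read (assumed input).
    payload_for: callable that builds the response body for a UUID.
    """
    if uuid not in readable_uuids:
        # Same answer whether the CR is missing or merely unreadable, so
        # nothing leaks about CRs outside the token's reach.
        return 404, {"errors": ["not found"]}
    return 200, payload_for(uuid)
```

Because the payload builder only runs after the check passes, summarized data such as times and prices is never computed for a CR the token cannot read.
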