Storing and Organizing Data » History » Version 32
Tom Clegg, 05/07/2014 02:27 PM
h1. Storing and Organizing Data

Rough demo outline

# Automatic ingest from a POSIX directory to Keep
#* Ingestor's access to staging area (could be remote NFS or sshfs mount) is arranged ahead of time
#* 3rd-party's access to staging area is arranged ahead of time
#* Ingestor runs in a screen session. Command line parameters provide project (group/folder) ID and a tag that indicates "this is for *me* to ingest".
#* Someone ("3rd-party") uploads some files to the staging area via SFTP or whatever
#* 3rd-party does an API call to "ingest-notify app". This might be a short bash script culminating in a curl command. In the API call, the 3rd-party provides a label (e.g., a sample ID), a list of files with their checksums, and an arbitrary "properties" hash containing whatever the 3rd-party wants.
#* Ingest-notify app generates a "data in staging area is ready to ingest" event via API server.
#* Ingestor waits for a "data in staging area is ready to ingest" notification via API server.
#* Ingestor reads the data from the staging area and writes it into Keep (creates one collection per API call made by 3rd-party).
#* Ingestor (or arv-put on behalf of ingestor?) makes API calls while working, to indicate progress (bytes done/todo). @arvados.v1.logs.create(object_uuid=uuid_of_upload_object)@
#* In Workbench the imported Datasets appear as Collections in the designated project
#* After data has been copied into Keep, ingestor deletes the files from the staging area (if @--delete-after@ flag given).
...
# My data gets into the right project as specified by the uploader (API call)
#* How is the staging-area ↔ project mapping specified, and how/where is it encoded/stored?
...
# Subscribe to notifications (by email and/or Workbench dashboard): when files start/finish uploading; when files are shared with customer; when files are downloaded by third party
#* For now, use existing Logs table + automatic logging of create/update/delete operations + "progress" event from arv-put (see above)
#* "Show project" page shows recent activity: one progress bar for each unfinished upload, one entry for each start/finish event.
#* Dashboard page shows recent activity from all of my projects.
...
# Move/copy collections between projects (Project RX1234, or Customer X’s files), tag them in destination project with the appropriate string (e.g., sample ID) -- defaulting to existing tag used in source project (e.g., provided at time of upload).
#* UI for presenting Groups as Projects/Folders: create, view, rename, share, delete
#* UI for copying/moving objects between folders
#* How to avoid confusion about "is this one object in two places, or are there two objects?" Note GDocs has a bit of both, "My Drive" / "Shared with me" vs. regular folders
...
# Share project with other users/groups
...
# “Anyone with this secret link can view/download” mode. Enable, disable, change magic link. Use cases: browser + “wget -r”.
#* Perhaps the secret in the secret link is an ApiClientAuthorization token, belonging to the person creating the link, scoped to a single project/collection
#* How do we implement "Anonymous user, not logged in"?
...
# See log/overview of who has accessed your shared data (incl. “anonymous user” if using secret-link-to-share); when shared/unshared; when each upload started/finished -- for a single project, and across all projects
...
# Pilot alternate Workbench group/dashboard view
...
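
The ingest-notify call and the progress events in the first item of the outline could look roughly like the sketch below. It only builds the request bodies; the field names (@label@, @files@, @md5@, @event_type@, @bytes_done@, @bytes_todo@), the choice of MD5 as the checksum, and the helper names are assumptions made for illustration, not a settled API. Only @object_uuid@ comes from the outline itself.

```python
import hashlib
import json
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_ingest_notification(label, paths, properties=None):
    """Build a hypothetical request body for the ingest-notify app.

    label:      e.g. a sample ID chosen by the 3rd-party
    paths:      files already uploaded to the staging area
    properties: arbitrary hash, passed through untouched
    """
    return {
        "label": label,
        "files": [{"path": p, "md5": file_md5(p)} for p in paths],
        "properties": dict(properties or {}),
    }


def build_progress_log(upload_uuid, bytes_done, bytes_total):
    """Build a hypothetical progress record for a logs.create call."""
    return {
        "object_uuid": upload_uuid,
        "event_type": "progress",  # assumed event name
        "properties": {
            "bytes_done": bytes_done,
            "bytes_todo": bytes_total - bytes_done,
        },
    }


if __name__ == "__main__":
    # Stand-in for the staging area: a temporary directory with one file.
    with tempfile.TemporaryDirectory() as staging:
        path = os.path.join(staging, "sample.fastq")
        with open(path, "wb") as f:
            f.write(b"ACGT\n")
        body = build_ingest_notification("sample-0001", [path], {"lane": 3})
        print(json.dumps(body, indent=2))
        print(json.dumps(build_progress_log("uuid_of_upload_object", 5, 5)))
```

The printed JSON is what a short bash script could hand to curl as the request body. Keeping the payload builders separate from the HTTP call means the same structure can be reused whether the ingestor or arv-put ends up reporting progress.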

h2. Retrospective notes

* Went well - still some merge-race at the end
* Lots of branches going in
* Not a lot of merge conflicts
* Big spec change (rejecting "ingestor" story in favor of future "remote arv-put")
* Some in-sprint deployment dependency stuff (crunch+docker, websockets)
* Please tag commits with story numbers. Use "refs #1234" for merges. (Use "refs #1234" for individual commits too?)
* Consider extracting a task into a story if it grows into its own thing (e.g., token handling as part of collection sharing)
* In sprint review, include 2 more agenda items: summary of things not done + high-level overview of next sprint
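
The commit-tagging note above could be checked mechanically, for example from a commit-msg hook. The following is a minimal sketch, assuming the tag is written exactly as "refs #1234"; the function name @story_refs@ is hypothetical.

```python
import re

# Matches "refs #1234" anywhere in a commit message (case-insensitive).
STORY_REF = re.compile(r"\brefs\s+#(\d+)", re.IGNORECASE)


def story_refs(message):
    """Return the story numbers referenced in a commit message."""
    return [int(n) for n in STORY_REF.findall(message)]


if __name__ == "__main__":
    print(story_refs("Scope sharing token to one collection, refs #1234"))
    print(story_refs("Fix typo"))
```

A hook could reject a commit or merge when @story_refs@ returns an empty list.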