Revision 23 (Tom Clegg, 04/10/2014 02:13 PM) → Revision 24/33 (Tom Clegg, 04/10/2014 04:28 PM)
h1. Storing and Organizing Data

Rough demo outline

# Automatic ingest from a POSIX directory to Keep
#* Access to existing staging area (e.g., remote NFS share) is arranged ahead of time as an admin/setup task
#* Optional(?) User can manage staging areas hosted inside Arvados
#* Ingestor runs in a screen session. Command line parameters provide project (group/folder) ID.
#* Someone ("3rd-party") uploads some files to the staging area via SFTP or whatever
#* 3rd-party does an API call to "ingest-notify app" {something - ingestor app? directly to arvados api endpoint?}. This might be a short bash script culminating in a curl command. In the API call, the 3rd-party provides a label (e.g., a sample ID), a list of files, checksums, and an arbitrary "properties" hash containing whatever the 3rd-party wants.
#* Ingest-notify app generates a "data in staging area is ready to ingest" event via API server.
#* Ingestor waits for a "data in staging area is ready to ingest" notification via API server.
#* Ingestor daemon reads the data from the staging area and writes it into Keep; creates one collection per uploader
#* In Workbench the imported Datasets appear as Collections in the designated project
#* After data has been copied into Keep, ingestor deletes the files from the staging area (if @--delete-after@ flag given) (this had better be configurable!).
...
# My data gets into the right project as specified by the uploader (API call)
#* How is the staging-area ↔ project mapping specified, and how/where is it encoded/stored?
...
# Subscribe to notifications (by email and/or Workbench dashboard): when files start/finish uploading; when files are shared with customer; when files are downloaded by third party
#* For now, use existing Logs table + automatic logging of create/update/delete operations
...
# Move/copy collections between projects (Project RX1234, or Customer X’s files), tag them in destination project with the appropriate string (e.g., sample ID) -- defaulting to existing tag used in source project (e.g., provided at time of upload).
#* UI for presenting Groups as Projects/Folders: create, view, rename, share, delete
#* UI for copying/moving objects between folders
#* How to avoid confusion about "is this one object in two places, or are there two objects?" Note GDocs has a bit of both, "My Drive" / "Shared with me" vs. regular folders
...
# “Anyone with this secret link can view/download” mode. Enable, disable, change magic link. Use cases: browser + “wget -r”.
#* Perhaps the secret in the secret link is an ApiClientAuthorization token, belonging to the person creating the link, scoped to a single project/collection
#* How do we implement "Anonymous user, not logged in"?
...
# See log/overview of who has accessed your shared data (incl. “anonymous user” if using secret-link-to-share); when shared/unshared; when each upload started/finished -- for a single project, and across all projects
...
# Pilot alternate Workbench dashboard view
...
# Use layout/theme from http://startbootstrap.com/templates/sb-admin/index.html
...
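
The ingest notification described in the first item (a label, a list of files with checksums, and a free-form "properties" hash) could be assembled along the following lines. This is only a sketch: the outline does not define the request format or the endpoint, so the field names (@label@, @files@, @path@, @size@, @md5@, @properties@), the choice of MD5 as the checksum, and the function names are all assumptions made for illustration.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_ingest_notification(staging_dir, label, properties=None):
    """Build a hypothetical notification body for one upload.

    Field names and the MD5 checksum are assumptions; the outline
    leaves the actual format open.
    """
    files = []
    for root, dirs, names in os.walk(staging_dir):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(names):
            path = os.path.join(root, name)
            files.append({
                # Path relative to the staging area, not the local mount point
                "path": os.path.relpath(path, staging_dir),
                "size": os.path.getsize(path),
                "md5": file_md5(path),
            })
    return {
        "label": label,  # e.g., a sample ID
        "files": files,
        "properties": properties or {},  # whatever the 3rd-party wants
    }
```

The resulting dictionary can be serialized with @json.dumps@ and sent by whatever mechanism is eventually chosen (for example, the curl command mentioned above). Listing paths relative to the staging area keeps the notification independent of where the ingestor mounts that area.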