Feature #426 (open): Use compute cloud for back-end processing
Start date: 11/28/2010
% Done: 0%
Story points: -
Description
We need to modify the background processing code so it can run on a "fresh" node:
- Pre-process the reference data (refFlat, hg18.2bit, hg19.2bit) and put it in warehouse storage.
- Make an mr-get-evidence wrapper (a step-0 sketch follows this list):
  - In step 0, scan the input, queue one job step per chromosome, and output the comments/metadata.
  - Fetch/extract the reference data (if not already extracted by a previous job step).
  - grep for the desired chromosome, sort, and do the rest of the processing.
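A minimal sketch of that step-0 scan, assuming Python; queue_jobstep() is a hypothetical callback standing in for whatever the warehouse MapReduce framework actually provides for queueing follow-up job steps:

    import gzip
    import sys

    def step0(input_path, queue_jobstep, metadata_out=sys.stdout):
        # Scan the genotype GFF once: pass comment/metadata lines straight through
        # and remember every chromosome seen in the data.
        chromosomes = []
        opener = gzip.open if input_path.endswith(".gz") else open
        with opener(input_path, "rt") as gff:
            for line in gff:
                if line.startswith("#"):
                    metadata_out.write(line)       # comments/metadata are step 0's output
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom not in chromosomes:
                    chromosomes.append(chrom)
        # One follow-up job step per chromosome seen in the input.
        for chrom in chromosomes:
            queue_jobstep({"chromosome": chrom, "input": input_path})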
We should still support single-node installations. For this case we need a mechanism to prevent the server from overtaxing itself if many jobs are submitted at once (e.g., by default, max # concurrent jobs = # CPUs).
- Possible solution: Try to flock() one of N lockfiles in /home/trait/lock/slot.X. If all are already locked, wait a random number of seconds and try again. When a flock() succeeds, start the job (pass the lock to the job process, so the lock is released when the process exits). A sketch of this is below.
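A minimal sketch of the lock-slot idea, assuming Python on a POSIX system and that /home/trait/lock already exists; the slot count and retry interval are placeholders:

    import fcntl
    import os
    import random
    import subprocess
    import time

    LOCK_DIR = "/home/trait/lock"     # lockfiles slot.0 .. slot.N-1 live here
    MAX_SLOTS = os.cpu_count() or 1   # default: max concurrent jobs = number of CPUs

    def try_acquire_slot():
        # Return an fd holding an exclusive flock on a free slot, or None if all are busy.
        for i in range(MAX_SLOTS):
            fd = os.open(os.path.join(LOCK_DIR, "slot.%d" % i), os.O_RDWR | os.O_CREAT, 0o644)
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return fd              # this slot is ours
            except OSError:
                os.close(fd)           # already locked, try the next slot
        return None

    def start_job(argv):
        while True:
            fd = try_acquire_slot()
            if fd is not None:
                break
            time.sleep(random.uniform(1, 10))   # all slots busy: wait a random interval, retry
        # Hand the lock to the job: the child inherits the fd and the parent closes its copy,
        # so the flock is released only when the job process exits.
        proc = subprocess.Popen(argv, pass_fds=(fd,))
        os.close(fd)
        return proc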
The XML-RPC server should be replaced with a job queue. The web GUI should submit a job by inserting a row into a MySQL table (a possible table shape is sketched below).
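One possible shape for that table and the insert the web GUI would do, assuming Python with the MySQLdb driver; the table and column names are illustrative, not decided:

    import MySQLdb

    SCHEMA = """
    CREATE TABLE IF NOT EXISTS job_queue (
        id           INT AUTO_INCREMENT PRIMARY KEY,
        dataset_hash VARCHAR(64) NOT NULL,   -- the {hash} of the uploaded data set
        input_url    TEXT NOT NULL,          -- file:/// or warehouse:/// location
        batch_job    VARCHAR(64),            -- job# J once a cloud batch job is submitted
        queuetime    DATETIME,               -- when that batch job was submitted
        created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
    """

    def submit_job(dataset_hash, input_url):
        # What the web GUI does instead of an XML-RPC call: insert one row per requested job.
        db = MySQLdb.connect(host="localhost", user="trait", passwd="...", db="trait")
        cur = db.cursor()
        cur.execute(SCHEMA)
        cur.execute("INSERT INTO job_queue (dataset_hash, input_url) VALUES (%s, %s)",
                    (dataset_hash, input_url))
        db.commit()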
The background service (probably running on the same machine as the web GUI) will check the queue every few seconds (and when triggered by the web GUI via a named socket or something). For each job in the queue (see the decision-loop sketch after this list):
- Just delete it if we've already started/queued a process for this dataset.
- If cloud processing is available, submit a batch job and note the job number J and the queue time.
- Start a local job if local processing slots are available and...
  - cloud processing is not available, or
  - a batch job was submitted for this data set but failed, or
  - a batch job was submitted for this data set more than 30 seconds ago and that job hasn't started yet (the cloud is busy).
- If the batch job J for this data set has succeeded:
  - Make a symlink or something in {hash}-out/ so the web GUI knows the results are available.
  - Delete the queue entry.
  - If there are results in {hash}-out/ns.gff.gz etc. from previous analyses, delete them.
  - Get a local copy of the get-evidence.json file from the warehouse, but wait to fetch the other files from the warehouse until someone downloads them.
- Copy the uploaded data to the cloud from within the background service, while continuing to check for new items in the queue. Make a symlink genotype.gff.archive -> warehouse:///{hash}/input.gff.gz.
- If the user provides a warehouse:/// URL instead of file:///, just make the genotype.gff.archive symlink instead of copying the file to local storage.
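A rough sketch of that decision loop, in Python; the queue, cloud, and local objects are placeholders standing in for the MySQL queue table, the warehouse/cloud batch API, and the local lock-slot runner described above, so every method name here is an assumption:

    import os
    import time

    POLL_INTERVAL = 5          # "check the queue every few seconds"
    CLOUD_START_TIMEOUT = 30   # fall back to local processing after 30 s without a cloud start

    def service_loop(queue, cloud, local):
        while True:
            for job in queue.pending_jobs():
                dataset = job["dataset_hash"]

                # Already started/queued a process for this data set: drop the duplicate entry.
                if queue.is_duplicate(job) or local.already_running(dataset):
                    queue.delete(job)
                    continue

                # Prefer the cloud: submit a batch job, note job number J and queue time.
                if cloud.available() and job["batch_job"] is None:
                    job_number = cloud.submit(dataset, job["input_url"])
                    queue.record_batch_job(job, job_number, time.time())
                    continue

                # Local fallback: only if a slot is free and the cloud can't take the job.
                batch = job["batch_job"]
                cloud_failed = batch is not None and cloud.failed(batch)
                cloud_stuck = (batch is not None and not cloud.started(batch)
                               and time.time() - job["queuetime"] > CLOUD_START_TIMEOUT)
                if local.slot_free() and (not cloud.available() or cloud_failed or cloud_stuck):
                    local.start(dataset, job["input_url"])
                    continue

                # Batch job finished: publish the results and clear the queue entry.
                if batch is not None and cloud.succeeded(batch):
                    outdir = "%s-out" % dataset
                    remove_stale_results(outdir)                       # old ns.gff.gz etc.
                    cloud.fetch(batch, "get-evidence.json", outdir)    # the rest is fetched on demand
                    os.symlink(cloud.output_locator(batch),
                               os.path.join(outdir, "results"))        # tells the web GUI it's done
                    queue.delete(job)

            time.sleep(POLL_INTERVAL)

    def remove_stale_results(outdir):
        for name in ("ns.gff.gz", "get-evidence.json"):
            path = os.path.join(outdir, name)
            if os.path.exists(path):
                os.unlink(path)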