Feature #426 (open): Use compute cloud for back-end processing
Start date: 11/28/2010
% Done: 0%
Story points: -
Description
We need to modify the background processing code so it can run on a "fresh" node:
- Pre-process the reference data (refFlat, hg18.2bit, hg19.2bit) and put it in warehouse storage.
- Make an mr-get-evidence wrapper (a step-0 sketch follows this list):
  - In step 0, scan the input, queue one job step per chromosome, and output the comments/metadata.
  - Fetch/extract the reference data (if not already extracted by a previous job step).
  - grep for the desired chromosome, sort, and do the rest of the processing.
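A minimal sketch of that step-0 scan, assuming Python; queue_jobstep() is a hypothetical callback standing in for whatever the warehouse MapReduce framework actually provides for queueing follow-up job steps:

    import gzip
    import sys

    def step0(input_path, queue_jobstep, metadata_out=sys.stdout):
        # Scan the genotype GFF once: pass comment/metadata lines straight through
        # and remember every chromosome seen in the data.
        chromosomes = []
        opener = gzip.open if input_path.endswith(".gz") else open
        with opener(input_path, "rt") as gff:
            for line in gff:
                if line.startswith("#"):
                    metadata_out.write(line)       # comments/metadata are step 0's output
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom not in chromosomes:
                    chromosomes.append(chrom)
        # One follow-up job step per chromosome seen in the input.
        for chrom in chromosomes:
            queue_jobstep({"chromosome": chrom, "input": input_path})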
We should still support single-node installations. For this case we need a mechanism to prevent the server from overtaxing itself if many jobs are submitted at once (e.g., by default, max # concurrent jobs = # CPUs).
- Possible solution: Try to flock() one of N lockfiles in /home/trait/lock/slot.X. If all are already locked, wait a random number of seconds and try again. When a flock() succeeds, start the job (pass the lock to the job process, so the lock is released when the process exits). A sketch of this is below.
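A minimal sketch of the lock-slot idea, assuming Python on a POSIX system and that /home/trait/lock already exists; the slot count and retry interval are placeholders:

    import fcntl
    import os
    import random
    import subprocess
    import time

    LOCK_DIR = "/home/trait/lock"     # lockfiles slot.0 .. slot.N-1 live here
    MAX_SLOTS = os.cpu_count() or 1   # default: max concurrent jobs = number of CPUs

    def try_acquire_slot():
        # Return an fd holding an exclusive flock on a free slot, or None if all are busy.
        for i in range(MAX_SLOTS):
            fd = os.open(os.path.join(LOCK_DIR, "slot.%d" % i), os.O_RDWR | os.O_CREAT, 0o644)
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return fd              # this slot is ours
            except OSError:
                os.close(fd)           # already locked, try the next slot
        return None

    def start_job(argv):
        while True:
            fd = try_acquire_slot()
            if fd is not None:
                break
            time.sleep(random.uniform(1, 10))   # all slots busy: wait a random interval, retry
        # Hand the lock to the job: the child inherits the fd and the parent closes its copy,
        # so the flock is released only when the job process exits.
        proc = subprocess.Popen(argv, pass_fds=(fd,))
        os.close(fd)
        return proc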
The XML-RPC server should be replaced with a job queue. The web GUI should submit a job by inserting a row into a MySQL table (a possible table shape is sketched below).
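One possible shape for that table and the insert the web GUI would do, assuming Python with the MySQLdb driver; the table and column names are illustrative, not decided:

    import MySQLdb

    SCHEMA = """
    CREATE TABLE IF NOT EXISTS job_queue (
        id           INT AUTO_INCREMENT PRIMARY KEY,
        dataset_hash VARCHAR(64) NOT NULL,   -- the {hash} of the uploaded data set
        input_url    TEXT NOT NULL,          -- file:/// or warehouse:/// location
        batch_job    VARCHAR(64),            -- job# J once a cloud batch job is submitted
        queuetime    DATETIME,               -- when that batch job was submitted
        created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
    """

    def submit_job(dataset_hash, input_url):
        # What the web GUI does instead of an XML-RPC call: insert one row per requested job.
        db = MySQLdb.connect(host="localhost", user="trait", passwd="...", db="trait")
        cur = db.cursor()
        cur.execute(SCHEMA)
        cur.execute("INSERT INTO job_queue (dataset_hash, input_url) VALUES (%s, %s)",
                    (dataset_hash, input_url))
        db.commit()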
The background service (probably running on the same machine as the web GUI) will check the queue every few seconds (and when triggered by the web GUI via a named socket or something). For each job in the queue (see the decision-loop sketch after this list):
- Just delete it if we've already started/queued a process for this dataset.
- If cloud processing is available, submit a batch job and note the job number J and the queue time.
- Start a local job if local processing slots are available and...
  - cloud processing is not available, or
  - a batch job was submitted for this data set but failed, or
  - a batch job was submitted for this data set more than 30 seconds ago and that job hasn't started yet (the cloud is busy).
- If the batch job J for this data set has succeeded:
  - Make a symlink or something in {hash}-out/ so the web GUI knows the results are available.
  - Delete the queue entry.
  - If there are results in {hash}-out/ns.gff.gz etc. from previous analyses, delete them.
  - Get a local copy of the get-evidence.json file from the warehouse, but wait to fetch the other files from the warehouse until someone downloads them.
- Copy the uploaded data to the cloud from within the background service, while continuing to check for new items in the queue. Make a symlink genotype.gff.archive -> warehouse:///{hash}/input.gff.gz.
- If the user provides a warehouse:/// URL instead of file:///, just make the genotype.gff.archive symlink instead of copying the file to local storage.
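A rough sketch of that decision loop, in Python; the queue, cloud, and local objects are placeholders standing in for the MySQL queue table, the warehouse/cloud batch API, and the local lock-slot runner described above, so every method name here is an assumption:

    import os
    import time

    POLL_INTERVAL = 5          # "check the queue every few seconds"
    CLOUD_START_TIMEOUT = 30   # fall back to local processing after 30 s without a cloud start

    def service_loop(queue, cloud, local):
        while True:
            for job in queue.pending_jobs():
                dataset = job["dataset_hash"]

                # Already started/queued a process for this data set: drop the duplicate entry.
                if queue.is_duplicate(job) or local.already_running(dataset):
                    queue.delete(job)
                    continue

                # Prefer the cloud: submit a batch job, note job number J and queue time.
                if cloud.available() and job["batch_job"] is None:
                    job_number = cloud.submit(dataset, job["input_url"])
                    queue.record_batch_job(job, job_number, time.time())
                    continue

                # Local fallback: only if a slot is free and the cloud can't take the job.
                batch = job["batch_job"]
                cloud_failed = batch is not None and cloud.failed(batch)
                cloud_stuck = (batch is not None and not cloud.started(batch)
                               and time.time() - job["queuetime"] > CLOUD_START_TIMEOUT)
                if local.slot_free() and (not cloud.available() or cloud_failed or cloud_stuck):
                    local.start(dataset, job["input_url"])
                    continue

                # Batch job finished: publish the results and clear the queue entry.
                if batch is not None and cloud.succeeded(batch):
                    outdir = "%s-out" % dataset
                    remove_stale_results(outdir)                       # old ns.gff.gz etc.
                    cloud.fetch(batch, "get-evidence.json", outdir)    # the rest is fetched on demand
                    os.symlink(cloud.output_locator(batch),
                               os.path.join(outdir, "results"))        # tells the web GUI it's done
                    queue.delete(job)

            time.sleep(POLL_INTERVAL)

    def remove_stale_results(outdir):
        for name in ("ns.gff.gz", "get-evidence.json"):
            path = os.path.join(outdir, name)
            if os.path.exists(path):
                os.unlink(path)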