Idea #3015: Make gatk3 pipeline template - Arvados

Idea #3015

closed

Make gatk3 pipeline template

Added by Tom Clegg almost 12 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Assigned To: Peter Amstutz
Category:
Target version: 2014-07-16 Sprint
Start date: 06/24/2014
Due date:
Story points: 3.0

Description

Create a project
Download and add the appropriate reference & example datasets to the project
Make a docker image with all of the relevant redistributable software pre-installed
Make a "pirs" crunch script and use it to generate the simulation dataset based on hg19 chr1
Make a "Single sample SNV with bwa and gatk" pipeline with no parallel/asynchronous tasks
Time permitting, make another pipeline that splits the inputs as described below in order to get faster turnaround time when multiple nodes are available.

The attached script and the existing GATK exome pipeline should be helpful. Notes:

Use FUSE mount for inputs
GATK3 (like attached) not GATK2 (like existing pipeline)
Use a docker image with redistributable tools pre-installed, assuming this makes things easier (but not GATK itself - continue to pass this tarball as a job input)
Use the file-select script to get appropriate bits from the GATK bundle (which we should have an entire copy of in our project), rather than downloading individual files needed.
Existing pipeline provides clues (not necessarily all correct with latest tool versions) about which tools are capable of reading/writing pipes rather than regular files.

Notes about parallelizing:

We can split the FASTQ into as many chunks as we want. After mapping, however, we should merge the alignments from one sample into a single SAM/BAM file so that the reads stack up on each genome position. Then we split the SAM/BAM file again, this time by chromosome. Roughly speaking, that gives us 24 or 25 BAM fragments, and all downstream steps can be applied to these chromosome-based fragments. Finally, probably after annotation, we merge the fragment files into one final file. To increase parallelism further, we can also split the BAM at positions that have very low or no coverage.
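The split/merge/split plan above can be sketched on toy data. This is only an illustration of the data flow, not the pipeline itself: the function names are hypothetical, alignment records are stand-in `(chromosome, position, read_name)` tuples rather than real SAM/BAM records, and the mapping step between chunking and merging is left out.

```python
from collections import defaultdict
from itertools import islice


def split_fastq(lines, reads_per_chunk):
    """Scatter: yield lists of FASTQ lines, up to reads_per_chunk reads each.

    A FASTQ read occupies exactly 4 lines (header, sequence, '+', qualities).
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        yield chunk


def merge_alignments(per_chunk_alignments):
    """Gather: combine per-chunk records into one list sorted by (chrom, pos).

    Chromosome names sort as plain strings here, which is enough for a toy
    example but is not the ordering a real BAM header would impose.
    """
    merged = [rec for chunk in per_chunk_alignments for rec in chunk]
    merged.sort(key=lambda rec: (rec[0], rec[1]))
    return merged


def split_by_chromosome(alignments):
    """Scatter again: group the merged records by chromosome name."""
    by_chrom = defaultdict(list)
    for rec in alignments:
        by_chrom[rec[0]].append(rec)
    return dict(by_chrom)
```

Each value returned by `split_by_chromosome` corresponds to one of the per-chromosome fragments that downstream steps could process independently before the final merge.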

Files

Single_Sample_SNV_Pipeline.txt (6.82 KB)

Tom Clegg, 06/18/2014 01:04 AM

Subtasks 5 (0 open — 5 closed)

Related issues 1 (0 open — 1 closed)

Updated by Tom Clegg almost 12 years ago

Target version set to 2014-07-16 Sprint

Updated by Tom Clegg almost 12 years ago

File Single_Sample_SNV_Pipeline.txt added

Updated by Tom Clegg almost 12 years ago

Description updated

Updated by Peter Amstutz almost 12 years ago

Assigned To set to Peter Amstutz

Updated by Peter Amstutz almost 12 years ago

Ongoing notes:

Can't upload with arv-put to 4xphq. Got "entity too large" error from nginx.
If I upload something with arv-put, it doesn't go into any folder by default, and there's no longer a link to the collections table.

Updated by Peter Amstutz over 11 years ago

No modal chooser for individual files. Workaround: select the files in a different window using the paperclip, then they show up in selection dropdown.
Pipelines created from pipeline templates don't get added to the same folder.
Job parameters passed to docker need to be JSON encoded
An "arv migrate" command for copying objects and collections between Arvados instances seems like it would be a good idea.
If you try to queue a job while another one is running locally (not using slurm), the new job tries to run immediately and then fails because the job directory is locked.
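To illustrate the JSON-encoding note above, here is a minimal sketch of serializing a parameter dictionary to a single string before handing it to a containerized script, and decoding it on the other side. The function names and the example parameter keys are hypothetical; this shows the general technique with Python's standard `json` module and is not the actual Arvados or crunch mechanism.

```python
import json


def encode_job_parameters(params):
    """Serialize a parameter dict to one compact JSON string.

    A single string survives being passed as one argument or environment
    variable, even when values contain spaces or nested lists.
    """
    return json.dumps(params, sort_keys=True, separators=(",", ":"))


def decode_job_parameters(text):
    """Inverse of encode_job_parameters: parse the JSON string back to a dict."""
    return json.loads(text)


# Hypothetical parameters, for illustration only.
example_params = {
    "reference": "hg19 chr1",
    "threads": 4,
    "inputs": ["a.fastq", "b.fastq"],
}
encoded = encode_job_parameters(example_params)
```

Decoding `encoded` with `decode_job_parameters` gives back a dictionary equal to `example_params`, including the value that contains a space.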

Updated by Tom Clegg over 11 years ago

Status changed from New to In Progress

Updated by Peter Amstutz over 11 years ago

Status changed from In Progress to Resolved
