LobSTR tutorial » History » Version 14
Jonathan Sheffi, 10/16/2014 04:02 PM
1 | 1 | Bryan Cosca | h1. Running lobSTR v.3 using Arvados |
---|---|---|---|
2 | |||
3 | 7 | Bryan Cosca | This tutorial demonstrates how to run the lobSTR pipeline using the example that the Erlich Lab provides at their "page":http://lobstr.teamerlich.org/. LobSTR is a tool for profiling Short Tandem Repeats (STRs) from high-throughput sequencing data. The lobSTR publication is available here: "Gymrek M, Golan D, Rosset S, & Erlich Y. lobSTR: A short tandem repeat profiler for personal genomes. Genome Research. 2012 April 22.":http://genome.cshlp.org/content/early/2012/04/19/gr.135780.111.abstract This tutorial covers the following: |
4 | 1 | Bryan Cosca | |
5 | * How to run lobSTR v.3 using Arvados |
||
6 | * How to access your pipeline results. |
||
7 | 7 | Bryan Cosca | * How to browse and select your input data for lobSTR and re-run the pipeline. |
8 | 1 | Bryan Cosca | |
9 | 6 | Jonathan Sheffi | # Start at the "Curoverse":https://curoverse.com/ website and click Log In at the top. We currently support all Google / Google Apps accounts for authentication. When you choose a Google-based account, your Arvados account is created automatically and you are redirected to the "Arvados Workbench":https://workbench.qr1hi.arvadosapi.com/. |
10 | # In the *Active pipelines* panel, click on the *Run a pipeline...* button. Doing so opens a dialog box titled *Choose a pipeline to run*. |
||
11 | # Select *lobstr v.3* and click the *Next: choose inputs* button. Doing so loads a new page to supply the inputs for the pipeline. |
||
12 | 7 | Bryan Cosca | # The default inputs from the lobSTR source code repository are already pre-loaded. Click on the *Run* button. The page updates to show you that the pipeline has been submitted to run on the Arvados cluster. |
13 | 6 | Jonathan Sheffi | # After the pipeline starts running, you can track its progress by watching log messages from jobs. This page refreshes automatically. You will see a complete label under the job column when the pipeline completes successfully. The current run time of the job in CPU and clock hours is also displayed. You can view individual job details by clicking on the job name. |
14 | 2 | Bryan Cosca | # Once the job is finished, the output can be viewed to the right of the run time. |
15 | 6 | Jonathan Sheffi | # Click on the download button to the right of the file to download your results, or the magnifying glass to quickly view your results. |
16 | 1 | Bryan Cosca | |
17 | h2. Uploading data and using it on Arvados |
||
18 | |||
19 | Full documentation can be found "here":http://doc.arvados.org/user/tutorials/tutorial-keep.html |
||
20 | |||
21 | 7 | Bryan Cosca | # Install the "Arvados Python SDK":http://doc.arvados.org/sdk/python/sdk-python.html on the system from which you will upload the data (such as your workstation, or a server containing data from your sequencer). Doing so will install the Arvados file upload tool, arv-put. |
22 | # To configure the environment with the Arvados instance host name and authentication token, see "here":http://doc.arvados.org/user/reference/api-tokens.html |
||
23 | 2 | Bryan Cosca | # Navigate back to your Workbench dashboard and create a new project by clicking on the Projects dropdown menu and clicking Home. |
24 | 1 | Bryan Cosca | # Click on [+ Add a subproject]. Feel free to edit the Project name or description by clicking the pencil to the right of the text. |
25 | 14 | Jonathan Sheffi | # To add data, return to your shell, create a folder, and put the two paired-end FASTQ files you want to upload inside it. From inside that folder, run the command arv-put * --project-uuid qr1hi-xxxxx-yyyyyyyyyyyyyyy. The project UUID, which begins with qr1hi, can be found in the URL of your new project. Uploading both files with a single command ensures that they end up in one collection. |
26 | 1 | Bryan Cosca | # The output value xxxxxxxxxxxxxxxxxxxx+yyyy is the Arvados collection locator that uniquely identifies the uploaded collection. |
27 | # Once that is uploaded, navigate back to the dashboard and click on *Run a pipeline...* and choose lobstr v.3. |
||
28 | # You can change the input by clicking on [Choose] next to the *Input fastq collection ID*. |
||
29 | 14 | Jonathan Sheffi | # Click on the dropdown menu, click on your newly-created project, and choose your desired input collection. Click *OK* and *Run* to run lobSTR v.3 on your data! |
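The upload steps above can be sketched in Python as a pre-flight check before calling arv-put. This is a minimal illustration, not part of the Arvados SDK: the helper names are hypothetical, the example project URL and UUID are made up, and the sketch assumes that the environment is configured through the ARVADOS_API_HOST and ARVADOS_API_TOKEN variables and that the project UUID follows the usual Arvados pattern of 5, 5, and 15 alphanumeric characters separated by hyphens. It only builds the command; it does not run it.

```python
import os
import re
import shlex

# Assumed UUID shape: <5-char cluster>-<5-char type>-<15-char id>.
UUID_RE = re.compile(r"\b[a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15}\b")


def project_uuid_from_url(url):
    """Return the first Arvados-style UUID found in a Workbench URL, or None."""
    match = UUID_RE.search(url)
    return match.group(0) if match else None


def missing_settings(env):
    """List the Arvados environment variables that are unset or empty."""
    required = ("ARVADOS_API_HOST", "ARVADOS_API_TOKEN")
    return [name for name in required if not env.get(name)]


def build_arv_put_command(files, project_uuid):
    """Build one arv-put call so that all files land in a single collection."""
    if not files:
        raise ValueError("no files to upload")
    return ["arv-put", "--project-uuid", project_uuid] + sorted(files)


if __name__ == "__main__":
    # Hypothetical project URL; copy the real one from your browser.
    url = "https://workbench.qr1hi.arvadosapi.com/projects/qr1hi-j7d0g-0123456789abcde"
    missing = missing_settings(os.environ)
    if missing:
        print("Set these variables first:", ", ".join(missing))
    command = build_arv_put_command(
        ["sample_1.fastq", "sample_2.fastq"], project_uuid_from_url(url)
    )
    # Print the command for review instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in command))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine where arv-put is not installed yet; once the output looks right, the same argument list could be passed to subprocess.run.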
30 | 7 | Bryan Cosca | |
31 | 11 | Bryan Cosca | |
32 | 8 | Bryan Cosca | h3. FAQ |
33 | 9 | Bryan Cosca | |
34 | 1 | Bryan Cosca | * Does this support both paired-end and single-end reads? |
35 | 13 | Jonathan Sheffi | ** Currently, the pipeline template only supports paired-end reads. If you would like to run a single-end read experiment, please email support@curoverse.com and tell us about your project. You can also copy the template yourself and edit the commands! Documentation is provided "here":http://doc.arvados.org/user/index.html |
36 | 1 | Bryan Cosca | |
37 | 7 | Bryan Cosca | * What type of files does this support? |
38 | 14 | Jonathan Sheffi | ** It supports FASTQ files with a variety of names, as long as the two file names contain the strings "1.f" and "2.f". The .fq, .fas, and .fastq extensions are all supported. |
39 | 8 | Bryan Cosca | |
40 | 7 | Bryan Cosca | * Can this run multiple samples at once? |
41 | 13 | Jonathan Sheffi | ** We are currently working on supporting batch processing of multiple samples, and it will be ready soon. |
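The file-naming rule from the FAQ above can be checked locally before uploading. The following is a rough sketch based only on the FAQ's description: the function names are hypothetical, the pipeline's actual matching logic is not shown in this tutorial and may differ, and the requirement of exactly one file per mate is an assumption drawn from the upload step, which asks for two paired-end files.

```python
def split_read_pairs(filenames):
    """Split names into mate-1 and mate-2 lists using the "1.f" / "2.f" substrings."""
    mate1 = sorted(name for name in filenames if "1.f" in name)
    mate2 = sorted(name for name in filenames if "2.f" in name)
    return mate1, mate2


def looks_like_paired_input(filenames):
    """Return True when there is exactly one distinct file for each mate."""
    mate1, mate2 = split_read_pairs(filenames)
    # Assumption: one mate-1 file and one different mate-2 file per collection.
    return len(mate1) == 1 and len(mate2) == 1 and mate1 != mate2


if __name__ == "__main__":
    for names in (["sample_1.fastq", "sample_2.fastq"], ["sample.fastq"]):
        print(names, looks_like_paired_input(names))
```

A check like this catches a single-end or misnamed upload early, before the pipeline is submitted.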