Task #11922

closed

Make Tagset CWL pipeline

Added by Abram Connelly over 8 years ago. Updated about 8 years ago.

Status:
Resolved
Priority:
Low
Assigned To:
-
Target version:
-

Description

Take code in https://github.com/curoverse/l7g/tree/master/tools/tagset and convert it to a CWL pipeline for automatic tagset creation.

This includes:

  • Create a FASTA file that has the tagset (what the referenced code does now)
  • Compress and index the created FASTA tagset (for both samtools faidx access and bgzip access)
  • Create a 2bit tagset
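The post-processing steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the contents of the eventual CWL: the function names and the tagset.fa file name are hypothetical, and the use of UCSC's faToTwoBit for the 2bit conversion is an assumption, since the ticket does not say which tool produces the 2bit file.

```python
import subprocess


def tagset_postprocess_steps(fasta_path):
    """Plan the compress, index, and 2bit steps for a tagset FASTA.

    Returns a list of (argv, stdout_path) pairs; stdout_path is None
    when the tool writes its own output files.
    """
    gz_path = fasta_path + ".gz"
    # Hypothetical naming: swap a trailing ".fa" for ".2bit".
    stem = fasta_path[:-len(".fa")] if fasta_path.endswith(".fa") else fasta_path
    twobit_path = stem + ".2bit"
    return [
        # bgzip -c writes the block-compressed stream to stdout.
        (["bgzip", "-c", fasta_path], gz_path),
        # samtools faidx on a bgzipped FASTA writes .fai and .gzi indexes.
        (["samtools", "faidx", gz_path], None),
        # Assumption: faToTwoBit (UCSC) is the 2bit converter in use.
        (["faToTwoBit", fasta_path, twobit_path], None),
    ]


def run_steps(steps):
    """Run each planned step, redirecting stdout to a file where requested."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Calling run_steps(tagset_postprocess_steps("tagset.fa")) would require bgzip, samtools and faToTwoBit on the PATH. In a CWL pipeline each pair would more naturally become its own CommandLineTool step.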

The original code to create the tagset was written in Go and has since been ported (in the code referenced above) to a combination of Python and C++.

Assuming all dependencies are met, running createTagsetFa.sh should output the FASTA tagset.

Dependencies include:

All these files are available in Keep via the Project reference.

The original tagset creation had some issues: tags did not always fall on a unique 24mer, and the choice of end tile length and tag was inconsistent. These quirks have been encoded in the choose_tagset_startpos0_vestigial.py script. In the future, something like choose_tagset_startpos0.py or another custom script could be used to choose a tagset.
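To make the "unique 24mer" condition concrete, here is a minimal sketch of a uniqueness check for a candidate tag. It does not reproduce the logic of choose_tagset_startpos0.py or its vestigial variant. The function names are hypothetical, and the check only looks within a single sequence on one strand, which is an assumption: the real selection may compare against the whole reference.

```python
from collections import Counter


def kmer_counts(sequence, k=24):
    """Count every overlapping k-mer in the sequence (case-insensitive)."""
    seq = sequence.lower()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def is_unique_tag(sequence, start, k=24):
    """Return True if the k-mer starting at `start` occurs exactly once."""
    tag = sequence[start:start + k].lower()
    if len(tag) != k:
        raise ValueError("tag runs past the end of the sequence")
    return kmer_counts(sequence, k)[tag] == 1
```

A candidate start position that fails this check is the kind of case described above, where a tag does not fall on a unique 24mer.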

Though outside the scope of this work, it would be nice to come up with a CWL pipeline that makes custom tagsets easier to experiment with. Converting the current tagset creation into a CWL pipeline will hopefully be a good start.

Actions #1

Updated by Abram Connelly about 8 years ago

  • Status changed from New to Resolved
  • Remaining (hours) set to 0.0

This CWL pipeline has been created and is named tagset.

Note that it needs the arvados/l7g image to run, since the scripts that create the tagset are installed in that image.
