Task #11922
closed
Make Tagset CWL pipeline
Description
Take code in https://github.com/curoverse/l7g/tree/master/tools/tagset and convert it to a CWL pipeline for automatic tagset creation.
This includes:
- Create a FASTA file that has the tagset (what the referenced code does now)
- Compress and index the created FASTA tagset (for both samtools faidx access and bgzip access)
- Create a 2bit tagset
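The compress, index, and 2bit steps listed above could be sketched roughly as follows. This is only an illustration, not the pipeline's actual code: it assumes `faToTwoBit` (UCSC), `bgzip`, and `samtools` are on the PATH, and the function names and the `tagset.fa` file name are hypothetical.

```python
import subprocess
from pathlib import Path


def tagset_postprocess_commands(fasta_path):
    """Build the command lines that post-process a FASTA tagset.

    The 2bit conversion comes first because bgzip replaces the
    uncompressed FASTA with its .gz counterpart.
    """
    fasta = Path(fasta_path)
    twobit = fasta.with_suffix(".2bit")
    bgzipped = fasta.with_name(fasta.name + ".gz")
    return [
        # UCSC tool: FASTA in, 2bit out.
        ["faToTwoBit", str(fasta), str(twobit)],
        # Block-compress and write the BGZF (.gzi) index.
        ["bgzip", "--index", str(fasta)],
        # Write the .fai index for the compressed FASTA.
        ["samtools", "faidx", str(bgzipped)],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(tagset_postprocess_commands("tagset.fa"))` would then be expected to leave `tagset.2bit`, `tagset.fa.gz`, and the two index files next to each other. In a CWL pipeline, each of the three commands would more naturally be its own step.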
The original code to create the tagset was written in Go and has since been ported (in the code referenced above) to a combination of Python and C++.
Assuming all dependencies have been met, simply running createTagsetFa.sh should work to output the FASTA tagset.
Dependencies include:
- bigWigToBedGraph (available at http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/)
- hg19 FASTA file (available at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/ ?)
- cytoBand.txt (available at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/)
- wgEncodeCrgMapabilityAlign24mer.bigWig (available at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/)
All these files are available in Keep via the Project reference.
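Before running createTagsetFa.sh, a small pre-flight check along these lines could confirm the dependencies are in place. It is a sketch under assumptions: the function name and the idea of a single data directory are hypothetical, and the hg19 FASTA path is passed in explicitly because its file name is not stated here.

```python
import shutil
from pathlib import Path

# Names taken from the dependency list above.
REQUIRED_TOOLS = ["bigWigToBedGraph"]
REQUIRED_FILES = ["cytoBand.txt", "wgEncodeCrgMapabilityAlign24mer.bigWig"]


def missing_dependencies(data_dir, hg19_fasta,
                         tools=REQUIRED_TOOLS, files=REQUIRED_FILES):
    """Return a list of tools and files that could not be found."""
    # Executables are looked up on the PATH.
    missing = [tool for tool in tools if shutil.which(tool) is None]
    data_dir = Path(data_dir)
    # Data files are expected inside data_dir (an assumption of this sketch).
    for name in files:
        if not (data_dir / name).is_file():
            missing.append(str(data_dir / name))
    if not Path(hg19_fasta).is_file():
        missing.append(str(hg19_fasta))
    return missing
```

An empty return value means everything was found; otherwise the list names what still has to be fetched from Keep or the UCSC download pages.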
The original tagset creation had some issues, including tags that did not always fall on a unique 24mer and some inconsistencies in choosing an end tile length and tag. The quirks of the original tagset creation have been encoded in the choose_tagset_startpos0_vestigial.py script. In the future, something like choose_tagset_startpos0.py or another custom script could be used to choose a tagset.
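To illustrate what "falling on a unique 24mer" means, here is a toy sketch that lists the start positions whose k-mer occurs exactly once in a single sequence. It is not how the scripts above work: they presumably rely on the wgEncodeCrgMapabilityAlign24mer track for genome-wide uniqueness, whereas this hypothetical `unique_kmer_starts` only counts exact forward-strand matches in memory and is suitable for small inputs only.

```python
from collections import Counter


def unique_kmer_starts(sequence, k=24):
    """Return 0-based start positions whose k-mer occurs exactly once.

    k-mers containing 'N' are skipped, since they cannot serve as tags.
    """
    seq = sequence.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [
        i for i, kmer in enumerate(kmers)
        if counts[kmer] == 1 and "N" not in kmer
    ]
```

A custom tag-choosing script could restrict candidate tag positions to such a list before applying its own spacing and end-tile rules.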
Though outside the scope of this work, it would be nice to come up with a CWL pipeline that makes custom tagsets easier to experiment with. Converting the current tagset creation into a CWL pipeline will hopefully be a good start.
Updated by Abram Connelly about 8 years ago
- Status changed from New to Resolved
- Remaining (hours) set to 0.0
This CWL pipeline has been created and named tagset.
Note that it requires the arvados/l7g image to run, because it depends on scripts installed in that image to create the tagset.