Bug #12245
openpasta rotini-fastj does not handle tagset without ending newline
Start date: 09/13/2017
Due date:
% Done: 0%
Estimated time:
Story points: -
Description
There is a bug in the pasta rotini-fastj conversion where if the tagset provided does not have a trailing newline, the last tag will be ignored.
For example, the following should work but doesn't:
    pasta -action rotini-fastj \
      -start 0 \
      -tilepath 0000 \
      -chrom chr1 \
      -build hg19 \
      -i stage/hu034DB1-GS00253-DNA_A02/0000.pa \
      -assembly <( l7g assembly assembly.00.hg19.fw.gz 0000 ) \
      -tag <( samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24 ) > stage/hu034DB1-GS00253-DNA_A02/0000.fj
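The culprit is the tag stream produced by
samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24
Because tr -d '\n' removes every newline, fold -w 24 only inserts newlines between 24-character chunks, so the last tag in the stream is not followed by a newline.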
As a workaround, adding an extra newline will correct the issue:
    pasta -action rotini-fastj \
      -start 0 \
      -tilepath 0000 \
      -chrom chr1 \
      -build hg19 \
      -i stage/hu034DB1-GS00253-DNA_A02/0000.pa \
      -assembly <( l7g assembly assembly.00.hg19.fw.gz 0000 ) \
      -tag <( cat <( samtools faidx tagset.fa.gz 0000.00 | egrep -v '^>' | tr -d '\n' | fold -w 24 ) <( echo "" ) ) > stage/hu034DB1-GS00253-DNA_A02/0000.fj
The place to look is the readTag function in pasta_fastj.go, along with the places where the g.TagFinished flag is referenced; the exact cause still needs to be investigated.
This bug specifically happens when there is a variant on the next-to-last tag in the stream being converted, as is the case for dataset hu034DB1-GS00253-DNA_A02 in tilepath 0000.