cgb
cgb is a tool for accessing files in the binary Compact Genome Format (CGF). It is still in the prototyping stage.
Code for cgb can be found on GitHub at github.com/abeconnelly/cgf.
Quick start
$ git clone https://github.com/abeconnelly/cgf
$ cd cgf/cpp
$ ./cmp.sh
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -s 0 -B -k -p 862
[ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1]
[ 79 2 0 0 0 0 0 -1 0 0 0 390 0 0 0 0 0 1 0 0 0 0 0 0 26 0 0 1 0 0 -1 34 -1 185 1]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
Brief overview
cgb is meant to help debug and inspect CGF files. Its two main features are reporting the contents of a CGF file in terms of tile variants and low-quality information, and performing some basic tile concordance operations. The code cgb uses is shared with the Lightning CGF server, so cgb also serves in part to test functionality used there.
Concordance
A CGF file stores tile information in different 'tiers': a bit vector recording whether each tile is canonical, a cache holding the first 8 tile variants, and overflow tables used when the cache is exceeded. For testing and for rough estimates, cgb supports different 'levels' of concordance:
- Level 0 - compare canonical tiles only
- Level 1 - compare canonical tiles and cache
- Level 2 - a full tile concordance
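As a rough illustration of level 0, the sketch below counts tiles marked canonical in both files by ANDing two canonical bit vectors byte by byte and counting the set bits. It is not taken from cgb's source: it assumes one bit per tile and that a "canonical match" means the tile is canonical in both files. The function name and the example vectors are hypothetical.

```python
def canonical_match_count(bits_a: bytes, bits_b: bytes) -> int:
    """Count tiles whose canonical bit is set in both vectors.

    Assumption (hypothetical, not cgb's API): each byte packs 8 tiles'
    canonical bits. Bytes are ANDed pairwise and the set bits counted;
    trailing bytes of the longer vector are ignored.
    """
    return sum(bin(a & b).count("1") for a, b in zip(bits_a, bits_b))


if __name__ == "__main__":
    # Hypothetical 16-tile vectors: the sample has one non-canonical tile.
    sample = bytes([0b11110111, 0b11111111])
    reference = bytes([0b11111111, 0b11111111])
    print(canonical_match_count(sample, reference))  # prints 15
```

Levels 1 and 2 would additionally have to compare the cached and overflow tile variants, which this sketch does not attempt.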
Example
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 0
level: 0, canonical match: 6491163
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 1
level: 1, canonical+cache match: 6519788, loq: 148760
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 2
#match_tot: 6610685
CGF Inspection
JSON Tile Path Information
Get tile path 862 (0x35e, which is chrM), starting at tile 0 and including low-quality information, and print it in JSON format:
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B
{
  "035e":{
    "tilepath":862,
    "start_tilestep":0,
    "allele":[
      [ 79, 8, 0, 0, 0, 0, 0, -1, 0, 0, 0, 389, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, -1, 34, -1, 185, 1 ],
      [ 79, 2, 0, 0, 0, 0, 0, -1, 0, 0, 0, 390, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 26, 0, 0, 1, 0, 0, -1, 34, -1, 185, 1 ]
    ],
    "loq_info":[
      [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 96, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 291, 2 ] ],
      [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 96, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 291, 2 ] ]
    ]
  }
}
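The JSON form is convenient to post-process. The minimal sketch below lists, per allele, the tile steps that carry low-quality information. It assumes the layout seen in the output above: one loq_info list per allele, whose i-th entry describes tile step start_tilestep + i, with an empty entry meaning no low-quality information. The helper loq_tile_steps is hypothetical and not part of cgb.

```python
import json


def loq_tile_steps(doc: dict) -> dict:
    """For each tile path key, list per allele the tile steps with loq info.

    Assumes entry i of an allele's "loq_info" list describes tile step
    start_tilestep + i, and that an empty entry means no low-quality data.
    """
    result = {}
    for key, path in doc.items():
        start = path["start_tilestep"]
        result[key] = [
            [start + i for i, entry in enumerate(allele_loq) if entry]
            for allele_loq in path["loq_info"]
        ]
    return result


if __name__ == "__main__":
    # Tiny hypothetical document in the same shape as cgb's JSON output.
    text = '{"0001": {"start_tilestep": 10, "loq_info": [[[], [5, 1]], [[], []]]}}'
    print(loq_tile_steps(json.loads(text)))  # prints {'0001': [[11], []]}
```

Applied to the chrM output above (for example by piping cgb into a script that calls loq_tile_steps(json.load(sys.stdin))), this should report tile steps 6, 8, 21 and 34 for both alleles of path 035e.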
Tile Path Compact Representation
Get the same tile path, 862 (0x35e, which is chrM), starting at tile 0 and including low-quality information, in a compact representation:
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B -L -k
[ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1]
[ 79 2 0 0 0 0 0 -1 0 0 0 390 0 0 0 0 0 1 0 0 0 0 0 0 26 0 0 1 0 0 -1 34 -1 185 1]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
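Comparing the values with the JSON output, the compact form appears to print the two allele vectors first, then one low-quality list per allele. A small parsing sketch, assuming each vector arrives as one line of text; the helper names are hypothetical, not part of cgb:

```python
import re


def parse_allele_line(line: str) -> list:
    """Parse a compact allele line such as "[ 79 8 0 -1 1]" into ints."""
    return [int(tok) for tok in re.findall(r"-?\d+", line)]


def parse_loq_line(line: str) -> list:
    """Parse a compact loq line such as "[[ ][ 903 1 ]]" into per-tile lists.

    The outer brackets are stripped first, then each inner "[ ... ]" group
    becomes one list of ints (empty when the tile has no loq information).
    """
    inner = line.strip()[1:-1]
    groups = re.findall(r"\[([^\]]*)\]", inner)
    return [[int(tok) for tok in group.split()] for group in groups]


if __name__ == "__main__":
    print(parse_allele_line("[ 79 8 0 0 -1 34 -1 185 1]"))
    print(parse_loq_line("[[ ][ 903 1 ][ ][ 16 1 ]]"))
```

Stripping the outer brackets before matching matters: otherwise the first match would swallow the leading "[" of the line.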
Inspect Binary File (Debug)
Get a debugging printout of the information in the CGF file:
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -D -i data/hg19.cgf
Magic: "cgf.b"{ (7b22622e66676322)
CGFVersion: 0.1.0
LibVersion: 0.1.0
PathCount: 863
TileMapLength: 7044
TileMap: [[0+1],[0+1]], [[0+1],[1+1]], [[1+1],[0+1]], [[1+1],[1+1]], [[0+1],[2+1]], [[2+1],[0+1]], [[0+1,0+1],[1+2]], [[1+2],[0+1,0+1]], [[0+2],[0+2]], [[0+1],[3+1]], [[3+1],[0+1]], [[1+1,0+1],[0+2]], [[0+2],[1+1,0+1]], [[0+1],[4+1]], [[4+1],[0+1]], [[1+2],[1+2]], [[2+1],[2+1]], [[1+1],[3+1]], [[3+1],[1+1]], [[1+1],[2+1]], [[2+1],[1+1]], [[0+1],[5+1]], [[5+1],[0+1]], [[0+1],[6+1]], [[6+1],[0+1]], [[0+1,0+1],[2+2]], [[2+2],[0+1,0+1]], [[0+1,0+1],[3+2]], [[3+2],[0+1,0+1]], [[3+1],[3+1]], [[0+1],[7+1]], [[7+1],[0+1]], ...
035e.Loq.LoqFlagByteCount: 5
035e.Loq.LoqFlag[5]: 40 01 20 00 04
035e.Loq.LoqInfoByteCount: 18
035e.Loq.LoqInfo[18]: 01 02 83 87 01 01 02 10 01 01 02 60 01 01 02 81 23 02
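Two details of this dump can be checked by hand. The hex value printed next to Magic is consistent with the 8 magic bytes being read as a little-endian 64-bit integer, and the 5 LoqFlag bytes for path 035e behave like a bit vector read least-significant bit first: its set bits fall at tile steps 6, 8, 21 and 34, exactly where the tile path listings above show low-quality entries. Both readings are inferred from this one dump, not from the CGF specification, and the helper names below are hypothetical.

```python
import struct


def magic_bytes(magic: int) -> bytes:
    """Return the 8 bytes of a 64-bit value laid out little-endian."""
    return struct.pack("<Q", magic)


def loq_flag_positions(flag_bytes: bytes) -> list:
    """Return the indices of set bits, least-significant bit of byte 0 first.

    Assumption inferred from the dump: bit i of the vector flags tile step i
    as having low-quality information.
    """
    return [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(flag_bytes)
        for bit in range(8)
        if (byte >> bit) & 1
    ]


if __name__ == "__main__":
    print(magic_bytes(0x7B22622E66676322))  # prints b'"cgf.b"{'
    print(loq_flag_positions(bytes.fromhex("4001200004")))  # prints [6, 8, 21, 34]
```

Five flag bytes give 40 bits, enough to cover the 35 tile steps of path 035e.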
Updated by Abram Connelly about 8 years ago · 3 revisions