Cgb » History » Version 3
Abram Connelly, 11/15/2016 04:43 PM
h1. cgb

@cgb@ is a tool for accessing files in the binary compact genome format (CGF). The tool is still in the prototyping stage.

Code for @cgb@ can be found at "github.com/abeconnelly/cgf":https://github.com/abeconnelly/cgf.

h2. Quick start

<pre>
$ git clone https://github.com/abeconnelly/cgf
$ cd cgf/cpp
$ ./cmp.sh
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -s 0 -B -k -p 862
[ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1]
[ 79 2 0 0 0 0 0 -1 0 0 0 390 0 0 0 0 0 1 0 0 0 0 0 0 26 0 0 1 0 0 -1 34 -1 185 1]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
</pre>

h2. Brief overview

@cgb@ is meant to help debug and inspect CGF files. Its two main features are reporting the contents of a CGF file in terms of tile variants and low-quality information, and performing some basic tile concordance operations. The code that @cgb@ uses is shared with the Lightning CGF server, and @cgb@ is in part meant to test functionality used there.

---

h2. Concordance

A CGF file stores tile information in several tiers: a bit vector recording whether each tile is canonical, a cache holding the first 8 tile variants, and overflow tables used when the cache is exceeded. For testing and for rough estimates, @cgb@ supports different 'levels' of concordance:

* Level 0 - compare canonical tiles only
* Level 1 - compare canonical tiles and the cache
* Level 2 - a full tile concordance

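To make the three levels concrete, here is a toy model in Python. It is an illustration under stated assumptions, not the CGF layout or @cgb@'s actual algorithm: each tile step is either canonical or carries a hypothetical integer @variant_id@; ids below a hypothetical @CACHE_LIMIT@ of 8 (after the cache size mentioned above) are treated as living in the cache, and larger ids go to the overflow table. A level only counts tile steps whose information it can see, so the match count can only grow as the level rises.

<pre><code class="python">
from dataclasses import dataclass
from typing import List

# Hypothetical toy model, NOT the real CGF layout.
CACHE_LIMIT = 8  # assumption: variant ids 0..7 fit in the cache


@dataclass
class ToyTile:
    canonical: bool
    variant_id: int = 0  # ignored when canonical is True


def tier(t: ToyTile) -> int:
    """Which tier holds this tile: 0 = canonical bit vector, 1 = cache, 2 = overflow."""
    if t.canonical:
        return 0
    return 1 if t.variant_id < CACHE_LIMIT else 2


def concordance(a: List[ToyTile], b: List[ToyTile], level: int) -> int:
    """Count tile steps where a and b match, using only tiers 0..level."""
    matches = 0
    for x, y in zip(a, b):
        if tier(x) > level or tier(y) > level:
            continue  # this level cannot see the data needed to compare
        if x.canonical and y.canonical:
            matches += 1
        elif not x.canonical and not y.canonical and x.variant_id == y.variant_id:
            matches += 1
    return matches


sample_a = [ToyTile(True), ToyTile(False, 3), ToyTile(False, 12), ToyTile(True)]
sample_b = [ToyTile(True), ToyTile(False, 3), ToyTile(False, 12), ToyTile(False, 1)]
for level in range(3):
    print(level, concordance(sample_a, sample_b, level))  # prints "0 1", "1 2", "2 3"
</code></pre>

The toy ignores low-quality information, which the real level 1 output also reports.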
h5. Example

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 0
level: 0, canonical match: 6491163
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 1
level: 1, canonical+cache match: 6519788, loq: 148760
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -i data/hg19.cgf -l 2
#match_tot: 6610685
</pre>
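When collecting these numbers across many genomes, it can help to parse the output lines shown above. The sketch below relies only on the 'label: number' shape visible in this example; since the tool is still being prototyped, the format may change. @parse_concordance_line@ is a hypothetical helper, not part of @cgb@.

<pre><code class="python">
import re

# Matches "label: 123" pairs; labels may contain spaces, '+', '_' or a leading '#'.
PAIR = re.compile(r"([^,:]+):\s*(-?\d+)")


def parse_concordance_line(line: str) -> dict:
    """Turn one line of cgb concordance output into {label: int}."""
    return {label.strip().lstrip("#"): int(value) for label, value in PAIR.findall(line)}


print(parse_concordance_line("level: 1, canonical+cache match: 6519788, loq: 148760"))
# {'level': 1, 'canonical+cache match': 6519788, 'loq': 148760}
</code></pre>
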

---

h2. CGF Inspection

h5. JSON Tile Path Information

Get tile path 862 (0x35e, which is chrM), starting at tile step 0, including low-quality information, and print it in JSON format.

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B
{
  "035e":{
    "tilepath":862,
    "start_tilestep":0,
    "allele":[
      [ 79, 8, 0, 0, 0, 0, 0, -1, 0, 0, 0, 389, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, -1, 34,
        -1, 185, 1 ],
      [ 79, 2, 0, 0, 0, 0, 0, -1, 0, 0, 0, 390, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 26, 0, 0, 1, 0, 0, -1, 34,
        -1, 185, 1 ]
    ],
    "loq_info":[
      [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 96, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ],
        [ ], [ ], [ 291, 2 ] ],
      [ [ ], [ ], [ ], [ ], [ ], [ ], [ 903, 1 ], [ ], [ 16, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ 96, 1 ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ], [ ],
        [ ], [ ], [ 291, 2 ] ]
    ]
  }
}
</pre>
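The JSON form is the easiest to consume from a script. Below is a minimal Python sketch that assumes only the structure visible in the output above: the top-level key is the tile path as 4-digit lowercase hex (862 becomes @"035e"@), and entry @i@ of each @loq_info@ list is taken to belong to tile step @start_tilestep + i@ (an assumption this example, with @start_tilestep@ 0, cannot fully confirm). @tile_path_key@ and @loq_steps@ are hypothetical helper names.

<pre><code class="python">
import json

# Output of
#   ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B
# copied from above, with whitespace compacted.
SAMPLE = """
{"035e": {
  "tilepath": 862,
  "start_tilestep": 0,
  "allele": [
    [79,8,0,0,0,0,0,-1,0,0,0,389,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,-1,34,-1,185,1],
    [79,2,0,0,0,0,0,-1,0,0,0,390,0,0,0,0,0,1,0,0,0,0,0,0,26,0,0,1,0,0,-1,34,-1,185,1]
  ],
  "loq_info": [
    [[],[],[],[],[],[],[903,1],[],[16,1],
     [],[],[],[],[],[],[],[],[],[],[],[],[96,1],
     [],[],[],[],[],[],[],[],[],[],[],[],[291,2]],
    [[],[],[],[],[],[],[903,1],[],[16,1],
     [],[],[],[],[],[],[],[],[],[],[],[],[96,1],
     [],[],[],[],[],[],[],[],[],[],[],[],[291,2]]
  ]
}}
"""


def tile_path_key(path: int) -> str:
    """JSON key for a tile path: 4-digit lowercase hex, e.g. 862 -> "035e"."""
    return format(path, "04x")


def loq_steps(doc: dict, path: int) -> list:
    """Per allele, the tile steps with a non-empty loq_info entry.

    Assumes entry i of each loq_info list belongs to tile step start_tilestep + i.
    """
    entry = doc[tile_path_key(path)]
    start = entry["start_tilestep"]
    return [[start + i for i, loq in enumerate(allele_loq) if loq]
            for allele_loq in entry["loq_info"]]


doc = json.loads(SAMPLE)
print(loq_steps(doc, 862))  # [[6, 8, 21, 34], [6, 8, 21, 34]]
</code></pre>

To read live output instead of the pasted sample, replace @json.loads(SAMPLE)@ with @json.load(sys.stdin)@ (after @import sys@) and pipe the @cgb@ output into the script.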

h5. Tile Path Compact Representation

Get tile path 862 (0x35e, which is chrM), starting at tile step 0 and including low-quality information, in a compact representation.

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -p 862 -s 0 -B -L -k
[ 79 8 0 0 0 0 0 -1 0 0 0 389 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 -1 34 -1 185 1]
[ 79 2 0 0 0 0 0 -1 0 0 0 390 0 0 0 0 0 1 0 0 0 0 0 0 26 0 0 1 0 0 -1 34 -1 185 1]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
[[ ][ ][ ][ ][ ][ ][ 903 1 ][ ][ 16 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 96 1 ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ 291 2 ]]
</pre>
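The compact form carries the same allele and low-quality lists as the JSON output, one list per line and without commas. A small Python parser, written only against the two line shapes shown above (a flat @[ ... ]@ list of integers, and a @[[ ... ][ ... ]]@ list of possibly empty groups); @parse_compact_line@ is a hypothetical helper, not part of @cgb@.

<pre><code class="python">
import re


def parse_compact_line(line: str) -> list:
    """Parse one line of cgb's compact output into Python lists of ints."""
    line = line.strip()
    if line.startswith("[["):
        # Low-quality line: one "[ ... ]" group per tile step, possibly empty.
        groups = re.findall(r"\[([^\[\]]*)\]", line)
        return [[int(tok) for tok in g.split()] for g in groups]
    # Allele line: a single flat list of integers.
    return [int(tok) for tok in line.strip("[]").split()]


# Shortened, illustrative input in the same shape as the low-quality lines above.
print(parse_compact_line("[[ ][ ][ 903 1 ][ 16 1 ]]"))  # [[], [], [903, 1], [16, 1]]
</code></pre>
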

h5. Inspect Binary File (Debug)

Print a debugging dump of the information stored in a CGF file.

<pre>
$ ./cgb -i data/hu826751-GS03052-DNA_B01.cgf -D -i data/hg19.cgf
Magic: "cgf.b"{ (7b22622e66676322)
CGFVersion: 0.1.0
LibVersion: 0.1.0
PathCount: 863
TileMapLength: 7044
TileMap:
[[0+1],[0+1]], [[0+1],[1+1]], [[1+1],[0+1]], [[1+1],[1+1]], [[0+1],[2+1]], [[2+1],[0+1]], [[0+1,0+1],[1+2]], [[1+2],[0+1,0+1]], [[0+2],[0+2]], [[0+1],[3+1]], [[3+1],[0+1]], [[1+1,0+1],[0+2]], [[0+2],[1+1,0+1]], [[0+1],[4+1]], [[4+1],[0+1]], [[1+2],[1+2]], [[2+1],[2+1]], [[1+1],[3+1]], [[3+1],[1+1]], [[1+1],[2+1]], [[2+1],[1+1]], [[0+1],[5+1]], [[5+1],[0+1]], [[0+1],[6+1]], [[6+1],[0+1]], [[0+1,0+1],[2+2]], [[2+2],[0+1,0+1]], [[0+1,0+1],[3+2]], [[3+2],[0+1,0+1]], [[3+1],[3+1]], [[0+1],[7+1]], [[7+1],[0+1]],
...

035e.Loq.LoqFlagByteCount: 5
035e.Loq.LoqFlag[5]:
40 01 20 00 04

035e.Loq.LoqInfoByteCount: 18
035e.Loq.LoqInfo[18]:
01 02 83 87 01 01 02 10 01 01 02 60 01 01 02 81 23 02

</pre>
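Parts of this dump can be cross-checked by hand against the JSON output for tile path 862. Two observations, offered as inferences from this one example rather than documented format rules: the @Magic@ hex value is the 8 magic bytes @"cgf.b"{@ read as a little-endian integer, and reading the 5 @LoqFlag@ bytes as a bit vector, least significant bit of each byte first, gives bit indices 6, 8, 21 and 34, exactly the tile steps with non-empty @loq_info@ entries in the JSON example. @magic_string@ and @loq_flag_steps@ are hypothetical helpers; the @LoqInfo@ bytes are not decoded here.

<pre><code class="python">
def magic_string(value: int) -> str:
    """Decode the Magic value printed by `cgb -D`, read as a little-endian 64-bit integer."""
    return value.to_bytes(8, "little").decode("ascii")


def loq_flag_steps(hex_dump: str) -> list:
    """Indices of set bits in a LoqFlag dump, least significant bit of each byte first."""
    data = bytes.fromhex(hex_dump)  # spaces between hex pairs are ignored
    return [8 * i + bit for i, byte in enumerate(data) for bit in range(8) if (byte >> bit) & 1]


print(magic_string(0x7b22622e66676322))  # "cgf.b"{
print(loq_flag_steps("40 01 20 00 04"))   # [6, 8, 21, 34]
</code></pre>
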