Bug #18000
Updated by Ward Vandewege over 3 years ago
<pre>
ward@shell:~$ arv collection list --order 'file_size_total desc' --limit 3 | jq -r '.items[] | [.portable_data_hash,.uuid] |@csv' |sed -e 's/"//g'|tr '\n' ' ' |xargs arvados-client deduplication-report
Collection _____-_____-_______________: pdh ________________________________+5003343; nominal size 7382073267640 (6.7 TiB); file count 2796
Collection _____-_____-_______________: pdh ________________________________+4961919; nominal size 6989909625775 (6.4 TiB); file count 5592
Collection _____-_____-_______________: pdh ________________________________+2103205; nominal size 2795436541525 (2.5 TiB); file count 3028
Collections: 3
Nominal size of stored data: 17167419434940 bytes (16 TiB)
Actual size of stored data: 17170607344506 bytes (16 TiB)
Saved by Keep deduplication: -3187909566 bytes (16 EiB)
</pre>
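The actual size is calculated as the sum of the sizes of the distinct blocks referenced across all collections. I assume the bug is caused by this calculation not taking into account that a manifest may use only part of a block: the whole block counts toward the actual size, while only the referenced bytes count toward the nominal size. The actual size can therefore exceed the nominal size, which makes the reported savings negative.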
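The arithmetic behind the report can be sketched as follows. This is a toy model, not the arvados-client implementation: the `collections` structure, the block locators, and the helper names are hypothetical. The last part is also an assumption. It shows that the "16 EiB" figure is what you would get if the negative byte count were interpreted as an unsigned 64-bit integer before being formatted.

```python
# Hypothetical model: each collection lists the blocks its manifest references
# (locator -> full block size) and the number of bytes of file data it addresses.
collections = [
    {"blocks": {"blk_a+100": 100}, "nominal": 10},  # uses only 10 bytes of a 100-byte block
    {"blocks": {"blk_b+50": 50}, "nominal": 50},    # uses its block completely
]


def dedup_report(collections):
    """Return (nominal, actual, saved) the way the report appears to compute them."""
    nominal = sum(c["nominal"] for c in collections)
    unique_blocks = {}
    for c in collections:
        # Each distinct block is counted once, at its full size.
        unique_blocks.update(c["blocks"])
    actual = sum(unique_blocks.values())
    return nominal, actual, nominal - actual


nominal, actual, saved = dedup_report(collections)
print(nominal, actual, saved)  # saved is negative because of the partially used block


def as_uint64(n):
    """Reinterpret a signed value as an unsigned 64-bit integer (assumed cause of '16 EiB')."""
    return n % 2**64


# Figures taken from the report above.
reported_saved = 17167419434940 - 17170607344506
wrapped_eib = as_uint64(reported_saved) / 2**60
print(reported_saved, round(wrapped_eib))
```

In this model the savings only come out right if partially used blocks are either counted by the bytes actually referenced, or if the nominal size is compared against full block sizes on both sides.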