Feature #5202
Hash individual files in collections
Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -
Description
It would be very helpful to have hashes for the individual files in a collection available by default. This supports several use cases:
- Checking whether a file in Keep is the same as a local file (even if they have different file names), for validation or to avoid redundant file uploads.
- Searching to see whether a given file is present in Keep (for example, auditing to see if a file was uploaded that should not have been).
- Identifying files that are stored multiple times in Keep with different block alignment (resulting in different blocks), which could be re-packed, with the relevant collections updated, to achieve deduplication.
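To illustrate how per-file hashes would serve these use cases, here is a minimal sketch in Python. It does not use any Arvados or Keep API. The `file_hashes` mapping (from a collection identifier and file name to a hex digest), the function names, and the choice of SHA-256 are all assumptions made for illustration; this ticket does not specify a hash algorithm or a storage format.

```python
import hashlib
from collections import defaultdict


def hash_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a local file's content, read in chunks.

    The digest depends only on the bytes of the file, not on its name
    or on how the content happens to be split into storage blocks.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_by_hash(file_hashes, target_hash):
    """List the (collection_id, file_name) entries whose hash matches.

    file_hashes is a hypothetical index: (collection_id, file_name) -> digest.
    """
    return sorted(key for key, h in file_hashes.items() if h == target_hash)


def find_duplicates(file_hashes):
    """Group entries with identical content; keep groups of two or more."""
    by_hash = defaultdict(list)
    for key, h in file_hashes.items():
        by_hash[h].append(key)
    return {h: sorted(keys) for h, keys in by_hash.items() if len(keys) > 1}
```

With such an index, `find_by_hash(file_hashes, hash_file("local.dat"))` would cover the first two use cases (the file name `local.dat` is a placeholder), and `find_duplicates(file_hashes)` would surface candidates for re-packing in the third.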
Implementation proposal sketch here:
https://arvados.org/projects/arvados/wiki/Separating_files_from_collections