Index of files in collections » History » Version 1
Tom Clegg, 02/20/2019 06:55 PM
h1. Index of files in collections

Currently the manifest_text column contains information about the individual files in collections. However, its utility is limited because the data is not structured in a way that PostgreSQL understands.

* Searching filenames is difficult or impossible because even the "list of filenames" column is too long for PostgreSQL to index properly.
* Searching for collections with a given block locator (or locator pattern, which is useful for partitioning keep-balance work) is inefficient.

These problems (and some other opportunities) could be addressed by keeping a separate table of files.

|pdh|dir|filename|bytesize|filehash†|
|abcd1234+123|foo/bar|baz.txt|1234|dcba4321|
|abcd1234+123|foo/bar|waz.txt|1235|efab8912|

† In general, filehash cannot be computed from the manifest alone. This column would presumably allow null ("not known"), and might not exist at all.

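As a rough illustration of where such rows could come from, the sketch below derives (pdh, dir, filename, bytesize, filehash) tuples from a manifest. It assumes the usual manifest layout: one stream per line, consisting of a stream name, block locators, and then @position:size:filename@ tokens, with special characters escaped as a backslash plus three octal digits. The names @unescape@ and @file_rows@ are hypothetical, and filehash is left as @None@ because the manifest alone does not provide it.

```python
import posixpath
import re
from collections import defaultdict

FILE_TOKEN = re.compile(r"(\d+):(\d+):(.+)")


def unescape(token):
    # Turn escapes such as "\040" (space) back into characters.
    # Simplification: only correct for ASCII escapes.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), token)


def file_rows(pdh, manifest_text):
    """Return sorted (pdh, dir, filename, bytesize, filehash) tuples."""
    sizes = defaultdict(int)
    for line in manifest_text.splitlines():
        tokens = line.split(" ")
        if not tokens[0]:
            continue  # skip empty lines
        stream = unescape(tokens[0])
        for token in tokens[1:]:
            match = FILE_TOKEN.fullmatch(token)
            if match is None:
                continue  # not a file token, so treat it as a block locator
            name = unescape(match.group(3))
            path = posixpath.normpath(posixpath.join(stream, name))
            # A file may appear as several segments; its size is their sum.
            sizes[path] += int(match.group(2))
    rows = []
    for path in sorted(sizes):
        directory, filename = posixpath.split(path)
        # filehash is unknown at this point, so store None ("not known").
        rows.append((pdh, directory, filename, sizes[path], None))
    return rows
```

In this sketch, files at the top level of a collection get an empty string as their dir value. That is one possible convention, not something the proposal specifies.
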
New rows would be added to the files table whenever a collection is saved with a PDH that isn't already present.

Old rows would be deleted from the files table whenever the last remaining collection with a given PDH is removed.

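The add and delete rules above could be sketched roughly as follows. This uses Python's built-in @sqlite3@ module as a stand-in for PostgreSQL so the example is self-contained. The schema, the @uuid@ and @portable_data_hash@ column names, and the @save_collection@ and @delete_collection@ helpers are assumptions made for illustration, not the actual implementation.

```python
import sqlite3

# Assumed, simplified schema: the real collections table has many more columns.
SCHEMA = """
CREATE TABLE collections (
    uuid TEXT PRIMARY KEY,
    portable_data_hash TEXT NOT NULL
);
CREATE TABLE files (
    pdh TEXT NOT NULL,
    dir TEXT NOT NULL,
    filename TEXT NOT NULL,
    bytesize INTEGER NOT NULL,
    filehash TEXT,  -- NULL means "not known"
    PRIMARY KEY (pdh, dir, filename)
);
"""


def save_collection(db, uuid, pdh, rows):
    """Save a collection; rows is a list of (dir, filename, bytesize)."""
    with db:  # run both steps in a single transaction
        present = db.execute(
            "SELECT 1 FROM files WHERE pdh = ? LIMIT 1", (pdh,)
        ).fetchone()
        db.execute(
            "INSERT INTO collections (uuid, portable_data_hash) VALUES (?, ?)",
            (uuid, pdh),
        )
        if present is None:
            # First time this PDH is seen: add its file rows.
            db.executemany(
                "INSERT INTO files (pdh, dir, filename, bytesize) "
                "VALUES (?, ?, ?, ?)",
                [(pdh, d, name, size) for d, name, size in rows],
            )


def delete_collection(db, uuid):
    """Delete a collection and drop its file rows if no other copy remains."""
    with db:
        row = db.execute(
            "SELECT portable_data_hash FROM collections WHERE uuid = ?", (uuid,)
        ).fetchone()
        if row is None:
            return
        pdh = row[0]
        db.execute("DELETE FROM collections WHERE uuid = ?", (uuid,))
        remaining = db.execute(
            "SELECT 1 FROM collections WHERE portable_data_hash = ? LIMIT 1",
            (pdh,),
        ).fetchone()
        if remaining is None:
            # That was the last collection with this PDH.
            db.execute("DELETE FROM files WHERE pdh = ?", (pdh,))
```

Keeping each operation in one transaction is a design choice in this sketch, intended to avoid a window in which the collections and files tables disagree.
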
Once this table is populated, searching collection filenames would be implemented by searching the files table and joining the collections table on PDH.
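
A filename search along those lines might look like the following sketch, again using @sqlite3@ in place of PostgreSQL. The table layouts, the @files_filename@ index, the sample collection UUIDs, and the @find_collections_by_filename@ helper are illustrative assumptions. The file rows reuse the example values from the table above.

```python
import sqlite3


def find_collections_by_filename(db, pattern):
    """Return (uuid, dir, filename) for files matching a SQL LIKE pattern."""
    return db.execute(
        """
        SELECT c.uuid, f.dir, f.filename
        FROM files AS f
        JOIN collections AS c ON c.portable_data_hash = f.pdh
        WHERE f.filename LIKE ?
        ORDER BY c.uuid, f.dir, f.filename
        """,
        (pattern,),
    ).fetchall()


db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE collections (
        uuid TEXT PRIMARY KEY,
        portable_data_hash TEXT NOT NULL
    );
    CREATE TABLE files (
        pdh TEXT, dir TEXT, filename TEXT, bytesize INTEGER, filehash TEXT
    );
    -- Individual filenames are short, so an ordinary index is feasible here.
    CREATE INDEX files_filename ON files (filename);

    INSERT INTO collections VALUES
        ('c1', 'abcd1234+123'),
        ('c2', 'abcd1234+123'),
        ('c3', 'ffff0000+45');
    INSERT INTO files VALUES
        ('abcd1234+123', 'foo/bar', 'baz.txt', 1234, 'dcba4321'),
        ('abcd1234+123', 'foo/bar', 'waz.txt', 1235, 'efab8912'),
        ('ffff0000+45', '', 'readme.md', 10, NULL);
    """
)

matches = find_collections_by_filename(db, "baz%")
for uuid, directory, filename in matches:
    print(uuid, directory, filename)
```

Because the file rows are stored once per PDH, the join returns one result per collection that shares that content. Here both @c1@ and @c2@ match.
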