Story #14611
[Epic] Site-wide search for text, filenames, data (closed)
Description
- Full-text search doesn't find exact strings (#13508) and doesn't index all filenames in large collections (#13752, #14560).
- Substring search is slow and doesn't index full rows (which is why full-text search was added).
- No facility at all for searching file contents.
It is possible that we can use PostgreSQL's full-text search to address everything short of searching file contents, with a bit more work on our side: for example, using a dictionary/language other than English, or creating a table of filenames instead of searching one huge text field that contains a list of filenames.
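As a rough sketch of the "table of filenames" idea, assuming a hypothetical `collection_filenames` table and PostgreSQL's built-in `simple` text search configuration (neither is an existing Arvados schema or an agreed design), each filename could get its own row so that no single tsvector has to hold the whole file list:

```python
# Hypothetical schema: one row per (collection, filename) instead of one
# huge text field. Table, column, and index names are illustrative only.
CREATE_TABLE_SQL = """
CREATE TABLE collection_filenames (
    collection_uuid varchar(255) NOT NULL,
    filename text NOT NULL
);
CREATE INDEX collection_filenames_fts
    ON collection_filenames
    USING gin (to_tsvector('simple', filename));
"""

# The 'simple' configuration does no English stemming or stop-word removal.
SEARCH_SQL = """
SELECT DISTINCT collection_uuid
FROM collection_filenames
WHERE to_tsvector('simple', filename) @@ plainto_tsquery('simple', %s);
"""


def filename_rows(collection_uuid, filenames):
    """Return one (collection_uuid, filename) row per distinct, non-empty filename."""
    seen = set()
    rows = []
    for name in filenames:
        if name and name not in seen:
            seen.add(name)
            rows.append((collection_uuid, name))
    return rows
```

The rows would be inserted with whatever database driver is already in use. Whether `simple` tokenizes filenames well enough for exact-string matches is an open question that this sketch does not answer.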
Another approach would be to use a separate tool to index/search the database, and apply Arvados permissions to those results. This could conceivably index file contents as well as database rows.
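A minimal sketch of the "apply Arvados permissions to those results" step, assuming the external index returns ranked hits as dicts with a `uuid` key and that the set of UUIDs readable by the current user is computed elsewhere (both the hit shape and the function name are hypothetical):

```python
def filter_hits_by_permission(hits, readable_uuids, limit=None):
    """Keep only hits the user may read, preserving the index's rank order.

    hits: iterable of dicts, each with at least a "uuid" key (assumed shape).
    readable_uuids: set of object UUIDs the current user can read (assumed input).
    limit: optional maximum number of results to return.
    """
    allowed = []
    for hit in hits:
        if hit["uuid"] in readable_uuids:
            allowed.append(hit)
            if limit is not None and len(allowed) >= limit:
                break
    return allowed
```

Filtering after the search means the external index may need to return more hits than the page size, since some of them will be dropped.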
Updated by Tom Clegg about 6 years ago
- Related to Story #13508: Fix postgres search for filenames added
Updated by Tom Clegg about 6 years ago
- Related to Bug #14560: [1.3.0] error: ERROR: string is too long for tsvector (2299194 bytes, max 1048575 bytes) added
Updated by Tom Clegg about 6 years ago
- Related to Bug #6382: [Workbench] Searching through a collection using regex should accept $ instead of \n added
Updated by Peter Amstutz about 6 years ago
I like the idea of a hybrid solution that uses PG full-text search for fields such as name and description, and uses a specialized database for indexing collection contents, both filenames and contents of documents. We need to be careful we don't start storing reads from fastq files in the full-text database, though.
Updated by Tom Clegg almost 6 years ago
- Is duplicate of Feature #14573: [Spike] [API] Fully functional filename search added
Updated by Tom Clegg almost 6 years ago
- Status changed from New to Duplicate
- Target version deleted (To Be Groomed)