Bug #8873
[Docs] file_names Collection field is undocumented
Status: Open, 0% done
Description
I'd like to list collections containing files that match a certain pattern. I thought that, although expensive, it should be possible to do this by filtering on manifest_text.
Unfortunately this gives an error:
$ arv collection list -f '[["manifest_text","like","%._cfb9b6873.%vcf.gz"]]' -s '["uuid"]'
Error: #<ArgumentError: Invalid attribute 'manifest_text' in filter>
Updated by Brett Smith over 8 years ago
Josh,
manifest_text was intentionally made unsearchable in #4523, because it's too big to index in the database. For use cases like yours, we provide a file_names attribute that simply lists the filenames in the manifest, a long string with one filename per line. You should be able to do something like:
arv collection list -f '[["file_names","like","%._cfb9b6873.%vcf.gz\n%"]]' -s '["uuid"]'
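For illustration, here is a bash sketch that emulates the LIKE match locally to show why the pattern ends in `\n%`. In the filter above, the `\n` inside single quotes reaches the API server as part of the JSON, where it is decoded into a newline character (in bash the same character is written `$'\n'`). Because file_names holds one filename per line, requiring a newline right after `vcf.gz` keeps the pattern from matching inside a longer name such as `...vcf.gz.tbi`. Assumptions: standard SQL LIKE semantics (`%` matches any run of characters, `_` exactly one character), LIKE's escape character is not handled, `like_match` is a hypothetical helper, and the sample file_names value is made up, with a newline after every name including the last. The real match is done by the API server's database, not by anything like this.

```shell
#!/usr/bin/env bash
# Sketch only: emulates SQL LIKE locally by translating it into a bash glob.
# Assumes standard LIKE semantics; LIKE's escape character is NOT handled.

# like_match (hypothetical helper): succeed if STRING ($1) matches LIKE PATTERN ($2).
like_match() {
  local str=$1 pat=$2 glob="" c i
  for (( i = 0; i < ${#pat}; i++ )); do
    c=${pat:i:1}
    case $c in
      %) glob+='*' ;;   # any run of characters, newlines included
      _) glob+='?' ;;   # exactly one character
      # Characters that are special in bash patterns are literal in LIKE.
      '*'|'?'|'['|']'|'\'|'('|')'|'|') glob+="\\$c" ;;
      *) glob+=$c ;;
    esac
  done
  [[ $str == $glob ]]
}

# Hypothetical file_names value: one filename per line.
files=$'sample._cfb9b6873.variants.vcf.gz\nREADME.txt\n'
like_match "$files" $'%._cfb9b6873.%vcf.gz\n%' && echo "match"   # prints: match
```

Note that `_` is itself a single-character wildcard in LIKE, so `._cfb9b6873.` in the pattern also matches names like `.Xcfb9b6873.`; that is harmless here but worth keeping in mind for patterns that contain underscores.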
Does that meet your needs?
I see that this attribute isn't in our API documentation at all, so if nothing else, this bug can tell us to fix that.
Updated by Joshua Randall over 8 years ago
- Category changed from API to Documentation
Thanks - the undocumented searchable "file_names" attribute meets my needs in this case. The "fix" for this issue would be to document it. It would also be good if it were part of the returned collection record, so that users don't need to read out-of-band documentation to find it.
Are these file_names the filename only, or the full "path" (i.e. stream name and file name) to the file?
If they were actually path_names, allowing them to be selected as an output would make it possible to grab a collection's directory listing from the API server without having to parse the manifest.
Updated by Brett Smith over 8 years ago
Joshua Randall wrote:
Are these file_names the filename only, or the full "path" (i.e. stream name and file name) to the file?
Filename only. The column is also size-limited (to ensure it can be indexed), so it's not guaranteed to be a complete listing. I think this is part of the rationale for not returning it in individual GET requests: code that relied on it would mishandle large collections. Better to parse the manifest, which you'll always have.
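Since file_names holds bare filenames and may be truncated, a complete listing with full paths (stream name plus filename) has to come from manifest_text. A minimal bash/awk sketch, assuming the manifest layout where each line is a stream name (`.` for the top level), then one or more block locators, then file tokens of the form `position:size:filename`. `list_manifest_paths` is a hypothetical helper name, escaped characters in names (such as `\040` for a space) are printed undecoded, and the sample manifest lines are made up.

```shell
#!/usr/bin/env bash
# Sketch: list every file in a manifest (read on stdin) as "<stream>/<filename>".
# Assumes: stream name first, then block locators, then position:size:filename
# tokens. Escape sequences like \040 are left as-is.
list_manifest_paths() {
  awk '{
    stream = $1
    for (i = 2; i <= NF; i++) {
      tok = $i
      # File tokens start with "<digits>:<digits>:"; block locators do not.
      if (tok ~ /^[0-9]+:[0-9]+:/) {
        sub(/^[0-9]+:[0-9]+:/, "", tok)
        path = stream "/" tok
        # A file can be split across several tokens (or lines); print it once.
        if (!(path in seen)) { seen[path] = 1; print path }
      }
    }
  }'
}

# Made-up two-stream manifest.
printf '%s\n' \
  '. acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt' \
  './sub 37b51d194a7513e45b56f6524f2d51f2+3 0:3:bar.vcf.gz' |
  list_manifest_paths
# prints:
# ./foo.txt
# ./sub/bar.vcf.gz
```

To run it on a real collection, something like `arv collection get --uuid <uuid> | jq -r .manifest_text | list_manifest_paths` should work, assuming jq is installed; `<uuid>` is a placeholder.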
Updated by Brett Smith over 8 years ago
- Subject changed from Can't filter collections on their contents to [Docs] file_names Collection field is undocumented
Updated by Brett Smith over 8 years ago
- Target version set to Arvados Future Sprints
Updated by Ward Vandewege over 3 years ago
- Target version deleted (Arvados Future Sprints)