Feature #17944
closedadd vocabulary validation to controller
100%
Description
Cf. https://doc.arvados.org/admin/workbench2-vocabulary.html
- When an Arvados object that has properties (collections, container_requests, groups, links) is created or updated, the API server will validate the properties contents. Properties are key-value pairs (property definitions are called “tags” in the vocabulary file)
- Property keys are checked against the standardized key identifiers defined in the vocabulary file. The key is also checked against the aliases (labels) for each tag. If a property key matches one of the aliases, the API server returns an error indicating that the client is required to use the standardized identifier for the key.
- The property value is checked that it is in the range of values for the tag as defined in the vocabulary file.
When “strict” is true, the value must be one of the standardized value identifiers listed for that tag. If it is not a standardized value identifier, the API server returns an error. It does not accept aliases, but if the provided value matches an alias, the error message should indicate as such. - When “strict” is false or undefined, the value must either be one of the standardized values listed for that tag, or it must be a value that is not listed in aliases. If the value is listed in aliases, it should return an error that the client is required to use the standardized identifier.
- When a value is rejected due to use of an alias and not the standardized value identifier, the error message should include what standardized value identifier was expected.
- Use case insensitive match to check if a key or value matches an alias
- Respect the value of "strict_tags" in the vocabulary file for unknown property keys, can specify either:
- strict_tags: false -- Property keys which are not defined in the vocabulary are not checked
- strict_tags: true -- Property keys which are not defined in the vocabulary are rejected
- Property validation is applied to all users, including admins
- The configuration file will be stored somewhere on the filesystem of the host that runs Arvados controller. The controller will have an API endpoint that Workbench 2 or other applications can use to fetch the vocabulary file.
- If a vocabulary file is configured but cannot be read at startup, Arvados controller will fail with an error.
- If the same alias is associated with more than one standardized identifier, fail with an error.
- The config-check subcommand will detect and report configuration and vocabulary file errors.
- To ease migration, if a record is updated but the update does not change the properties, it should not reject the update of unrelated fields even if the current properties are invalid
- When strict_tags is enabled, need to recognize and special case properties already in use by Arvados tools. Some properties (list is likely incomplete?)
- type
- template_uuid
- groups
- username
- image_timestamp
- docker-image-repo-tag
- filters
- container_request
- Other configured managed properties
Also: arvados-cwl-runner has a 'cache http download' feature that notes the provenance by setting the source URL as the key that maps to an object containing the cache headers. This usage is incompatible with "strict_tag".
Implementation:
- Validation happens in controller for create and update calls
- Add config parameter to API/VocabularyPath, expected to be local to the machine the controller runs on.
- The vocabulary file will be loaded and cached by controller; file timestamp will be checked on any request. If the vocabulary file can't be read (e.g. permissions, invalid json, etc), the existing cached version will be used and a health warning/prometheus alert should be raised.
- If the file can't be read on startup, that's an error. config-check should also check this, and will need to take into account that this is only an error if the context is the controller.
- Apply validation before all save/update requests. Admin users do not get special treatment.
- The validation code should handle existing data gracefully: if a record has existing properties are invalid, but the update does not include properties, updates to other fields in the collection should still be permitted.
- Update wb2 to get the file from controller (it will need to export the cache copy as a valid URL to the JSON)
Updated by Ward Vandewege over 3 years ago
- Blocks Story #17454: Vocabulary checking of properties by API server/controller added
Updated by Ward Vandewege over 3 years ago
- Target version set to 2021-08-18 sprint
Updated by Peter Amstutz over 3 years ago
- Target version deleted (
2021-08-18 sprint)
Updated by Peter Amstutz over 3 years ago
- Project changed from Arvados Epics to Arvados
- Target version set to 2021-09-01 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-09-01 sprint to 2021-09-15 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-09-15 sprint to 2021-09-29 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-09-29 sprint to 2021-10-13 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-10-13 sprint to 2021-10-27 sprint
Updated by Lucas Di Pentima about 3 years ago
- Status changed from New to In Progress
Updated by Lucas Di Pentima about 3 years ago
- Target version changed from 2021-10-27 sprint to 2021-11-10 sprint
Updated by Lucas Di Pentima about 3 years ago
Updates at 3849d76f2 - branch 17944-backend-vocabulary-validation
Test run: developer-run-tests: #2766
Updates¶
- Adds
API.VocabularyPath
config knob used bycontroller
. - Updates the documentation.
- Adds
arvados.Vocabulary
type with loading validation property check func & tests. - Vocabulary checking supports exceptional cases, including system & managed properties.
- Adds
/arvados/v1/vocabulary
endpoint to allow clients to request the vocabulary. - Adds vocabulary checks to create/update calls of collections, groups & container requests.
- Sets up a file watcher to reload the vocabulary file on demand, only replacing the previously cached vocabulary on success.
- Adds a health check on boot so that
controller
won't start with a non-valid vocabulary.
Pending¶
- Expose the new endpoint as a
Workbench.VocabularyURL
exported config so that older workbenches automatically migrate to the correct vocabulary. - Update wb2 to request the vocabulary from the new endpoint.
- Add vocabulary file reloading test -- I tried making it a unit test (part of
lib/controller/handler_test.go
), but the file watcher didn't seem to work, probably will need making it an integration test. - Add
arvados#link
resources to do vocabulary checking. - Notify of invalid vocabulary via Prometheus alert.
Notes¶
The commit history is messy because I changed the approach to vocabulary loading and didn't know how to merge it with the rest properly. Commit 149285d changes the way controller
loads and cache the vocabulary.
Updated by Lucas Di Pentima about 3 years ago
Arvados repo¶
Rebased & squashed into latest main (new branch): 1cd689f - branch 17944-backend-vocabulary-validation-rebased
Test run: developer-run-tests: #2770
Workbench2 repo¶
Updates at arvados-workbench2|ed768b6 - branch 17944-vocabulary-endpoint-retrieval
Test run: developer-tests-workbench2: #506
- Retrieves the vocabulary from the new endpoint.
- Fixes the Cypress tests by configuring the testing Arvados instance with an example vocabulary.
Updated by Lucas Di Pentima about 3 years ago
Updates at 3f32ceb - branch 17944-backend-vocabulary-validation-rebased
Test run: developer-run-tests: #2773
- Adds support for property checking to links.
Updated by Lucas Di Pentima about 3 years ago
Updates at 7b7de0b - branch 17944-backend-vocabulary-validation-rebased
Test run: developer-run-tests: #2775
- Adds
/_health/vocabulary
health endpoint - Improves vocabulary reloading by replacing the
fsnotify
method with a periodic modify time on the file.
Updated by Peter Amstutz about 3 years ago
reviewing 17944-backend-vocabulary-validation-rebased @ 7b7de0ba345c02103bbaa9fb981424c59d440d55
I found a bug:¶
$ arv group update -u x2z00-j7d0g-3lryy66m7p8s9dn --group '{"properties":{"IDTAGSIZES": "blah blah"}}' Error: tag value "blah blah" for key "IDTAGSIZES" is not listed as valid peter@curiipeter:[pts/2]:~ $ arv group update -u x2z00-j7d0g-3lryy66m7p8s9dn --group '{"properties":{"IDTAGCATEGORIES": "IDTAGCAT3", "IDTAGSIZES": "blah blah"}}' { "created_at":"2021-11-05T20:10:25.544636000Z", "delete_at":null, "description":null, "etag":"bejp1nrti83yrbylhe5a9dqf3", "group_class":"project", "href":"/groups/x2z00-j7d0g-3lryy66m7p8s9dn", "is_trashed":false, "kind":"arvados#group", "modified_at":"2021-11-08T18:53:51.031010000Z", "modified_by_client_uuid":"x2z00-ozdt8-8mg1eis92gnjrk6", "modified_by_user_uuid":"x2z00-tpzed-9r15h8tfsbqjc8l", "name":"humans", "owner_uuid":"x2z00-tpzed-9r15h8tfsbqjc8l", "properties":{ "IDTAGCATEGORIES":"IDTAGCAT3", "IDTAGSIZES":"blah blah" }, "trash_at":null, "uuid":"x2z00-j7d0g-3lryy66m7p8s9dn", "writable_by":[ "x2z00-tpzed-9r15h8tfsbqjc8l", "x2z00-tpzed-9r15h8tfsbqjc8l" ] }
It seems that when one property is valid it doesn't error out on the other one.
error messages¶
tag value "IDVALANIMALS2" for key "IDTAGCATEGORIES" is not listed as valid
Can we rephrase that:
tag value "IDVALANIMALS2" is not valid for key "IDTAGCATEGORIES"
Also:
tag value "XS" for key "IDTAGSIZES" is not defined but is an alias for "IDVALSIZES1"
This is pretty good but it doesn't quite tell the user what they are supposed to do, maybe something like this?
tag value "XS" for key "IDTAGSIZES" is an alias, must be provided as "IDVALSIZES1"
Another suggestion:
tag key "IDTAGSIZESz" is not defined
This is a little vague (not defined where?), let's make the messages a little more specific to say "defined in the vocabulary":
tag key "IDTAGSIZESz" is not defined in the vocabulary
Also there seems to be a formatting error here
$ arv link update -u x2z00-o0j2j-4s1hh8z52bbunh3 --link '{"properties":{"IDTAGSIZES": null}}' Error: tag value %!!(MISSING)q(<nil>) for key "IDTAGSIZES" is not a valid type (<nil>) peter@curiipeter:[pts/2]:~ $ arv link update -u x2z00-o0j2j-4s1hh8z52bbunh3 --link '{"properties":{"IDTAGSIZES": 12}}' Error: tag value %!!(MISSING)q(float64=12) for key "IDTAGSIZES" is not a valid type (float64)
Validation¶
Is it checking that you don't have aliases that conflict with vocabulary values? I see tests for duplicated aliases, but not conflicts between terms an aliases.
It looks like you do case-insensitive checks for conflicts with aliases, but the actual property checking is case sensitive. Could we also consider the ToLower() version of identifier an alias, so this would give an '"idvalsizes1" is an alias of "IDVALSIZES1"' error, instead of an invalid error:
$ arv link update -u x2z00-o0j2j-4s1hh8z52bbunh3 --link '{"properties":{"IDTAGSIZES": "idvalsizes1" }}' Error: tag value "idvalsizes1" for key "IDTAGSIZES" is not listed as valid
hot reloading¶
It should log an info message when successfully hot reloading (unless it does, and I missed it). I couldn't tell if it reloaded the vocabulary file or not.
Updated by Lucas Di Pentima about 3 years ago
Updates at e7aec8c18 addressing Peter's feedback.
Test run: developer-run-tests: #2781
- Fixes premature vocabulary check success.
- Improves error messages.
- Improves keys & value collision validation against aliases.
- Adds a case-insensitive check for key/value against labels.
- Fixes logging when reloading vocabulary
- Adds & improves tests.
Regarding the logging message on vocabulary reload: you should see it when making a request that needs the vocabulary (voc export endpoint, or create/update request)
Updated by Lucas Di Pentima about 3 years ago
Updates at c51e85a53
Test run: developer-run-tests: #2782
- Test fixing.
Updated by Peter Amstutz about 3 years ago
I noticed this:
"HIGH": { - "labels": [{"label": "High"}] + "labels": [{"label": "High priority"}] },
Is this because now "HIGH" and "High" are in conflict? It should probably allow aliases that only differ by case from the identifier they are associated with. There's no confusion in this case.
This error message is not as messed up as before, but it still expresses something is wrong without saying what to do about it:
$ arv group update -u x2z00-j7d0g-3lryy66m7p8s9dn --group '{"properties":{"IDTAGCATEGORIES": 12}}'
Error: tag value of type float64 for key "IDTAGCATEGORIES" is not a valid
How about:
Error: value type for tag key "IDTAGCATEGORIES" was float64, but expected a string or list of strings
Updated by Lucas Di Pentima about 3 years ago
- Target version changed from 2021-11-10 sprint to 2021-11-24 sprint
Updated by Peter Amstutz about 3 years ago
From sprint review:
- in Workbench 2, if you type in a name that is an alias for one of the vocabulary tags and then hit "tab" without selecting it in the dropdown, it'll use the raw text as the key instead of mapping it to the vocabulary. This will give the user a "can't use alias" error. When the user clicks/tabs off the text box, it should still convert it to the vocabulary id.
Updated by Lucas Di Pentima about 3 years ago
Updates at 617d78398
Test run: developer-run-tests: #2784
Peter Amstutz wrote:
I noticed this:
[...]
Is this because now "HIGH" and "High" are in conflict? It should probably allow aliases that only differ by case from the identifier they are associated with. There's no confusion in this case.
Good point. I've fixed que validation code to only consider value aliases collisions from different value ids.
This error message is not as messed up as before, but it still expresses something is wrong without saying what to do about it:
[...]
How about:
Error: value type for tag key "IDTAGCATEGORIES" was float64, but expected a string or list of strings
Thanks. Fixed & added some more tests.
Updated by Lucas Di Pentima about 3 years ago
Updates at 54d36a634 adds upgrade notes and a reference to the vocabulary endpoint.
Updated by Lucas Di Pentima about 3 years ago
Updates at arvados-workbench2|c4cc8cb0
Test run: developer-tests-workbench2: #508
- Adds property key and value auto-selection with case-insensitive matching.
- Expands test.
Updated by Peter Amstutz about 3 years ago
Lucas Di Pentima wrote:
Updates at 54d36a634 adds upgrade notes and a reference to the vocabulary endpoint.
One note from gitter, these specific test changes can be reverted, since "High" is no longer considered a conflict in this case:
- "labels": [{"label": "High"}] + "labels": [{"label": "High priority"}]
The rest LGTM!
Lucas Di Pentima wrote:
Updates at arvados-workbench2|c4cc8cb0
Test run: developer-tests-workbench2: #508
- Adds property key and value auto-selection with case-insensitive matching.
- Expands test.
This LGTM.
Updated by Lucas Di Pentima about 3 years ago
Updates at e60cae2f8
Test run: developer-run-tests: #2785
- Reverted changes in tests at c51e85a5
Updated by Lucas Di Pentima about 3 years ago
f1b121ccb - Removed a link to the 2.2 doc site on the upgrading notes so that tests pass.
Test run: developer-run-tests-doc-and-sdk-R: #927
Updated by Lucas Di Pentima about 3 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados-workbench-2:arvados-workbench2|eabdb7bdd468b09c633ddb8b33fd8095ad27bb60.