Bug #23152
closedValidate schema of complex fields
Description
Original Report¶
External Credentials API can store and return a string in the "scopes" field but it should only allow an array of strings, as stated in the documentation: https://doc.arvados.org/main/api/methods/credentials.html. Currently, the External Credential create and update actions in Workbench (currently in development, #22820) only allows arrays to be submitted in the "scopes" field (strings passed in are converted to arrays, which may or may not be how it works in the final iteration). Users can create/update credentials outside of Workbench, so Workbench submission validation isn't enough. The method that renders the "scopes" field currently adapts for string values, but it shouldn't have to. Being that it's explicitly stated in the documentation, the API should be updated so that the "scopes" field only allows an array of strings.
Follow-up investigation¶
It seems like the fundamental issue is we do no validation on columns stored as JSONB, and this leads to wildly different handling. Testing on a collection:
>>> c2 = arv.collections().update(uuid=c1['uuid'], body={'collection': {'properties': 'foo'}}).execute()
[…]
arvados.errors.ApiError: <HttpError 500 when requesting https://pirca.arvadosapi.com/arvados/v1/collections/pirca-4zz18-pm4scdmd8iyu73o?alt=json returned "invalid character 'o' in literal false (expecting 'a')">
>>> c2 = arv.collections().update(uuid=c1['uuid'], body={'collection': {'properties': '"foo"'}}).execute()
[…]
arvados.errors.ApiError: <HttpError 500 when requesting https://pirca.arvadosapi.com/arvados/v1/collections/pirca-4zz18-pm4scdmd8iyu73o?alt=json returned "json: cannot unmarshal string into Go value of type map[string]interface {}">
These are coming from controller (note they don't mention the railsapi.internal URL). It sucks that they return 500 because this causes clients to retry even though the request will never succeed.
If your request goes through controller you can cause other problems:
>>> c2 = arv.collections().update(uuid=c1['uuid'], body={'collection': {'storage_classes_desired': 'foo', 'storage_classes_confirmed': 'bar'}}).execute()
[…]
arvados.errors.ApiError: <HttpError 422 when requesting https://pirca.arvadosapi.com/arvados/v1/collections/pirca-4zz18-pm4scdmd8iyu73o?alt=json returned "//railsapi.internal/arvados/v1/collections/pirca-4zz18-pm4scdmd8iyu73o: 422 Unprocessable Entity: #<NoMethodError: undefined method `each' for "foobar":String> (req-1yk3kj82rhbk1qlmkefi)">
>>> c2 = arv.collections().update(uuid=c1['uuid'], body={'collection': {'storage_classes_desired': {'foo': 2}, 'storage_classes_confirmed': {'bar': 1}}}).execute()
[…]
arvados.errors.ApiError: <HttpError 422 when requesting https://pirca.arvadosapi.com/arvados/v1/collections/pirca-4zz18-pm4scdmd8iyu73o?alt=json returned "//railsapi.internal/arvados/v1/collections/pirca-4zz18-pm4scdmd8iyu73o: 422 Unprocessable Entity: #<NoMethodError: undefined method `+' for {"foo"=>2}:Hash> (req-cnw91h03zgsuuuntbzp9)">
So basically we're not doing any validation anywhere, we've just avoided this problem in other cases because invalid values happen to cause a crash. If you could find a value with an unexpected type that implemented the methods RailsAPI expects you could probably cause some trouble!
Real incident¶
We have had one real incident where a user created a container request where output_storage_classes was a string, and now controller returns 500 in response to a request for a list of container requests, because it's trying and failing to serialize the JSON:
$ arv container_request create -o -
{
"name": "output_storage_classes string",
"output_storage_classes": "default",
…
}
Error: json: cannot unmarshal string into Go struct field ContainerRequest.output_storage_classes of type []string (req-1m1rztf4fykvd1mhlysc)
[But note it actually got created, controller just couldn't handle the response:]
$ arv container_request list
Error: json: cannot unmarshal string into Go struct field ContainerRequest.items.output_storage_classes of type []string (req-1ih8iknedg0kr1ncf3ma)
Solution¶
RailsAPI should validate all complex data types as deeply as possible on create and update, whether they're stored as JSONB or Rails' own serialization.
We should check that arrays are actually arrays and that all items are strings. (Conveniently, this is actually what we want as of Arvados 3.1.2/3.2.0. That might change in the future but then extending the validation is tomorrow's problem.)
For hashes we want to do as much as possible to confirm that incoming data is a hash that conforms to a schema.
mounts and secret_mounts¶
All keys should be absolute POSIX paths. All values should be hashes conforming to a static schema per the API documentation. e.g., kind is a string of one of the supported types, writable is a boolean, writable can only be true for certain kinds, etc.
published_ports¶
Each key is a numeric string representing an integer 1-65535 inclusive. Each value is a hash conforming to the documented static schema.
prefs, properties, output_properties¶
All keys are strings and all values are JSON-representable (e.g., hashes are also limited to string keys).
environment¶
All keys and values are strings. Keys may not contain = or \0 per environ(7).
runtime_constraints, runtime_status, scheduling_parameters¶
Static schema per the API documentation.
Updated by Lisa Knox 7 months ago
- Related to Feature #22820: Ability to create, edit and delete external credentials in workbench added
Updated by Lisa Knox 7 months ago
- Related to Feature #22680: store persistent credentials to external resources and provide them to a container added
Updated by Brett Smith 7 months ago
- Release set to 82
- Description updated (diff)
- Subject changed from External Credentials API can store and return a string in the "scopes" field to Validate type of JSONB columns
Updated by Brett Smith 6 months ago
Just checking what's out there, https://github.com/JamesBrooks/hash_validator
Updated by Brett Smith 6 months ago
- Related to Bug #23157: Controller should return a non-retryable error if RailsAPI returns valid JSON that cannot be deserialized added
Updated by Brett Smith 6 months ago
- Related to Bug #21302: Container created with corrupted mounts added
Updated by Tom Clegg 6 months ago
We have several attributes defined like this
services/api/app/models/credential.rb: attribute :scopes, :jsonbArray, default: []
which looks like it's going to validate the value as an array, but doesn't. All it does is convert a null value (from the database or a client) to an empty array.
Updated by Brett Smith 6 months ago
- Release changed from 82 to 79
- Target version set to Development 2025-10-01
- Assigned To set to Tom Clegg
- Description updated (diff)
Updated by Brett Smith 6 months ago
- Related to Idea #23159: Decide and implement a strategy for handling records with bad field data added
Updated by Brett Smith 6 months ago
- Subject changed from Validate type of JSONB columns to Validate schema of complex fields
Updated by Tom Clegg 6 months ago
Prior to this branch, we were not validating that mount targets were absolute paths. A mount point like "foo" would get passed through to docker or singularity, both of which reject non-absolute bind-mount targets, so the container would [be retried a couple of times and then] fail. Example:
"mounts": {
"json": {
"content": {"foo": "bar"},
"kind": "json"
},
ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:00.370773710Z using local keepstore process (pid 9580) at http://10.253.254.141:33967 ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:04.325593948Z gateway server listening at 10.253.254.141:38175 ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:04.325754321Z crunch-run 3.2.0~dev20250908235328 (go1.24.1) started ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:04.325981582Z crunch-run process has uid=0(root) gid=0(root) groups=0(root) ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:04.630980847Z Using FUSE mount: /usr/bin/arv-mount 3.2.0.dev20250804004422 ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:04.637838307Z Using container runtime: docker Engine 28.3.3, containerd 1.7.27, runc 1.2.5, docker-init 0.19.0 ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:04.637871687Z Executing container: tordo-dz642-byuwa85uc3i157n ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:04.637884569Z Executing on host 'ip-10-253-254-141' ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:04.719152753Z container token "v2/tordo-gj3su-rgmrrnfhke3gmxs/4wwc5ntncih4tt6ohiipydpuwxipoftzls1m1qhldj62ojogzx/tordo-dz642-byuwa85uc3i157n" ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:04.719912924Z Running [arv-mount --foreground --read-write --storage-classes default --crunchstat-interval=10 --allow-other --disk-cache --disk-cache-dir /tmp/crunch-run.tordo-dz642-byuwa85uc3i157n.1189314572/keepcache4078625566 --file-cache 2147483648 --mount-by-pdh by_id --disable-event-listening --mount-by-id by_uuid /tmp/crunch-run.tordo-dz642-byuwa85uc3i157n.1189314572/keep1635919792] ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:05.124609084Z Fetching Docker image from collection 'dc86826012fbcc3f357b3c7a3498d48b+175' ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:05.185148611Z Using Docker image id "sha256:e1a7bb630c8baa947c5430a7f0965ef6afe71e88e90547aced2e601a89b68399" ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:05.185201069Z Loading Docker image from keep ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:05.186316430Z Creating Docker container ip-10-253-254-141 dockerd[873]: time="2025-09-16T18:20:05.220900460Z" level=error msg="Handler for POST /v1.35/containers/create returned error: invalid volume specification: '/tmp/crunch-run.tordo-dz642-byuwa85uc3i157n.1189314572/json723631839/mountdata.json:json:ro': invalid mount config for type \"bind\": invalid mount path: 'json' mount path must be absolute" ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:05.222045533Z error in Run: While creating container: Error response from daemon: invalid volume specification: '/tmp/crunch-run.tordo-dz642-byuwa85uc3i157n.1189314572/json723631839/mountdata.json:json:ro': invalid mount config for type "bind": invalid mount path: 'json' mount path must be absolute ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:05.260385895Z Running [arv-mount --unmount-timeout=8 --unmount /tmp/crunch-run.tordo-dz642-byuwa85uc3i157n.1189314572/keep1635919792] ip-10-253-254-141 crunch-run[9572]: 2025-09-16T18:20:05.600074326Z fusermount: failed to unmount /tmp/crunch-run.tordo-dz642-byuwa85uc3i157n.1189314572/keep1635919792: Invalid argument ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n 2025-09-16T18:20:05.682717003Z crunch-run finished ip-10-253-254-141 crunch-run[9572]: tordo-dz642-byuwa85uc3i157n: While creating container: Error response from daemon: invalid volume specification: '/tmp/crunch-run.tordo-dz642-byuwa85uc3i157n.1189314572/json723631839/mountdata.json:json:ro': invalid mount config for type "bind": invalid mount path: 'json' mount path must be absolute
Updated by Brett Smith 6 months ago
Tom Clegg wrote in #note-17:
Prior to this branch, we were not validating that mount targets were absolute paths. A mount point like "foo" would get passed through to docker or singularity, both of which reject non-absolute bind-mount targets, so the container would [be retried a couple of times and then] fail.
Cool, this is exactly the kind of usability win I was hoping to get from this development. Refusing the request because it will fail >>> failing at dispatch time, potentially hours later, with a much-harder-to-understand error message.
Updated by Tom Clegg 6 months ago
23152-validate-serialized @ df8d13493725284a17f82b77184c84b1ea948478 -- developer-run-tests: #4887
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- ✅ Check data types of serialized attributes (e.g., storage_classes_desired must be an array of non-empty strings)
- ✅ Check schema of mounts, secret_mounts, published_ports, runtime_constraints, scheduling_parameters, runtime_status
- ✅ Reject environment variable names with NUL or =, environment variable values with NUL
- ✨ Refactor hash/array type checks so they report as validation failures (which mention the relevant attribute name) rather than serialization errors (which don't)
- Content of mounts etc is (still) validated only for new and committed container requests (not containers) -- in practice the container attributes are copied from container requests by system components, so I'm thinking this approach will just avoid problems loading existing records that were stored before the stricter validations were in place.
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- n/a
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- ✅ Automated tests are updated with more invalid-content cases
- ✅ Container request tests now check the actual validation error messages. Now, if (say) a new required field is added, the existing cases will still effectively check the validations they were intended to check, rather than passing merely because something didn't validate.
- The tested code incorporates recent main branch changes.
- ✅
- New or changed UI/UX has gotten feedback from stakeholders.
- n/a
- Documentation has been updated.
- ✅ Added previously undocumented
"kind": "text"mount. - ✅ Fixed a couple of formatting issues.
- ✅ Added previously undocumented
- Behaves appropriately at the intended scale (describe intended scale).
- ✅ No scale issues anticipated.
- Considered backwards and forwards compatibility issues between client and server.
- ✅ If a future client tries to use a new mount type that we don't support, it will be rejected, which is what we want.
- ✅ If an old client tries to use a field that is no longer supported, it will be rejected, which is what we want.
- ✅ If an old client tries to use a misspelled field that used to be ignored, it will be rejected, which is what we want.
- ✅ If an old client uses a deprecated field that we decided to still accept (e.g., "cuda" as the old spelling of "gpu"), that works.
- Follows our coding standards and GUI style guidelines.
- ✅
Updated by Brett Smith 6 months ago
Tom Clegg wrote in #note-21:
23152-validate-serialized @ df8d13493725284a17f82b77184c84b1ea948478 -- developer-run-tests: #4887
This was a huge effort, thank you for taking it so seriously. From a troubleshooting perspective, it's going to be nice to be able to rule out bad data going forward. If an ounce of prevention is worth a pound of cure, then a pound of prevention is worth a ton of cure, right?
I pushed a couple very minor copyedits. Feel free to force-push them out if they offend you.
Given how many test changes there are, I'm running arvados-cwl-conformance-tests: #2029 as an additional check. We haven't run this in a while so there might be unrelated failures. We should investigate either way, but it can be a separate ticket if it doesn't look obviously related.
- It looks like User.prefs still needs validation.
- In
services/api/lib/serializers.rb, the changes toself.loadfeel like something that would be better done later as part of #23159. Can you please explain why this is safe to do now?
- The documentation for
output_storage_classessays “Default is["default"].” - Isn't the default every configured storage class withDefault: true? If it isn't, should it be? (If yes, that's definitely a separate ticket, don't fix it here.)
- In
lib/controller/federation_test.goin the JSON strings, the lines you added don't line up with the rest of the JSON on my system due to inconsistent whitespace. Please fix, I'm not fussed about what the solution is.
Thanks.
Updated by Tom Clegg 6 months ago
- It looks like User.prefs still needs validation.
Right. Added.
- In
services/api/lib/serializers.rb, the changes toself.loadfeel like something that would be better done later as part of #23159. Can you please explain why this is safe to do now?
Ugh. self.load gets called on database content and on client-provided attrs before validation. So in order to change the error messages from "invalid serialized data" to a validation error that mentions which field is affected, we need self.load to succeed.
I've added a separate after_find hook to check types on values loaded from the database, and some tests for it. This reverts us to "fail if database content is bad" in a way that #23159 can adjust (without also affecting how we deal with client-supplied content).
- The documentation for
output_storage_classessays “Default is["default"].” - Isn't the default every configured storage class withDefault: true? If it isn't, should it be? (If yes, that's definitely a separate ticket, don't fix it here.)
It is. Fixed documentation.
- In
lib/controller/federation_test.goin the JSON strings, the lines you added don't line up with the rest of the JSON on my system due to inconsistent whitespace. Please fix, I'm not fussed about what the solution is.
Fixed.
23152-validate-serialized @ 85dead51e9b7b8b2eac927442463cf9510054d6e -- developer-run-tests: #4889
Updated by Brett Smith 6 months ago
Tom Clegg wrote in #note-23:
self.loadgets called on database content and on client-provided attrs before validation. So in order to change the error messages from "invalid serialized data" to a validation error that mentions which field is affected, we need self.load to succeed.
For the record, this is a good reason, and I might've been willing to take the branch as-is given this. We could've had a conversation about it. But, since you wrote the extra code, sure, I'll take that too.
23152-validate-serialized @ 85dead51e9b7b8b2eac927442463cf9510054d6e -- developer-run-tests: #4889
LGTM, thanks. (One test failure is #22997 and known, the other looks like a Workbench test race condition, it's hard to imagine it's related to the branch.)
Updated by Tom Clegg 6 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|30e250e82f2d98632cfe4f7e33da446f751544c5.