Feature #22680
closedstore persistent credentials to external resources and provide them to a container
Description
See https://dev.arvados.org/projects/arvados/wiki/Credential_storage
CWL support¶
I proposed this feature last year. It hasn't been implemented yet, but obviously supporting this in CWL requires appropriate Arvados support. The model is that a secret has a non-sensitive id part and a sensitive secret part.
https://github.com/common-workflow-language/cwl-v1.3/pull/26
Updated by Peter Amstutz about 1 year ago
- Position changed from -940334 to -940330
Updated by Peter Amstutz 12 months ago
- Related to Feature #20650: a-c-r natively supports S3 inputs just like HTTP/S added
Updated by Peter Amstutz 11 months ago
- Category changed from Crunch to API
- Description updated (diff)
- Subject changed from Arvados manages persistent credentials to external resources and can provide them to a container to store persistent credentials to external resources and provide them to a container
- Tracker changed from Idea to Feature
Updated by Peter Amstutz 11 months ago
22680-credentials-storage @ 7acac293ec3cd514daa9e41d9aa5a24670fa546c
Updated by Peter Amstutz 11 months ago
- Blocks Feature #22819: arvados-cwl-runner can use new credential API to get AWS credentials to use to download from S3 added
Updated by Peter Amstutz 11 months ago
- Blocks Feature #22820: Ability to create, edit and delete external credentials in workbench added
Updated by Peter Amstutz 11 months ago
22680-credentials-storage @ 28bddd1db0fce1807bded7176a6e724f11c22a6f
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- Added credential table
- Added controller
- Added tests
- Added to Go SDK
- Decided to include the
credential_scopesfield (which we had cut during the last engineering discussion a fit of simplification) because I still have plans to use it, it is relatively harmless to include, and way easier to include now then having to do a migration later
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- passes tests, and confirmed that the API can be used through controller
- New or changed UX/UX and has gotten feedback from stakeholders.
- was discussed at engineering meeting
- Documentation has been updated.
- yes, added new page for new API endpoint
- Behaves appropriately at the intended scale (describe intended scale).
- I don't expect there to be massive numbers of credential fields so I did not add any indexes besides the ones that we've written automated tests to check for
- Considered backwards and forwards compatibility issues between client and server.
- this is an entirely new feature so there are no backwards compatability concerns. the API revision has been updated so tha clients can check for the feature.
- Follows our coding standards and GUI style guidelines.
- yes
I guess it is too bad that in 2025 we're still adding new Rails models, but on the plus side because of all the infrastructure and my familiarity with the the API service, I was able to do the whole thing in a couple of days.
What are our current best practices around periodic database maintenance tasks? I'd like to run a query like UPDATE credentials SET credential_secret = '' WHERE expires_at < statement_timestamp() every 15 minutes or every hour or whatever. That's the one gap I'd like to fix in this branch.
Updated by Peter Amstutz 11 months ago
18:55:23 FAIL: fs_collection_test.go:2032: CollectionFSSuite.TestRepackCost_SourceTree_Part
18:55:23
18:55:23 fs_collection_test.go:2033:
18:55:23 s.testRepackCost(c, dataToWrite_SourceTree(c, 500), 40)
18:55:23 fs_collection_test.go:2135:
18:55:23 c.Check(blocksInManifest() <= maxBlocks, check.Equals, true, check.Commentf("expect %d <= %d", blocksInManifest(), maxBlocks))
18:55:23 ... obtained bool = false
18:55:23 ... expected bool = true
18:55:23 ... expect 53 <= 40
18:55:23
18:55:23 writes 500 files 500 bytesContent 17024342 bytesWritten 26912026 bytesRewritten 9887684 blocksInManifest 53
18:55:23 spent 155.608536ms on 500 Repack calls, average 311.217µs per call
18:55:23 spent 121.099506ms on 480 Repack calls that had no effect, average 252.29µs per call
18:55:23 OOPS: 129 passed, 7 skipped, 1 FAILED
18:55:23 --- FAIL: Test (33.56s)
Is there some way a change in the API server or discovery document could have invalidated an assumption of this test?
Updated by Tom Clegg 11 months ago
Peter Amstutz wrote in #note-24:
What are our current best practices around periodic database maintenance tasks? I'd like to run a query like
UPDATE credentials SET credential_secret = '' WHERE expires_at < statement_timestamp()every 15 minutes or every hour or whatever. That's the one gap I'd like to fix in this branch.
SysController#trash_sweep is probably the place. It currently deals with trashed collections/groups, expired tokens, unused uuid_locks.
Updated by Peter Amstutz 11 months ago
22680-credentials-storage @ 02d4a3da728a8e562e3c96d386ba71f55aecdccc
Updated by Tom Clegg 11 months ago
Interface:
The stutteringcredential_ prefix seems unfortunate.
- credential_class mirrors link_class, and avoids colliding with language use of "class", fine
- credential_id avoids colliding with everyone's "database column called id is a numeric primary key" assumption, although it still collides with common usage "credential_id is a foreign key referencing credentials.id"
- credential_secret could be just "secret", except that the prefix makes it pair well with credential_id
- credential_scopes -- should be just scopes? (it's not a third part of the id/secret key pair, and doesn't collide with anything)
What is the use case for showing the credential_id ("access key ID" in aws-speak) to non-container clients?
The id/secret names align well with AWS keys, but I wonder if the general case would be better served by a single "secret" field whose schema is determined by "credential_class" -- e.g., ifcredential_class = "aws_access_key" then secret = {"access_key_id":"...","secret_access_key":"..."} ...?
- The most obvious (to me) 2nd use case is an OAuth API key, which is just a single opaque string that would go in
credential_secret-- thecredential_idcolumn would be logged, but otherwise unused.
If we do stick with separate columns, I think it would be better for the "get secret" API to return both parts of the credential (id+secret), so the caller can reliably get a matching set, even if credentials are updated/rotated.
"cannot be set to expire further in the future unless credential_secret is being changed at the same time" seems like an awkward rule. This means every process that retrieves credentials from a third party and updates the Arvados credential record will need to- have some assurance that the third-party credential provider will never extend the expiry time of an existing credential, and always use the third-party expiry time instead of a shorter one; or
- proactively (or in response to a "can't extend expiry time" error) work around this by saving a bogus secret without changing expiry, then saving the real secret with the new expiry time -- and hope no container gets unlucky and accesses the bogus secret
Given that the workaround exists, doesn't this restriction amount to an unnecessary inconvenience? Maybe I'm missing what it's intended to accomplish/prevent.
Updated by Peter Amstutz 11 months ago
Tom Clegg wrote in #note-30:
Interface:
- credential_scopes -- should be just scopes? (it's not a third part of the id/secret key pair, and doesn't collide with anything)
It could also be called "resources" since we use "scopes" to mean something else in another table. WDYT? I don't feel strongly about it.
What is the use case for showing the credential_id ("access key ID" in aws-speak) to non-container clients?
The id/secret names align well with AWS keys, but I wonder if the general case would be better served by a single "secret" field whose schema is determined by "credential_class" -- e.g., ifcredential_class = "aws_access_key"thensecret = {"access_key_id":"...","secret_access_key":"..."}...?
- The most obvious (to me) 2nd use case is an OAuth API key, which is just a single opaque string that would go in
credential_secret-- thecredential_idcolumn would be logged, but otherwise unused.
This is used to identify the credential if you want to update/delete it later (e.g. in Workbench). Another example that would go into credential_id would be the username part of a username/password pair. Arvados API tokens have a similar split of the token UUID (not secret) and a secret portion, although we typically smoosh them togther.
Sometimes the credential_id could be blank, yes, it depends on the semantics of a particular credential_class. But this pattern of having a public part and a private part seems common (seeing as how even we do it!) and useful to capture.
I think the strongest argument is that sometimes I might to need to update or delete a credential and I don't remember which of the credentials named "AWS_TOKEN1", "AWS_TOKEN2", or "AWS_TOKEN3" is the one I actually want to modify, with a separate credential_id column I just look for the corresponding access_key_id, if it is embedded in the hidden credential_secret then that becomes needlessly awkward.
If we do stick with separate columns, I think it would be better for the "get secret" API to return both parts of the credential (id+secret), so the caller can reliably get a matching set, even if credentials are updated/rotated.
Yes, a good idea, thank you.
"cannot be set to expire further in the future unless credential_secret is being changed at the same time" seems like an awkward rule. This means every process that retrieves credentials from a third party and updates the Arvados credential record will need to
- have some assurance that the third-party credential provider will never extend the expiry time of an existing credential, and always use the third-party expiry time instead of a shorter one; or
- proactively (or in response to a "can't extend expiry time" error) work around this by saving a bogus secret without changing expiry, then saving the real secret with the new expiry time -- and hope no container gets unlucky and accesses the bogus secret
Given that the workaround exists, doesn't this restriction amount to an unnecessary inconvenience? Maybe I'm missing what it's intended to accomplish/prevent.
Well, the idea was something like:
- I shared a credential with you as
can_writeorcan_manage, I set it to expire in 1 week - You decide you'd like to use it for more than 1 week, so you set the expires_at further into the future
- Now you're able to access the resource past the intended expiration date, that's a security weakness
However, on further consideration, I think there's a couple of counter arguments that this isn't a particularly good security feature:
- If you really want the credential and already have access to it, you can just run a custom container that deliberately leaks it
- If I really want to prevent you from using the credential past some expiration date, that should be implemented on the underlying credential (on AWS or whatever) and not make it an Arvados problem.
The logic is messy, too, so I'm happy to yank it.
Updated by Peter Amstutz 11 months ago
22680-credentials-storage @ f428cb819626823086e11c7f8f6d2338460dc4d4
Updated by Tom Clegg 11 months ago
Peter Amstutz wrote in #note-31:
- credential_scopes -- should be just scopes? (it's not a third part of the id/secret key pair, and doesn't collide with anything)
It could also be called "resources" since we use "scopes" to mean something else in another table. WDYT? I don't feel strongly about it.
Docs already use "resources" to mean something else. I think "scopes" is better.
What is the use case for showing the credential_id ("access key ID" in aws-speak) to non-container clients?
The id/secret names align well with AWS keys, but I wonder if the general case would be better served by a single "secret" field whose schema is determined by "credential_class" -- e.g., ifcredential_class = "aws_access_key"thensecret = {"access_key_id":"...","secret_access_key":"..."}...?
- The most obvious (to me) 2nd use case is an OAuth API key, which is just a single opaque string that would go in
credential_secret-- thecredential_idcolumn would be logged, but otherwise unused.This is used to identify the credential if you want to update/delete it later (e.g. in Workbench). Another example that would go into
credential_idwould be the username part of a username/password pair. Arvados API tokens have a similar split of the token UUID (not secret) and a secret portion, although we typically smoosh them togther.Sometimes the
credential_idcould be blank, yes, it depends on the semantics of a particularcredential_class. But this pattern of having a public part and a private part seems common (seeing as how even we do it!) and useful to capture.
Is it really useful to capture, though? I think it would be quite awkward to use these two separate fields to store the UUID and secret parts of an Arvados token. Surely, one would store the entire Arvados token in the "secret" column, whether or not there's a separate "id" column available.
OK, butI think the strongest argument is that sometimes I might to need to update or delete a credential and I don't remember which of the credentials named "AWS_TOKEN1", "AWS_TOKEN2", or "AWS_TOKEN3" is the one I actually want to modify, with a separate
credential_idcolumn I just look for the corresponding access_key_id, if it is embedded in the hiddencredential_secretthen that becomes needlessly awkward.
- why did you name your tokens "AWS_TOKEN1" and "AWS_TOKEN2" instead of something meaningful to you?
- how will you differentiate between credential records that use the same access_key_id?
- If you really want the credential and already have access to it, you can just run a custom container that deliberately leaks it
- If I really want to prevent you from using the credential past some expiration date, that should be implemented on the underlying credential (on AWS or whatever) and not make it an Arvados problem.
The logic is messy, too, so I'm happy to yank it.
Yeah. I think the best approach would be to discourage giving write access to credentials.
I can see the logic of requiring the secret to be presented (not necessarily changed) in order to extend expiry, but even that seems to creep into "protecting secrets from crafty clients" territory, while our stated goal is to make it hard to accidentally leak secrets.
Updated by Peter Amstutz 11 months ago
22680-credentials-storage @ 7b3a6bfd9670c3e3e8005f54ce473c56c3d9ffe2
Updated by Peter Amstutz 11 months ago
22680-credentials-storage @ 1e4342342182de48dc126896a67f6520c10df80e
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- Added credential table
- Added controller
- Added tests
- Added Credential struct to Go SDK.
- Fields of the
Credentialobject are now calledcredential_class,scope,external_id,secret, andexpires_at.
- Added a periodic task to clean up expired credentials in
sys/trash_sweep - Anything not implemented (discovered or discussed during work) has a follow-up story.
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- passes tests, and confirmed that the API can be used through controller
- New or changed UX/UX and has gotten feedback from stakeholders.
- was discussed at engineering meeting and then discussed some more on the ticket.
- Documentation has been updated.
- yes, added new page for new API endpoint
- Behaves appropriately at the intended scale (describe intended scale).
- I don't expect there to be massive numbers of credential fields so I did not add any indexes besides the ones that we've written automated tests to check for
- Considered backwards and forwards compatibility issues between client and server.
- this is an entirely new feature so there are no backwards compatability concerns. the API revision has been updated so that clients can check for feature support.
- Follows our coding standards and GUI style guidelines.
- yes
There's also a higher-level Go API but I wasn't quite sure how to add it or even where the code for that lives.
Updated by Tom Clegg 11 months ago
- Field is called
scopehere, it should bescopes - Add
nameanddescriptionto resource attributes table (the "common resource fields" link sends you back here to find out whether they exist) - The
secretmethod doc should mention that the response is a json object like{"external_id":"...","secret":"..."}-- not just the secret string, as one might guess from "get the value ofsecret" - "secret ... cannot be used in queries" is mentioned in the description of fields, but maybe it deserves to be mentioned again in the "list" method?
Maybe we should have an index on (expires_at, secret='') and add ...and secret <> '' the scrub query? If the number of rows is small it doesn't matter, but it could be beneficial if there's an automated process that adds a lot of rows.
Indentation is messed up in CredentialsController#credential_secret at lg.properties = ...
Use Container::Running instead of "Running" when checking state
Credential model needs something like def searchable_columns; super - ['secret']; end. Currently listing with where={any:'my_password'} allows probing, and listing with filters=[["any","like","m%"]] allows probing one character at a time.
Can we test expiration by setting expires_at to now (maybe using update_all() or whatever, so we know we're not relying on activerecord hooks), rather than setting it to now+1s and sleeping 1s?
(Aside: while checking that I found Bug #22844: listing with filters [[any,=,foobar]] fails with ArgumentError: no time information in "foobar")
The "secret" controller method could check container/state and return early, rather than being a pyramid. Same with the "expired" case, we could return early instead of having a pyramid. Is it relevant to check @object and can?(read:...) here? The render_404_if_no_object hook should have already returned 404 if the requested UUID doesn't exist or isn't readable by current user. I think if we were actually relying on the secret method for those cases, it would be returning the wrong status (403).
Tests:
Seems like many of the things checked in the integration tests would be tidier as individual unit/functional tests ('secret cannot appear in queries'...).
I can make a branch to add the controller methods/routing.
Updated by Tom Clegg 11 months ago
22680-credentials-controller @ c12ffc63d43c3181bfbe67566bde879243445678 -- developer-run-tests: #4762
Based on unmerged 22680-credentials-storage at a2f2b2a0d0. Uses "new code path" to handle the new credentials endpoints in controller.
Updated by Peter Amstutz 11 months ago
Tom Clegg wrote in #note-38:
API doc page:
- Field is called
scopehere, it should bescopes- Add
nameanddescriptionto resource attributes table (the "common resource fields" link sends you back here to find out whether they exist)- The
secretmethod doc should mention that the response is a json object like{"external_id":"...","secret":"..."}-- not just the secret string, as one might guess from "get the value ofsecret"- "secret ... cannot be used in queries" is mentioned in the description of fields, but maybe it deserves to be mentioned again in the "list" method?
All fixed.
Maybe we should have an index on
(expires_at, secret='')and add...and secret <> ''the scrub query? If the number of rows is small it doesn't matter, but it could be beneficial if there's an automated process that adds a lot of rows.
I added secret != '' to the update_all query, that is a really good idea and will prevent it from updating already-expired rows unnecessarily.
An index isn't going to add anything until we get to 100s of thousands of rows, at which point the question will be do we need an index or do we need to stop doing whatever it is that adds 100s of thousands of rows? I feel like it is just clutter (the last time I looked at this we have something like 100+ indexes defined, about half of which actually get used, the other half of which are used in an inconsequential number of queries and are strictly dead weight.)
Indentation is messed up in
CredentialsController#credential_secretatlg.properties = ...Use
Container::Runninginstead of"Running"when checking stateCan we test expiration by setting expires_at to now (maybe using update_all() or whatever, so we know we're not relying on activerecord hooks), rather than setting it to now+1s and sleeping 1s?
Fixed.
Credential model needs something like
def searchable_columns; super - ['secret']; end. Currently listing withwhere={any:'my_password'}allows probing, and listing withfilters=[["any","like","m%"]]allows probing one character at a time.(Aside: while checking that I found Bug #22844: listing with filters [[any,=,foobar]] fails with ArgumentError: no time information in "foobar")
So subtracting it from searchable_columns is important, but I couldn't test it because of #22844, so I ended up having to fix that bug at the same time. See fc3124e540b9ca42e47c65ea57efee9e1d1f5d5c
The "secret" controller method could check container/state and return early, rather than being a pyramid. Same with the "expired" case, we could return early instead of having a pyramid. Is it relevant to check
@objectandcan?(read:...)here? Therender_404_if_no_objecthook should have already returned 404 if the requested UUID doesn't exist or isn't readable by current user. I think if we were actually relying on thesecretmethod for those cases, it would be returning the wrong status (403).
I un-pyramided the pyramid.
Yes, the 404 method probably kicks in first, but I'd rather explicitly confirm read access than not?
Tests:
Seems like many of the things checked in the integration tests would be tidier as individual unit/functional tests ('secret cannot appear in queries'...).
Perhaps, but I also like having all the tests related to a component in one file.
I can make a branch to add the controller methods/routing.
I will take a look.
22680-credentials-storage @ 46fa195a774b013c0cfd179a08b2b9de51bcf7c2
Updated by Tom Clegg 11 months ago
Peter Amstutz wrote in #note-40:
Indentation is messed up in
CredentialsController#credential_secretatlg.properties = ...
Not sure what's (still) going on here but it looks bizarre and I guess it doesn't matter that much.
So subtracting it from searchable_columns is important, but I couldn't test it because of #22844, so I ended up having to fix that bug at the same time. See fc3124e540b9ca42e47c65ea57efee9e1d1f5d5c
That case statement looks like an awfully long way to spell col.type == :datetime
Yes, the 404 method probably kicks in first, but I'd rather explicitly confirm read access than not?
The 404 method definitely kicks in first, otherwise this method would return 403 and assert_response 404 would fail.
I don't follow the "confirm read access" reasoning. Our approach everywhere else is to check permission one time, and rely on tests to confirm that it works. Why do we need something different here? As far as I can tell this is a pointless query that also happens to be misleading.
I suppose if permission is removed between permission-check and confirm-permission-check, we would return 403 here (nearly won race) instead of the usual 200 (won race) or 404 (lost race). I think we would be better off returning 200 (won race) like other APIs.
Perhaps, but I also like having all the tests related to a component in one file.
🤷
Rest LGTM
Updated by Tom Clegg 11 months ago
- Related to Bug #22844: listing with filters [[any,=,foobar]] fails with ArgumentError: no time information in "foobar" added
Updated by Peter Amstutz 11 months ago
Tom Clegg wrote in #note-41:
Peter Amstutz wrote in #note-40:
Indentation is messed up in
CredentialsController#credential_secretatlg.properties = ...Not sure what's (still) going on here but it looks bizarre and I guess it doesn't matter that much.
I really thought I fixed that before. for some reason emacs is having trouble setting the indentation correctly. Fixed it again.
So subtracting it from searchable_columns is important, but I couldn't test it because of #22844, so I ended up having to fix that bug at the same time. See fc3124e540b9ca42e47c65ea57efee9e1d1f5d5c
That case statement looks like an awfully long way to spell
col.type == :datetime
Good point. I shrank that down to a one-liner.
Yes, the 404 method probably kicks in first, but I'd rather explicitly confirm read access than not?
The 404 method definitely kicks in first, otherwise this method would return 403 and
assert_response 404would fail.I don't follow the "confirm read access" reasoning. Our approach everywhere else is to check permission one time, and rely on tests to confirm that it works. Why do we need something different here? As far as I can tell this is a pointless query that also happens to be misleading.
I suppose if permission is removed between permission-check and confirm-permission-check, we would return 403 here (nearly won race) instead of the usual 200 (won race) or 404 (lost race). I think we would be better off returning 200 (won race) like other APIs.
You're right, there is already a test checking the 404 behavior, so that permission check isn't doing anything. I took it out.
Rest LGTM
22680-credentials-storage @ c31b90d5ce667daed2ea27aface6bd2be891dc82
I'll merge if this passes.
Updated by Lisa Knox 7 months ago
- Related to Bug #23152: Validate schema of complex fields added