Bug #16683
closedTrouble sharing with federated users
Added by Peter Amstutz over 4 years ago. Updated over 4 years ago.
100%
Description
Workbench 1¶
Share with group¶
Steps to reproduce:
- Create a LoginCluster federation with 2 instances
- Go to the 2nd instance and log in with a non-admin federated user
- Create a project
- Go to the sharing tab
- Share with "all users"
On reloading the sharing tab, it'll get fiddlesticks:
request failed: x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D&limit=9223372036854775807&offset=0">https://172.17.0.3:8000/arvados/v1/users?cluster_id=&count=&filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D&limit=9223372036854775807&offset=0: 400 Bad Request: cannot execute federated list query unless count=="none" [API: 400]
Workbench 2¶
Sharing dialog¶
Fails to open with the similar error:
request failed: x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D&offset=0">https://172.17.0.3:8000/arvados/v1/users?cluster_id=&count=&filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D&offset=0: 400 Bad Request: cannot execute federated list query unless count=="none"
Sharing with user¶
Also observed: attempting to share with a user gets a "unknown uuid" error. This is probably due to the fact that although the user can see other users from the login cluster, the permissions are checked on the federated cluster, which doesn't know that. The validity check on the link tail for permission links should probably just suppress raising an error when the cluster id isn't the local one.
(alternately, we could skip the check entirely, this would partially address the previously reported problem of curators being unable to share with groups that they don't have access to, although they would need to add the permission link by uuid, possibly via the command line, which might not be acceptable.)
Updated by Peter Amstutz over 4 years ago
- Is duplicate of Bug #16681: Sharing dialog fails with "count must be none" when using LoginCluster feature added
Updated by Peter Amstutz over 4 years ago
- Status changed from Duplicate to New
- Description updated (diff)
Updated by Peter Amstutz over 4 years ago
Adding
.with_count("none")
.fetch_multiple_pages(false)
seems like it should produce a working query but now it fails this way:
2020-08-11_19:41:34.01115 {"PID":1419,"RequestID":"req-29gqcx0h0hqobxo9efl6","level":"info","msg":"response","remoteAddr":"127.
0.0.1:34258","reqBytes":0,"reqForwardedFor":"172.17.0.3","reqHost":"172.17.0.4:8000","reqMethod":"GET","reqPath":"arvados/v1/us
ers","reqQuery":"cluster_id=\u0026count=none\u0026filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D
%5D%5D\u0026offset=0","respBody":"{\"errors\":[\"Get https://172.17.0.3:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u00
26filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\\u0026offset=0: EOF\"]}\n","respBytes":196,"respStatus":"Internal Server Error","respStatusCode":500,"time":"2020-08-11T19:41:34.011033813Z","timeToStatus":0.003099,"timeTotal":0.003152,"timeWriteBody":0.000053}
Retries also seem to be doing something weird, is it interpreting the '%' as a format character?
https://172.17.0.3:8000/arvados/v1/users?cluster_id=\\u0026cou
nt=none\\u0026filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\\u0026offset=0: 502 Bad Gatew
ay: request failed: https://172.17.0.4:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%!B(MISSING)%!B(MISSING
)%!u(MISSING)uid%2C%!i(MISSING)n%2C%!B(MISSING)%!x(MISSING)1ly6-j7d0g-fffffffffffffff%5D%!D(MISSING)%!D(MISSING)\\u0026offset=0
: 500 Internal Server Error: Get https://172.17.0.3:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%!B(MISSIN
G)%!B(MISSING)%!u(MISSING)uid%2C%!i(MISSING)n%2C%!B(MISSING)%!x(MISSING)1ly6-j7d0g-fffffffffffffff%5D%!D(MISSING)%!D(MISSING)\\
u0026offset=0: EOF\
Updated by Peter Amstutz over 4 years ago
Response on LoginCluster
2020-08-11_20:13:41.04890 {"PID":1419,"RequestID":"req-79u0kwp3hsvwgb19cdci","level":"info","msg":"response","remoteAddr":"127.0.0.1:59
428","reqBytes":0,"reqForwardedFor":"172.17.0.3","reqHost":"172.17.0.4:8000","reqMethod":"GET","reqPath":"arvados/v1/users","reqQuery":
"cluster_id=\u0026count=none\u0026filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\u0026offset=0","r
espBody":"{\"errors\":[\"Get https://172.17.0.3:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%5B%5B%22uuid%22%2C%22
in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\\u0026offset=0: EOF\"]}\n","respBytes":196,"respStatus":"Internal Server Error","
respStatusCode":500,"time":"2020-08-11T20:13:41.048774508Z","timeToStatus":0.005508,"timeTotal":0.005535,"timeWriteBody":0.000027}
Response on federate:
2020-08-11_20:13:41.04952 {"PID":14215,"RequestID":"req-79u0kwp3hsvwgb19cdci","level":"info","msg":"response","remoteAddr":"127.0.0.1:5
9424","reqBytes":0,"reqForwardedFor":"172.17.0.4","reqHost":"172.17.0.3:8000","reqMethod":"GET","reqPath":"arvados/v1/users","reqQuery"
:"cluster_id=\u0026count=none\u0026filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\u0026offset=0","
respBody":"{\"errors\":[\"request failed: https://172.17.0.4:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%!B(MISSI
NG)%!B(MISSING)%!u(MISSING)uid%2C%!i(MISSING)n%2C%!B(MISSING)%!x(MISSING)1ly6-j7d0g-fffffffffffffff%5D%!D(MISSING)%!D(MISSING)\\u0026of
fset=0: 500 Internal Server Error: Get https://172.17.0.3:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%!B(MISSING)
!B(MISSING)!u(MISSING)uid%2C%!i(MISSING)n%2C%!B(MISSING)%!x(MISSING)1ly6-j7d0g-fffffffffffffff%5D%!D(MISSING)%!D(MISSING)\\u0026offse
t=0: EOF\"]}\n","respBytes":532,"respStatus":"Bad Gateway","respStatusCode":502,"time":"2020-08-11T20:13:41.049414288Z","timeToStatus":0.015050,"timeTotal":0.015071,"timeWriteBody":0.000021}
Updated by Peter Amstutz over 4 years ago
The %(MISSING) part is just a silly logging bug. Fix pending.
What's really going on:
It's getting stuck in a request loop. The federate ("x1ly6") forwards the user list request to the login cluster ("xrmjz"). The login cluster sees this filter:
[["uuid", "in", "x1ly6-j7d0g-fffffffffffffff"]]
It recognizes it as a uuid and so it forwards the request to the federate. Which forwards the request back to login cluster, which forwards it back to the federate, and so forth until it gives up. (I don't know how it decides to break out of the loop, but at least it doesn't get stuck infinitely).
Issues:
"x1ly6-j7d0g-fffffffffffffff" isn't a user. I think Workbench just simplistically sends the same "uuid in" list for both users and groups (should be harmless, right?). The intended behavior is to share with the "all users" group on the x1ly6 cluster.
Possible fixes:
- Check the type portion of the uuid
- Filter "uuid in" before sending the list request to LoginCluster
- The call chain sets "bypass federation" flag to suppress request propagation.
Updated by Peter Amstutz over 4 years ago
- Blocks Story #16688: Release Arvados 2.0.4 added
Updated by Peter Amstutz over 4 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 4 years ago
16683-fed-sharing @ 70838209c214bfa57ef4bce289e1530a1cc2b081
16683-fed-sharing @ arvados-workbench2|63e2f0e072d4e03d0e01643df94c7dca051a1ce4
Updated by Peter Amstutz over 4 years ago
16683-fed-sharing @ 877689fc26b0b69a94ca525a3bca1ed2236fb4b2
Updated by Lucas Di Pentima over 4 years ago
Reviewing 877689f
- File
services/api/app/models/link.rb
- Updates don’t check for cluster federated configuration, do you think it’s convenient to add it to at least avoid some of the copy&paste errors?
- File
services/api/app/models/arvados_model.rb
- Line 761 & 762: shouldn’t be added only to the Link model?
- Are we already using links to/from PDHs? I haven’t found documentation about it, maybe it’ll need an update? For example, https://doc.arvados.org/v2.0/api/permission-model.html#links talks about tail_uuid being only from a User or Group.
- I think having tests for the above changes would be convenient.
- Ran the fed migration tests on Jenkins and had a failure: arv-federation-migrate-test: #104
Updated by Lucas Di Pentima over 4 years ago
On a second thought: Maybe the fed-migrate tests on Jenkins don't start a federation from the provided commit hash. I was looking to avoid running them locally that usually is slower.
Updated by Peter Amstutz over 4 years ago
Lucas Di Pentima wrote:
Reviewing 877689f
- File
services/api/app/models/link.rb
- Updates don’t check for cluster federated configuration, do you think it’s convenient to add it to at least avoid some of the copy&paste errors?
Added a call to ApiClientAuthorization.remote_host to check for a valid cluster id.
- File
services/api/app/models/arvados_model.rb
- Line 761 & 762: shouldn’t be added only to the Link model?
- Are we already using links to/from PDHs? I haven’t found documentation about it, maybe it’ll need an update? For example, https://doc.arvados.org/v2.0/api/permission-model.html#links talks about tail_uuid being only from a User or Group.
I removed the check since it was functionally useless and didn't exist in the old code.
As it stands, if something doesn't fit the uuid format, it just won't be checked, so it doesn't matter if it is a PDH. That said there are some legacy links classes that use PDH, like 'docker_image_migration'.
- I think having tests for the above changes would be convenient.
Added a test.
- Ran the fed migration tests on Jenkins and had a failure: arv-federation-migrate-test: #104
We discussed this at standup, the jenkins test could (and probably should) do this, but it requires some fiddling, and may come at the expense of the job taking longer. I don't know if we want to block on it.
16683-fed-sharing @ 78444f2fb480801787e486d4b65198d72ab4fe15
Updated by Lucas Di Pentima over 4 years ago
This (78444f2fb480801787e486d4b65198d72ab4fe15) LGTM, thanks!!
Will take a look at wb2.
Updated by Lucas Di Pentima over 4 years ago
Reviewing arvados-workbench2|63e2f0e
WB2 still fails when re-opening the share dialog after being shared with all. The following fixes the problem:
diff --git a/src/views-components/sharing-dialog/participant-select.tsx b/src/views-components/sharing-dialog/participant-select.tsx
index 5d062da0..5260743f 100644
--- a/src/views-components/sharing-dialog/participant-select.tsx
+++ b/src/views-components/sharing-dialog/participant-select.tsx
@@ -131,14 +131,14 @@ export const ParticipantSelect = connect()(
const filterUsers = new FilterBuilder()
.addILike('any', value)
.getFilters();
- const userItems: ListResults<any> = await userService.list({ filters: filterUsers, limit });
+ const userItems: ListResults<any> = await userService.list({ filters: filterUsers, limit, count: "none" });
const filterGroups = new FilterBuilder()
.addNotIn('group_class', [GroupClass.PROJECT])
.addILike('name', value)
.getFilters();
- const groupItems: ListResults<any> = await groupsService.list({ filters: filterGroups, limit });
+ const groupItems: ListResults<any> = await groupsService.list({ filters: filterGroups, limit, count: "none" });
this.setState({
suggestions: this.props.onlyPeople
? userItems.items
With that, it LGTM.
Updated by Peter Amstutz over 4 years ago
Noticed during testing that "share public" doesn't work -- will fix
Updated by Peter Amstutz over 4 years ago
I incorporated your fix and did a little more manual testing. I noticed the "share public" option failed, so I fixed that. I added {count: "none"} in a few more places.
16683-fed-sharing arvados-workbench2|b1250d13a43d4351c6cbca7990c996b3219693d0
Updated by Lucas Di Pentima over 4 years ago
Did some manual testing against ce8i5 and all seems to work correctly, looks good to merge, thanks!
Updated by Peter Amstutz over 4 years ago
- Status changed from In Progress to Resolved
Updated by Peter Amstutz over 4 years ago
We set up a configuration with jutro (LoginCluster) and pirca (federate).
On pirca, the sharing dialog works. This is the case we tested for.
On jutro, the sharing dialog fails:
request failed: pirca-tpzed-anonymouspublic%22%5D%5D%5D&forwarded_for=jutro-&offset=0&reader_tokens=%5B%22v2%2Fjutro-gj3su-e2o9x84aeg7q005%2Fc23929fe5eb184a81e0afda2cff8aed9d10e9ed2%22%5D">https://pirca.arvadosapi.com/arvados/v1/users?cluster_id=&count=none&filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22pirca-tpzed-anonymouspublic%22%5D%5D%5D&forwarded_for=jutro-&offset=0&reader_tokens=%5B%22v2%2Fjutro-gj3su-e2o9x84aeg7q005%2Fc23929fe5eb184a81e0afda2cff8aed9d10e9ed2%22%5D: 401 Unauthorized: request failed: pirca-tpzed-anonymouspublic%22%5D%5D%5D&forwarded_for=jutro-&offset=0&reader_tokens=%5B%22v2%2Fjutro-gj3su-e2o9x84aeg7q005%2Fc23929fe5eb184a81e0afda2cff8aed9d10e9ed2%22%5D">https://jutro.arvadosapi.com/arvados/v1/users?cluster_id=&count=none&filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22pirca-tpzed-anonymouspublic%22%5D%5D%5D&forwarded_for=jutro-&offset=0&reader_tokens=%5B%22v2%2Fjutro-gj3su-e2o9x84aeg7q005%2Fc23929fe5eb184a81e0afda2cff8aed9d10e9ed2%22%5D: 401 Unauthorized: request failed: pirca-tpzed-anonymouspublic%22%5D%5D%5D&forwarded_for=jutro-jutro-&offset=0&reader_tokens=%5B%22v2%2Fjutro-gj3su-e2o9x84aeg7q005%2Fc23929fe5eb184a81e0afda2cff8aed9d10e9ed2%22%5D">http://localhost:8004/arvados/v1/users?cluster_id=&count=none&filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22pirca-tpzed-anonymouspublic%22%5D%5D%5D&forwarded_for=jutro-jutro-&offset=0&reader_tokens=%5B%22v2%2Fjutro-gj3su-e2o9x84aeg7q005%2Fc23929fe5eb184a81e0afda2cff8aed9d10e9ed2%22%5D: 401 Unauthorized: Not logged in (req-datocp8oyu00b24wspew) [API: 502]
On further research:
It is failing because it was mistakenly shared with the pirca anonymous user (pirca-tpzed-anonymouspublic). This causes some weird federation queries that don't work.
This happened because on jutro, there are two anonymous users to choose from, the jutro one and the pirca one. That shouldn't happen, and it is very confusing.
a) I'm not sure how the pirca one ended up on jutro. Maybe accessing a federated collection as the anonymous user, starting from pirca it used pirca's anonymous user credentials to access the collection on jutro?
b) I can't delete the user from jutro using the API, because controller redirects "user delete" API calls to the cluster that owns the user (pirca, of course). I tried to delete the user on jutro and I'm pretty sure I deleted pirca's anonymous user instead. So now the anonymous user on pirca is gone.
c) When LoginCluster is set, every cluster should use the LoginCluster's anonymous user and token.
Updated by Peter Amstutz over 4 years ago
- Status changed from Resolved to Feedback
Updated by Javier Bértoli over 4 years ago
Applied the changes for the anonymous user in {pirca,jutro} through salt (commit b7de27c@saltstack). We have the automated runs disabled atm, as we are doing some manual changes in the hosts, so deployed with these steps in the salt master:
cd /data/saltstack/pillarstack && gut pull salt '*jutro*' saltutil.refresh_pillar salt '*jutro*' state.apply custom_files,packages,services,exim 2>&1 |tee /tmp/salida_jutro salt '*pirca*' saltutil.refresh_pillar salt '*pirca*' state.apply custom_files,packages,services,exim 2>&1 |tee /tmp/salida_pirca
Updated by Peter Amstutz over 4 years ago
- Related to Bug #16726: other cluster's special users (root and anonymous) can appear in user list added
Updated by Peter Amstutz over 4 years ago
- Status changed from Feedback to Resolved