Bug #23004
closedtest-provision jobs failing with error: Not logged in
Description
First occurrence of i.e.: debian12
17:22:35 ----------
17:22:35 ID: arvados-shell-resources-virtual-machines-shell-record-cmd-run
17:22:35 Function: cmd.run
17:22:35 Name: arv --format=uuid \
17:22:35 virtual_machine \
17:22:35 create \
17:22:35 --virtual-machine '{"hostname":"shell.dbn12.local" }'
17:22:35
17:22:35 Result: False
17:22:35 Comment: Command "arv --format=uuid \
17:22:35 virtual_machine \
17:22:35 create \
17:22:35 --virtual-machine '{"hostname":"shell.dbn12.local" }'
17:22:35 " run
17:22:35 Started: 20:22:32.869450
17:22:35 Duration: 1473.0 ms
17:22:35 Changes:
17:22:35 ----------
17:22:35 pid:
17:22:35 54211
17:22:35 retcode:
17:22:35 1
17:22:35 stderr:
17:22:35 Error: Not logged in (req-4hg96qbv0t5cphficr02)
17:22:35 stdout:
17:22:35 Name: nginx - Function: service.mod_watch - Result: Changed - Started: 20:22:34.344563 - Duration: 110.723 ms
17:22:35
17:22:35 Summary for local
17:22:35 --------------
17:22:35 Succeeded: 163 (changed=122)
17:22:35 Failed: 1
17:22:35 --------------
I've run the same job but against production packages and it succeeded: test-provision-debian12: #373
Updated by Lucas Di Pentima 9 months ago
I've also deployed a multi-node test cluster against latest development packages, and got the same error.
RailsAPI and Controller services are working, and manually running arv user current with the superuser token works OK from the shell node.
Updated by Lucas Di Pentima 9 months ago
Manually running the same failing request succeeds:
admin@shell:~$ arv --format=uuid virtual_machine create --virtual-machine '{"hostname":"shell.xarv3.example.com" }'
xarv3-2x53u-b8qe0a7k7o7zug7
Updated by Lucas Di Pentima 9 months ago
I think that I've found the behavior that causes this issue.
At cluster 'just-deployed' state¶
- The 'Not logged in' error happens when using the system root token
- Restarting controler and/or RailsAPI doesn't make the problem go away
- The
userstable has no records:xarv3_arvados=# select id,username,is_active from users; id | username | is_active ----+----------+----------- (0 rows)
After logging in as an admin in workbench¶
- The problem no longer happens
- The
userstable contains both root and the admin user:xarv3_arvados=# select id,username,is_active from users; id | username | is_active ----+----------+----------- 1 | | t 2 | admin | t (2 rows)
I could go ahead an manually do a binary search of which development package version started causing this, but first will confirm with the team if there's any suspicion on which branch could have introduced this new behavior.
Updated by Lucas Di Pentima 9 months ago
Lucas Di Pentima wrote in #note-2:
Manually running the same failing request succeeds:
[...]
This succeeded because my manual test was done after logging in as the admin user in workbench for the first time.
Updated by Brett Smith 9 months ago
- Status changed from In Progress to Resolved
Lucas Di Pentima wrote in #note-3:
I think that I've found the behavior that causes this issue.
This was a regression introduced in #22704 and fixed in 9b00512b586547cc36be1ade99c311cc54a4f197. Thank you for your help tracking it down.