Project

General

Profile

Actions

Bug #23004

closed

test-provision jobs failing with error: Not logged in

Added by Lucas Di Pentima 9 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Deployment
Target version:
Story points:
-
Release relationship:
Auto

Description

First occurrence of i.e.: debian12

test-provision-debian12: #369

17:22:35 ----------
17:22:35           ID: arvados-shell-resources-virtual-machines-shell-record-cmd-run
17:22:35     Function: cmd.run
17:22:35         Name: arv --format=uuid \
17:22:35   virtual_machine \
17:22:35   create \
17:22:35   --virtual-machine '{"hostname":"shell.dbn12.local" }'
17:22:35 
17:22:35       Result: False
17:22:35      Comment: Command "arv --format=uuid \
17:22:35                 virtual_machine \
17:22:35                 create \
17:22:35                 --virtual-machine '{"hostname":"shell.dbn12.local" }'
17:22:35               " run
17:22:35      Started: 20:22:32.869450
17:22:35     Duration: 1473.0 ms
17:22:35      Changes:   
17:22:35               ----------
17:22:35               pid:
17:22:35                   54211
17:22:35               retcode:
17:22:35                   1
17:22:35               stderr:
17:22:35                   Error: Not logged in (req-4hg96qbv0t5cphficr02)
17:22:35               stdout:
17:22:35   Name: nginx - Function: service.mod_watch - Result: Changed - Started: 20:22:34.344563 - Duration: 110.723 ms
17:22:35 
17:22:35 Summary for local
17:22:35 --------------
17:22:35 Succeeded: 163 (changed=122)
17:22:35 Failed:      1
17:22:35 --------------

I've run the same job but against production packages and it succeeded: test-provision-debian12: #373

Actions #1

Updated by Lucas Di Pentima 9 months ago

I've also deployed a multi-node test cluster against latest development packages, and got the same error.
RailsAPI and Controller services are working, and manually running arv user current with the superuser token works OK from the shell node.

Actions #2

Updated by Lucas Di Pentima 9 months ago

Manually running the same failing request succeeds:

admin@shell:~$ arv --format=uuid virtual_machine create --virtual-machine '{"hostname":"shell.xarv3.example.com" }'
xarv3-2x53u-b8qe0a7k7o7zug7
Actions #3

Updated by Lucas Di Pentima 9 months ago

I think that I've found the behavior that causes this issue.

At cluster 'just-deployed' state

  • The 'Not logged in' error happens when using the system root token
  • Restarting controler and/or RailsAPI doesn't make the problem go away
  • The users table has no records:
    xarv3_arvados=# select id,username,is_active from users;
     id | username | is_active 
    ----+----------+-----------
    (0 rows)
    

After logging in as an admin in workbench

  • The problem no longer happens
  • The users table contains both root and the admin user:
    xarv3_arvados=# select id,username,is_active from users;
     id | username | is_active 
    ----+----------+-----------
      1 |          | t
      2 | admin    | t
    (2 rows)
    

I could go ahead an manually do a binary search of which development package version started causing this, but first will confirm with the team if there's any suspicion on which branch could have introduced this new behavior.

Actions #4

Updated by Lucas Di Pentima 9 months ago

Lucas Di Pentima wrote in #note-2:

Manually running the same failing request succeeds:

[...]

This succeeded because my manual test was done after logging in as the admin user in workbench for the first time.

Actions #5

Updated by Brett Smith 9 months ago

  • Status changed from In Progress to Resolved

Lucas Di Pentima wrote in #note-3:

I think that I've found the behavior that causes this issue.

This was a regression introduced in #22704 and fixed in 9b00512b586547cc36be1ade99c311cc54a4f197. Thank you for your help tracking it down.

Actions

Also available in: Atom PDF