Project

General

Profile

Actions

Bug #18679

closed

arvados-api fails to start if API.DisabledAPIs is not explicitly set in config.yml

Added by Javier Bértoli almost 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Start date:
02/02/2022
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

The arvados API code fails to start, with this error:

==> error.log <==
App 38957 output: Error: The application encountered the following error: API.DisabledAPIs expected Hash but was NilClass (RuntimeError)
App 38957 output:     /var/www/arvados-api/current/lib/config_loader.rb:163:in `block in coercion_and_check'
App 38957 output:     /var/www/arvados-api/current/lib/config_loader.rb:80:in `each'
App 38957 output:     /var/www/arvados-api/current/lib/config_loader.rb:80:in `coercion_and_check'
App 38957 output:     /var/www/arvados-api/current/config/arvados_config.rb:229:in `<top (required)>'
App 38957 output:     /var/www/arvados-api/current/config/application.rb:53:in `require_relative'
App 38957 output:     /var/www/arvados-api/current/config/application.rb:53:in `<class:Application>'
App 38957 output:     /var/www/arvados-api/current/config/application.rb:50:in `<module:Server>'
App 38957 output:     /var/www/arvados-api/current/config/application.rb:48:in `<top (required)>'
App 38957 output:     /var/www/arvados-api/current/config/environment.rb:6:in `require_relative'
App 38957 output:     /var/www/arvados-api/current/config/environment.rb:6:in `<top (required)>'
App 38957 output:     config.ru:7:in `require'
App 38957 output:     config.ru:7:in `block in <main>'
App 38957 output:     /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/rack-2.2.3/lib/rack/builder.rb:125:in `instance_eval'
App 38957 output:     /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/rack-2.2.3/lib/rack/builder.rb:125:in `initialize'
App 38957 output:     config.ru:1:in `new'
App 38957 output:     config.ru:1:in `<main>'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:101:in `eval'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:101:in `preload_app'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:189:in `block in <module:App>'
App 38957 output:     /usr/lib/ruby/vendor_ruby/phusion_passenger/loader_shared_helpers.rb:397:in `run_block_and_record_step_progress'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:188:in `<module:App>'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:30:in `<module:PhusionPassenger>'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:29:in `<main>'
[ E 2022-01-25 16:45:04.9413 12971/T183 age/Cor/App/Implementation.cpp:221 ]: Could not spawn process for application /var/www/arvados-api/current: The application encountered the following error: API.DisabledAPIs expected Hash but was NilClass (RuntimeError)
  Error ID: 63e03141
  Error details saved to: /tmp/passenger-error-oxPEB0.html

[ E 2022-01-25 16:45:04.9460 12971/Tc age/Cor/Con/CheckoutSession.cpp:281 ]: [Client 3-40] Cannot checkout session because a spawning error occurred. The identifier of the error is 63e03141. Please see earlier logs for details about the error.

Seems like it's only the api server that has trouble with this. Perhaps the bug is in the rails code that ingests the config via `arvados-server config-dump`.


Files

config.yml (3.24 KB) config.yml Javier Bértoli, 02/01/2022 10:34 PM

Subtasks 1 (0 open1 closed)

Task #18711: review 18679-handle-keys-with-null-values-in-configResolvedWard Vandewege02/02/2022

Actions
Actions #1

Updated by Ward Vandewege almost 3 years ago

  • Priority changed from High to Normal
  • Target version set to 2022-02-16 sprint
Actions #2

Updated by Ward Vandewege almost 3 years ago

  • Target version changed from 2022-02-16 sprint to 2022-02-02 sprint
Actions #3

Updated by Ward Vandewege almost 3 years ago

  • Assigned To deleted (Javier Bértoli)
Actions #4

Updated by Ward Vandewege almost 3 years ago

  • Description updated (diff)

The

Actions #5

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2022-02-02 sprint to 2022-02-16 sprint
Actions #6

Updated by Javier Bértoli almost 3 years ago

Added a config.yml file to reproduce the issue (same config file was working correctly in previous versions).

Actions #7

Updated by Peter Amstutz almost 3 years ago

  • Assigned To set to Ward Vandewege
Actions #8

Updated by Peter Amstutz almost 3 years ago

  • Release set to 46
Actions #9

Updated by Ward Vandewege almost 3 years ago

  • Status changed from New to In Progress
Actions #10

Updated by Ward Vandewege almost 3 years ago

Running that config file through arvados-server config-dump gives:

AutoReloadConfig: false
Clusters:
  harpo:
    API:
      AsyncPermissionsUpdateInterval: 0s
      DisabledAPIs: null
      KeepServiceRequestTimeout: 0s
      MaxConcurrentRequests: 0
      MaxIndexDatabaseRead: 0
      MaxItemsPerResponse: 0
      MaxKeepBlobBuffers: 0
      MaxRequestAmplification: 0
      MaxRequestSize: 0
      MaxTokenLifetime: 0s
      RequestTimeout: 0s
      SendTimeout: 0s
      VocabularyPath: "" 
      WebsocketClientEventQueue: 0
      WebsocketServerEventQueue: 0
...

This is very wrong, none of the defaults are there.

Indeed, after updating config.yml like this:

Clusters:
  harpo:

    SystemRootToken: "systemroottokenmushaveatleast32characters" 
    ManagementToken: "managementtokenmushaveatleast32characters" 

    API:
      DisabledAPIs: {}
...

and re-running arvados-server config-dump:

AutoReloadConfig: false
Clusters:
  harpo:
    API:
      AsyncPermissionsUpdateInterval: 20s
      DisabledAPIs: {}
      KeepServiceRequestTimeout: 15s
      MaxConcurrentRequests: 0
      MaxIndexDatabaseRead: 134217728
      MaxItemsPerResponse: 1000
      MaxKeepBlobBuffers: 128
      MaxRequestAmplification: 4
      MaxRequestSize: 134217728
      MaxTokenLifetime: 0s
      RequestTimeout: 5m
      SendTimeout: 1m
      VocabularyPath: "" 
      WebsocketClientEventQueue: 64
      WebsocketServerEventQueue: 4
...

which is as expected. The problem appears to be that the API: section has no values at all in config.yml. Setting any one value makes arvados-server config-dump do the right thing. For example, setting MaxItemsPerResponse:

AutoReloadConfig: False

Clusters:
  harpo:

    SystemRootToken: "systemroottokenmushaveatleast32characters" 
    ManagementToken: "managementtokenmushaveatleast32characters" 

    API:
      MaxItemsPerResponse: 50
...

results in

AutoReloadConfig: false
Clusters:
  harpo:
    API:
      AsyncPermissionsUpdateInterval: 20s
      DisabledAPIs: {}
      KeepServiceRequestTimeout: 15s
      MaxConcurrentRequests: 0
      MaxIndexDatabaseRead: 134217728
      MaxItemsPerResponse: 50
      MaxKeepBlobBuffers: 128
      MaxRequestAmplification: 4
      MaxRequestSize: 134217728
      MaxTokenLifetime: 0s
      RequestTimeout: 5m
      SendTimeout: 1m
      VocabularyPath: "" 
      WebsocketClientEventQueue: 64
      WebsocketServerEventQueue: 4

Okay, the problem is that the config:

AutoReloadConfig: False

Clusters:
  harpo:

    SystemRootToken: "systemroottokenmushaveatleast32characters" 
    ManagementToken: "managementtokenmushaveatleast32characters" 

    API:

    Collections:
      BlobSigningKey: "blobsigningkeymushaveatleast32characters" 
...

Is turned into

...
{
  "AutoReloadConfig": false,
  "Clusters": {
    "harpo": {
      "API": null,
      "Collections": {
        "BlobSigningKey": "blobsigningkeymushaveatleast32characters" 
      },
...

and then we call mergo.Merge(&merged, src, mergo.WithOverride) in lib/config/load.go which overwrites the default API structure with null. Doh.

Actions #11

Updated by Ward Vandewege almost 3 years ago

Ready for review at e414e9d90df8756ab4fc9dab4d2a5729a51a525c on branch 18679-handle-keys-with-null-values-in-config

Tests passed at developer-run-tests: #2894 , minus wb integration tests, which passed on a re-run here: developer-run-tests-apps-workbench-integration: #3079

Actions #12

Updated by Tom Clegg almost 3 years ago

This looks like a templating problem to me. The template/operator didn't intend to put a null value in their config.yml, but they did.

FBOW these are documented ways to spell null in yaml:

key1:
key2: ~
key3: null

That said, we don't currently need null to mean anything other than "use the default" and we generally prefer to have a real value like "" or {} if the intent is to say "use the zero value instead of the default", so this seems like a way to accommodate templating artifacts that doesn't cost us anything.

LGTM, thanks

Actions #13

Updated by Ward Vandewege almost 3 years ago

Tom Clegg wrote:

LGTM, thanks

Thanks, merged.

Actions #14

Updated by Ward Vandewege almost 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
Actions

Also available in: Atom PDF