Bug #23304
closedInvestigate test-provision failures since Nov 12/Docker 29 release
Description
There was one successful run of test-provision on November 10, then it started failing on November 12. test-provision: #1357
It gets all the way to running diagnostics, which fail with the output shown below.
Worth noting: there were no code changes in main between November 10 and 12. test-provision ran after a successful build of test packages for #23000: build-packages-multijob: #4936 - But those packages weren't uploaded (note the parameters and console output), so I don't believe they can affect test-provision either. It feels like something in the underlying environment must've changed.
Running arvados-client diagnostics... level=info msg=" 5: running health check (same as `arvados-server check`)" level=info msg=" 10: getting discovery document from https://dbn12.local:8800/discovery/v1/apis/arvados/v1/rest" level=info msg=" 20: getting exported config from https://dbn12.local:8800/arvados/v1/config" level=info msg=" 30: getting current user record" level=info msg=" 40: connecting to Keepproxy endpoint https://dbn12.local:8801/" level=info msg=" 41: connecting to WebDAV endpoint https://dbn12.local:8802/" level=info msg=" 42: connecting to WebDAVDownload endpoint https://dbn12.local:8802/" level=info msg=" 43: connecting to Websocket endpoint wss://dbn12.local:8804/websocket" level=info msg=" 44: connecting to Workbench1 endpoint https://dbn12.local:8805/" level=info msg=" 45: connecting to Workbench2 endpoint https://dbn12.local:443/" level=info msg=" 50: checking CORS headers at https://dbn12.local:8800/" level=info msg=" 51: checking CORS headers at https://dbn12.local:8801/d41d8cd98f00b204e9800998ecf8427e+0" level=info msg=" 52: checking CORS headers at https://dbn12.local:8802/" level=info msg=" 60: checking internal/external client detection" level=info msg=" ... controller returned only non-proxy services, this host is treated as \"internal\"" level=info msg=" 61: reading+writing via keep service at http://127.0.0.1:25107/" level=info msg=" 80: finding/creating \"scratch area for diagnostics\" project" level=info msg=" 90: creating temporary collection" #0 building with "default" instance using docker driver […elide Docker image build output…] #8 DONE 2.5s level=info msg=" ... arvados-client version: /arvados-client 3.3.0~dev20251103181659 (go1.24.1)\n" level=info msg=" ... docker image size is 55567360" level=info msg=" 100: uploading file via webdav" level=info msg=" 110: checking WebDAV ExternalURL wildcard (https://dbn12.local:8802/)" level=warning msg=" ... WebDAV ExternalURL has no leading wildcard and TrustAllContent==false - content previews will not work" level=info msg=" 120: downloading from webdav (https://dbn12.local:8802/foo)" level=info msg=" 121: downloading from webdav (https://dbn12.local:8802/sha256:664244dceaa112af8e3af24d8da0582545fc7299118e93fa7ea9c65d324f45b8.tar)" level=info msg=" 122: downloading from webdav (https://dbn12.local:8802/c=d41d8cd98f00b204e9800998ecf8427e+0/_/foo)" level=info msg=" 123: downloading from webdav (https://dbn12.local:8802/c=d41d8cd98f00b204e9800998ecf8427e+0/_/sha256:664244dceaa112af8e3af24d8da0582545fc7299118e93fa7ea9c65d324f45b8.tar)" level=info msg=" 124: downloading from webdav (https://dbn12.local:8802/sha256:664244dceaa112af8e3af24d8da0582545fc7299118e93fa7ea9c65d324f45b8.tar)" level=warning msg=" ... skipping collection-id-in-vhost test because WebDAV ExternalURL has no leading wildcard" level=info msg=" 125: downloading from webdav (https://dbn12.local:8802/c=dbn12-4zz18-h3prub13wkb4y8n/_/sha256:664244dceaa112af8e3af24d8da0582545fc7299118e93fa7ea9c65d324f45b8.tar)" level=info msg=" 130: getting list of virtual machines" level=info msg=" 150: connecting to webshell service" level=info msg=" 160: running a container" level=info msg=" ... container request uuid = dbn12-xvhdp-yhyzog8t8xn4e0h" level=info msg=" ... container request submitted, waiting up to 10m for container to run" level=error msg=" 160: running a container (18549 ms): container request dbn12-xvhdp-yhyzog8t8xn4e0h is final but container dbn12-dz642-mtak26qxtw7gi3p did not complete: container state = \"Cancelled\"" level=info msg="9990: deleting temporary collection" --- cut here --- error summary --- level=error msg=" 160: running a container (18549 ms): container request dbn12-xvhdp-yhyzog8t8xn4e0h is final but container dbn12-dz642-mtak26qxtw7gi3p did not complete: container state = \"Cancelled\"" Diagnostics run FAILED, exiting
Files
Updated by Lucas Di Pentima 4 months ago
I'm seeing a similar issue while trying to finish the test-ansible Jenkins pipeline:
...with the difference that the container status is still "Running" when the diagnostics tool times out.
... level=info msg=" 150: connecting to webshell service" level=info msg=" ... skipping, webshell not configured" level=info msg=" 160: running a container" level=info msg=" ... container request uuid = xurid-xvhdp-2iz73v8hucayvvv" level=info msg=" ... container request submitted, waiting up to 10m for container to run" level=error msg=" 160: running a container (602178 ms): timed out waiting for container to finish; container request xurid-xvhdp-2iz73v8hucayvvv state was \"Committed\", container xurid-dz642-mrlp3eau3l227rn state was \"Running\"" level=info msg="9990: deleting temporary collection" --- cut here --- error summary --- level=error msg=" 160: running a container (602178 ms): timed out waiting for container to finish; container request xurid-xvhdp-2iz73v8hucayvvv state was \"Committed\", container xurid-dz642-mrlp3eau3l227rn state was \"Running\""
Updated by Brett Smith 4 months ago
Lucas Di Pentima wrote in #note-1:
I'm seeing a similar issue while trying to finish the test-ansible Jenkins pipeline:
...with the difference that the container status is still "Running" when the diagnostics tool times out.
I have reproduced this on an interactive one-off VM, looking through the logs.
Updated by Brett Smith 4 months ago
Lucas Di Pentima wrote in #note-1:
I'm seeing a similar issue while trying to finish the test-ansible Jenkins pipeline:
...with the difference that the container status is still "Running" when the diagnostics tool times out.
We got the job past this point by making sure to use the real hostname consistently. I wonder if something changed in the way GCP resolves DNS that broke our existing setup. ("It's always DNS.")
Updated by Brett Smith 4 months ago
Dispatch is actually failing because:
#<ActiveRecord::RecordInvalid: Validation failed: Gateway address cannot be modified in state 'Locked' (nil, "tunnel http://127.0.0.1:8003/")> ["Gateway address cannot be modified in state 'Locked' (nil, \"tunnel http://127.0.0.1:8003/\")"] Error 1763476022+041822b7: 422
The full API server log, formatted for readability:
{
"method": "PUT",
"path": "/arvados/v1/containers/dbn12-dz642-iu0onkyxf7hqpri",
"format": "html",
"controller": "Arvados::V1::ContainersController",
"action": "update",
"status": 422,
"allocations": 4385,
"duration": 23.05,
"view": 0.55,
"db": 5.74,
"ClusterID": "dbn12",
"request_id": "req-m2c7rlog8r2b18fqtbxx",
"client_ipaddr": "127.0.0.1",
"client_auth": "dbn12-gj3su-000000000000000",
"exception": "#<ActiveRecord::RecordInvalid: Validation failed: Gateway address cannot be modified in state 'Locked' (nil, \"tunnel http://127.0.0.1:8003/\")>",
"exception_backtrace": "/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/validations.rb:84:in `raise_validation_error'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/validations.rb:55:in `save!'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/transactions.rb:313:in `block in save!'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/transactions.rb:365:in `block in with_transaction_returning_status'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/abstract/database_statements.rb:342:in `transaction'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/transactions.rb:361:in `with_transaction_returning_status'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/transactions.rb:313:in `save!'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/suppressor.rb:56:in `save!'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/persistence.rb:906:in `block in update!'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/transactions.rb:365:in `block in with_transaction_returning_status'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/abstract/transaction.rb:535:in `block in within_new_transaction'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/concurrency/null_lock.rb:9:in `synchronize'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/abstract/transaction.rb:532:in `within_new_transaction'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/abstract/database_statements.rb:344:in `transaction'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/transactions.rb:361:in `with_transaction_returning_status'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/persistence.rb:904:in `update!'
/var/www/arvados-api/current/app/controllers/application_controller.rb:125:in `update'
/var/www/arvados-api/current/app/controllers/arvados/v1/containers_controller.rb:42:in `update'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/abstract_controller/base.rb:224:in `process_action'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_controller/metal/rendering.rb:165:in `process_action'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/abstract_controller/callbacks.rb:259:in `block in process_action'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/callbacks.rb:121:in `block in run_callbacks'
/var/www/arvados-api/current/app/controllers/application_controller.rb:467:in `block in set_current_request_id'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/tagged_logging.rb:139:in `block in tagged'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/tagged_logging.rb:39:in `tagged'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/tagged_logging.rb:139:in `tagged'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/broadcast_logger.rb:241:in `method_missing'
/var/www/arvados-api/current/app/controllers/application_controller.rb:466:in `set_current_request_id'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/callbacks.rb:130:in `block in run_callbacks'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/callbacks.rb:141:in `run_callbacks'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/abstract_controller/callbacks.rb:258:in `process_action'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_controller/metal/rescue.rb:25:in `process_action'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_controller/metal/instrumentation.rb:74:in `block in process_action'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/notifications.rb:206:in `block in instrument'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/notifications/instrumenter.rb:58:in `instrument'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/notifications.rb:206:in `instrument'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_controller/metal/instrumentation.rb:73:in `process_action'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_controller/metal/params_wrapper.rb:261:in `process_action'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/railties/controller_runtime.rb:32:in `process_action'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/abstract_controller/base.rb:160:in `process'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionview-7.1.5.2/lib/action_view/rendering.rb:40:in `process'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_controller/metal.rb:227:in `dispatch'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_controller/metal.rb:309:in `dispatch'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/routing/route_set.rb:49:in `dispatch'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/routing/route_set.rb:32:in `serve'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/journey/router.rb:51:in `block in serve'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/journey/router.rb:131:in `block in find_routes'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/journey/router.rb:124:in `each'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/journey/router.rb:124:in `find_routes'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/journey/router.rb:32:in `serve'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/routing/route_set.rb:882:in `call'
/var/www/arvados-api/current/config/initializers/clear_empty_content_type.rb:20:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/rack-3.2.3/lib/rack/tempfile_reaper.rb:20:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/rack-3.2.3/lib/rack/etag.rb:29:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/rack-3.2.3/lib/rack/conditional_get.rb:44:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/rack-3.2.3/lib/rack/head.rb:15:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/http/permissions_policy.rb:36:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/http/content_security_policy.rb:36:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/rack-session-2.1.1/lib/rack/session/abstract/id.rb:274:in `context'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/rack-session-2.1.1/lib/rack/session/abstract/id.rb:268:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/middleware/cookies.rb:689:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/middleware/callbacks.rb:29:in `block in call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/callbacks.rb:101:in `run_callbacks'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/middleware/callbacks.rb:28:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/middleware/debug_exceptions.rb:29:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/middleware/show_exceptions.rb:31:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/lograge-0.14.0/lib/lograge/rails_ext/rack/logger.rb:18:in `call_app'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/railties-7.1.5.2/lib/rails/rack/logger.rb:24:in `block in call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/tagged_logging.rb:139:in `block in tagged'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/tagged_logging.rb:39:in `tagged'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/tagged_logging.rb:139:in `tagged'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/broadcast_logger.rb:241:in `method_missing'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/railties-7.1.5.2/lib/rails/rack/logger.rb:24:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/request_store-1.7.0/lib/request_store/middleware.rb:19:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/middleware/request_id.rb:28:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/rack-3.2.3/lib/rack/method_override.rb:28:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/rack-3.2.3/lib/rack/runtime.rb:24:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/middleware/executor.rb:14:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/rack-3.2.3/lib/rack/sendfile.rb:131:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/actionpack-7.1.5.2/lib/action_dispatch/middleware/ssl.rb:79:in `call'
/var/www/arvados-api/current/app/middlewares/arvados_api_token.rb:99:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/railties-7.1.5.2/lib/rails/engine.rb:536:in `call'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/passenger-6.0.27/src/ruby_supportlib/phusion_passenger/rack/thread_handler_extension.rb:107:in `process_request'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/passenger-6.0.27/src/ruby_supportlib/phusion_passenger/request_handler/thread_handler.rb:157:in `accept_and_process_next_request'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/passenger-6.0.27/src/ruby_supportlib/phusion_passenger/request_handler/thread_handler.rb:110:in `main_loop'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/passenger-6.0.27/src/ruby_supportlib/phusion_passenger/request_handler.rb:419:in `block (3 levels) in start_threads'
/var/www/arvados-api/shared/vendor_bundle/ruby/3.1.0/gems/passenger-6.0.27/src/ruby_supportlib/phusion_passenger/utils.rb:111:in `block in create_thread_and_abort_on_exception'
",
"params": {
"container": "{\"gateway_address\":\"tunnel http://127.0.0.1:8003/\"}",
"select": "[\"uuid\"]"
},
"@timestamp": "2025-11-18T14:27:02.361857440Z",
"@version": "1",
"message": "[422] PUT /arvados/v1/containers/dbn12-dz642-iu0onkyxf7hqpri (Arvados::V1::ContainersController#update)"
}
Updated by Brett Smith 4 months ago
- Assigned To set to Brett Smith
- Status changed from New to In Progress
For posterity I am attaching the full logs from the last successful and first failed run. I have confirmed that they install the same versions of arvados-api-server, arvados-controller, crunch-dispatch-local, and crunch-run.
Updated by Brett Smith 4 months ago
Updated by Brett Smith 4 months ago
- Subject changed from Investigate test-provision failures since Nov 12 to Investigate test-provision failures since Nov 12/Docker 29 release
What changed in the meantime is Docker 29 came out. I bet that changed something that's causing unexpected behavior from Arvados. The full list of different package versions:
--- /home/brett/Scratch/512-pkgs.txt 2025-11-18 10:06:15.401698257 -0500
+++ /home/brett/Scratch/513-pkgs.txt 2025-11-18 10:06:21.173928773 -0500
@@ -12,7 +12,7 @@
'bison' changed from 'absent' to '2:3.8.2+dfsg-1+b1'
'build-essential' changed from 'absent' to '12.9'
'bzip2' changed from 'absent' to '1.0.8-5+b1'
-'containerd.io' changed from 'absent' to '1.7.29-1~debian.12~bookworm'
+'containerd.io' changed from 'absent' to '2.1.5-1~debian.12~bookworm'
'cpp' changed from 'absent' to '4:12.2.0-3'
'cpp-12' changed from 'absent' to '12.2.0-14+deb12u1'
'crunch-dispatch-local' changed from 'absent' to '3.3.0~dev20251103181659-1'
@@ -21,9 +21,9 @@
'dnsmasq-base' changed from 'absent' to '2.90-4~deb12u1'
'dns-root-data' changed from 'absent' to '2024071801~deb12u1'
'docker-buildx-plugin' changed from 'absent' to '0.29.1-1~debian.12~bookworm'
-'docker-ce' changed from 'absent' to '5:28.5.2-1~debian.12~bookworm'
-'docker-ce-cli' changed from 'absent' to '5:28.5.2-1~debian.12~bookworm'
-'docker-ce-rootless-extras' changed from 'absent' to '5:28.5.2-1~debian.12~bookworm'
+'docker-ce' changed from 'absent' to '5:29.0.0-1~debian.12~bookworm'
+'docker-ce-cli' changed from 'absent' to '5:29.0.0-1~debian.12~bookworm'
+'docker-ce-rootless-extras' changed from 'absent' to '5:29.0.0-1~debian.12~bookworm'
'docker-compose-plugin' changed from 'absent' to '2.40.3-1~debian.12~bookworm'
'dpkg-dev' changed from 'absent' to '1.21.22'
'fakeroot' changed from 'absent' to '1.31-1.2'
@@ -128,7 +128,7 @@
'libxslt1.1' changed from 'absent' to '1.1.35-1+deb12u2'
'libyaml-dev' changed from 'absent' to '0.2.5-1'
'libyuv0' changed from 'absent' to '0.0~git20230123.b2528b0-1'
-'linux-libc-dev' changed from 'absent' to '6.1.153-1'
+'linux-libc-dev' changed from 'absent' to '6.1.158-1'
'locales' changed from 'absent' to '2.36-9+deb12u13'
'm4' changed from 'absent' to '1.4.19-3'
'make' changed from 'absent' to '4.3-4.1'
Updated by Tom Clegg 4 months ago
This error is from an API call made by crunch-run, not crunch-dispatch-local:
Gateway address cannot be modified in state 'Locked'
...and crunch-run doesn't even check the error response from that call. This could be improved (it should not bother doing the API call if state is still Locked, and it should log something if it the API call fails) but I don't think it's causing the container to fail here.
Systemd journal should have crunch-run logs with better clues.
Updated by Tom Clegg 4 months ago
https://docs.docker.com/engine/release-notes/29/#breaking-changes says "The daemon now requires API version v1.44 or later (Docker v25.0+)."
source:lib/crunchrun/docker.go says
// DockerAPIVersion is the API version we use to communicate with the
// docker service. The oldest OS we support is Ubuntu 18.04 (bionic)
// which originally shipped docker 1.17.12 / API 1.35 so there is no
// reason to use an older API version. See
// https://dev.arvados.org/issues/15370#note-38 and
// https://docs.docker.com/engine/api/.
const DockerAPIVersion = "1.35"
But what does Docker mean by "daemon now requires API version"?
https://docs.docker.com/reference/api/engine/ says "A given version of the Docker Engine SDK supports a specific version of the Docker Engine API, as well as all earlier versions. If breaking changes occur, they are documented prominently."
Did Docker 29 abandon the "all earlier versions" rule? Or maybe it just changed the behavior of a 1.35 API in a way that breaks crunch-run?
Oh, yes, it did. Sort of. Now we have to explicitly enable support for older API versions.
Updated by Brett Smith 4 months ago
I propose we just bump the Docker API version we use. Our installers have all standardized on installing Docker from Docker's repository, so we can be confident we have a recent version.
23304-docker-api-version @ 4691890b1c3188b76a6e4b554da162c6fd42f93d - developer-run-tests: #4946
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- The real test will be whether this fixes test-provision. But it's a good change to do regardless so I think we should just go ahead and do it anyway and see where it gets us.
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- N/A
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- In addition to automated tests, manually ran docker-cleaner in my development environment, and it did delete images as expected.
- Tested code incorporates recent main branch changes.
- Yes
- New or changed UI/UX and has gotten feedback from stakeholders.
- N/A
- Documentation has been updated.
- N/A
- Behaves appropriately at the intended scale (describe intended scale).
- Any change is basically forced on us by upstream, so,
- Considered backwards and forwards compatibility issues between client and server.
- Ditto
- Follows our coding standards and GUI style guidelines.
- N/A, no change is significant enough to affect style
Updated by Brett Smith 4 months ago
Zoë Ma wrote in #note-12:
I ran the tests locally, and the test for
lib/crunchrun(specificallyTestCrunchExec) failed. Yet, the developer-run-tests build was all good. I'm doing a bit of investigation here.
This seems most likely to be either a difference in test environment or maybe a racey test where you got unlucky. If you can please share the full failed test output I could probably point you in the right direction.
Updated by Zoë Ma 4 months ago
I was running the test that produced the attached log with the following command (from the Arvados source repo directory)
WORKSPACE="${PWD}" build/run-tests.sh --temp ~/arvados-test --only lib/crunchrun > log.txt
Before I ran the command I deleted all the files in the temp directory.
Please let me know if you need any further information.
Updated by Brett Smith 4 months ago
Zoë Ma wrote in #note-14:
I was running the test that produced the attached log with the following command (from the Arvados source repo directory)
WORKSPACE="${PWD}" build/run-tests.sh --temp ~/arvados-test --only lib/crunchrun > log.txt
This is an unrelated bug that happens when you run the lib/crunchrun tests by themselves. You can tell it's unrelated because it also happens in main. I have filed this as #23318. It should be handled separately and not block review or merge.
Updated by Brett Smith 4 months ago
test-provision is passing again. test-provision: #1366
Updated by Brett Smith 4 months ago
- Status changed from In Progress to Resolved