Bug #17398
Closed
[crunch-dispatch-local] [crunch-run] error starting gateway server: missing port in address
100%
Description
While trying to update and test the new Arvados configuration changes with the salt-installer, I ran the deploy using the latest dev packages (2.2.0~dev20210215190825-1).
With them, I cannot successfully run a single-node deploy. The "tools/salt-install/tests/run-test.sh" script fails to run the workflow with the following error:
cwl-runner --debug hasher-workflow.cwl hasher-workflow-job.yml
INFO /usr/bin/cwl-runner 2.2.0.dev20210205202546, arvados-python-client 2.2.0.dev20210205202546, cwltool 3.0.20201121085451
INFO Resolved 'hasher-workflow.cwl' to 'file:///root/cluster_tests/hasher-workflow.cwl'
INFO hasher-workflow.cwl:36:7: Unknown hint WorkReuse
INFO hasher-workflow.cwl:50:7: Unknown hint WorkReuse
INFO hasher-workflow.cwl:64:7: Unknown hint WorkReuse
INFO Using cluster harpo (https://workbench2.harpo.local:8443/)
INFO Upload local files: "test.txt"
DEBUG {'harpo-bi6l4-a31be630d4e27ba0': OrderedDict([('href', '/keep_services/harpo-bi6l4-a31be630d4e27ba0'), ('kind', 'arvados#keepService'), ('etag', '5hqqxk0vrj54wia5k0ucch6o4'), ('uuid', 'harpo-bi6l4-a31be630d4e27ba0'), ('owner_uuid', 'harpo-tpzed-000000000000000'), ('created_at', '2021-02-17T13:27:59.426575000Z'), ('modified_by_client_uuid', None), ('modified_by_user_uuid', 'harpo-tpzed-000000000000000'), ('modified_at', '2021-02-17T13:27:59.426575000Z'), ('service_host', 'keep.harpo.local'), ('service_port', 8443), ('service_ssl_flag', True), ('service_type', 'proxy'), ('read_only', False), ('_service_root', 'https://keep.harpo.local:8443/')])}
DEBUG 7f2cee57647f15dd443e35537b202981+104: ['https://keep.harpo.local:8443/']
DEBUG Pool max threads is 1
DEBUG Request: PUT https://keep.harpo.local:8443/7f2cee57647f15dd443e35537b202981
INFO PUT 200: 104 bytes in 53.35521697998047 msec (0.002 MiB/sec)
DEBUG KeepWriterThread <KeepWriterThread(Thread-1, started daemon 140587978630912)> succeeded 7f2cee57647f15dd443e35537b202981+104 https://keep.harpo.local:8443/
INFO Using collection f55e750025853f5b8ccae3ca79240f65+54 (harpo-4zz18-l6kmq8rt8ccu8um)
INFO Using collection cache size 256 MiB
DEBUG ENTER jobiter 1613568972.0955696
DEBUG EXIT jobiter 1613568972.096748 0.0011785030364990234
DEBUG ENTER run 1613568972.0968366
DEBUG EXIT run 1613568972.096923 8.654594421386719e-05
DEBUG ENTER jobiter 1613568972.0969877
DEBUG EXIT jobiter 1613568972.0970328 4.506111145019531e-05
INFO [container hasher-workflow.cwl] submitted container_request harpo-xvhdp-39twskql6ok3kw3
INFO Monitor workflow progress at https://workbench2.harpo.local:8443/processes/harpo-xvhdp-39twskql6ok3kw3
INFO [container hasher-workflow.cwl] harpo-xvhdp-39twskql6ok3kw3 is Final
ERROR [container hasher-workflow.cwl] (harpo-dz642-bb0isvfn2x3h6an) error log: ** log is empty **
ERROR Overall process status is permanentFail
INFO Final output collection None
INFO Output at https://workbench2.harpo.local:8443/collections/None
{}
WARNING Final process status is permanentFail
Checking Arvados' component logs, I find this error in crunch-dispatch-local:
Feb 17 13:36:14 harpo crunch-dispatch-local[1812]: {"level":"info","msg":"finalized container harpo-dz642-9zdr6wxubfj2k56","time":"2021-02-17T13:36:14.446343132Z"}
Feb 17 13:36:14 harpo crunch-dispatch-local[1812]: 2021/02/17 13:36:14 error starting gateway server: missing port in address
The same setup works OK with the stable version (2.1.1).
Extra information that might help:
If I understand correctly, it runs OK in a multi-host deploy in the cloud (using arvados-dispatch-cloud and the compiled crunch-run binary).
Seems to be related to https://dev.arvados.org/issues/17170
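For reference, the "missing port in address" text in the crunch-dispatch-local log matches the error string that Go's net.SplitHostPort returns when an address has no ":port" part. The sketch below only illustrates that standard-library behaviour. checkListenAddr is a hypothetical helper, not Arvados code, and it is an assumption that the gateway address in this deploy was empty or lacked a port.

```go
package main

import (
	"fmt"
	"net"
)

// checkListenAddr is a hypothetical helper (not part of Arvados). It reports
// whether addr has the host:port form expected by listeners such as net.Listen.
func checkListenAddr(addr string) error {
	_, _, err := net.SplitHostPort(addr)
	return err
}

func main() {
	// Assumed sample inputs: an empty address, a host without a port,
	// and two well-formed host:port addresses.
	for _, addr := range []string{"", "localhost", "localhost:0", ":0"} {
		if err := checkListenAddr(addr); err != nil {
			fmt.Printf("%q: %v\n", addr, err)
		} else {
			fmt.Printf("%q: ok\n", addr)
		}
	}
}
```

When run, the empty string should yield the bare message seen in the log, while "localhost" yields the same message prefixed with the offending address.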
Updated by Nico César almost 4 years ago
- Related to Feature #17170: Shell into container proof of concept added
Updated by Javier Bértoli almost 4 years ago
- Blocks Story #17246: Single node salt install improvements added
Updated by Javier Bértoli almost 4 years ago
- Blocks Support #17320: Explain what additonal configuration is needed for provision.sh to go to production added
Updated by Javier Bértoli almost 4 years ago
- Blocks deleted (Support #17320: Explain what additonal configuration is needed for provision.sh to go to production)
Updated by Tom Clegg almost 4 years ago
- Status changed from New to In Progress
- Assigned To set to Tom Clegg
- Target version changed from To Be Groomed to 2021-03-03 sprint
- Subject changed from [crunch-dispatch-local] error running a workflow to [crunch-dispatch-local] [crunch-run] error starting gateway server: missing port in address
Updated by Tom Clegg almost 4 years ago
17398-no-ctr-gateway @ 3193023d7335f793d5cc015aa185f7a450e650f7 -- developer-run-tests: #2333
Updated by Lucas Di Pentima almost 4 years ago
The fix is super trivial, LGTM. I wonder why we didn't catch this with a test: aren't we running tests with crunch-dispatch-local (or even crunch-dispatch-slurm)? If integration tests are too cumbersome to write, maybe we can make the Gateway start failure a non-critical error?
Updated by Tom Clegg almost 4 years ago
A crunch-run integration test pretty much requires a fully functioning cluster. I'd really like to have a loopback driver for a-d-c so we can get rid of crunch-dispatch-local, and (assuming docker is available) run a container in lib/controller integration tests with federation features and everything.
(Other than this bug) I don't see a reason why we wouldn't be able to start the gateway service so I'm not keen to turn it into a "best effort" thing.
Merging & leaving open, both points seem to deserve some more thought.
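To make the trade-off between a critical and a "best effort" gateway start concrete, here is a minimal sketch of the two policies. It is not the Arvados implementation: startGateway, runContainer, and the bestEffort flag are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"log"
	"net"
)

// startGateway is a hypothetical stand-in for starting a per-container
// listener. It validates the address first, then listens on it.
func startGateway(addr string) (net.Listener, error) {
	if _, _, err := net.SplitHostPort(addr); err != nil {
		return nil, fmt.Errorf("error starting gateway server: %w", err)
	}
	return net.Listen("tcp", addr)
}

// runContainer shows both policies. With bestEffort false, a gateway
// failure aborts the run. With bestEffort true, the failure is logged
// and the run continues without a gateway.
func runContainer(addr string, bestEffort bool) error {
	ln, err := startGateway(addr)
	if err != nil {
		if !bestEffort {
			return err
		}
		log.Printf("warning: continuing without gateway: %v", err)
	} else {
		defer ln.Close()
	}
	// ... the container itself would run here ...
	return nil
}

func main() {
	// An empty address is used here as an assumed failure case.
	fmt.Println("strict:", runContainer("", false))
	fmt.Println("best effort:", runContainer("", true))
}
```

The strict variant surfaces a misconfiguration immediately, at the cost of failing the container. The best-effort variant keeps containers running but can hide the problem until someone needs the gateway.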
Updated by Peter Amstutz almost 4 years ago
- Target version changed from 2021-03-03 sprint to 2021-03-17 sprint
Updated by Peter Amstutz almost 4 years ago
- Status changed from In Progress to Resolved
- Target version changed from 2021-03-17 sprint to 2021-03-03 sprint