Revision 4 (Tom Clegg, 02/11/2019 07:47 PM) → Revision 5/22 (Tom Clegg, 02/12/2019 02:39 PM)
h1. Migrating from arvados-node-manager to arvados-dispatch-cloud

{{toc}}

h2. Choose a node

The dispatch service can run on any host that can connect to the Arvados API service, the cloud provider's API, and the SSH service on cloud VMs. In the following example it runs on the same node as the API server and controller.

h2. Prepare key pair and worker VM image

Generate an SSH key pair. Save the public key in @/root/.ssh/authorized_keys@ in the worker VM image. Save the private key in the cluster configuration file (see @PrivateKey@ in the example below).

h2. Update cluster configuration file

In @/etc/arvados/config.yml@, add configuration items for the dispatch service.

<pre><code class="yaml">
Clusters:
  uuid_prefix:
    CloudVMs:
      BootProbeCommand: "mount | grep /mnt/scratch"
      SSHPort: "2222"
      SyncInterval: 1m
      TimeoutIdle: 2m
      TimeoutBooting: 10m
      TimeoutProbe: 5m
      TimeoutShutdown: 30s
      ImageID: "image-12345678"
      Driver: Azure
      DriverParameters:
        SubscriptionID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        subscription_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX # not needed after #14745
        ClientID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        key: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX # not needed after #14745 (same value as ClientID)
        ClientSecret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX # not needed after #14745 (same value as ClientSecret)
        TenantID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX # not needed after #14745
        CloudEnv: AzurePublicCloud
        cloud_environment: AzurePublicCloud # not needed after #14745
        ResourceGroup: zzzzz
        resource_group: zzzzz
        Location: centralus
        region: centralus # not needed after #14745 (same value as Location)
        Network: zzzzz
        Subnet: zzzzz-subnet-private
        StorageAccount: example
        storage_account: example # not needed after #14745
        BlobContainer: vhds
        blob_container: vhds # not needed after #14745
        DeleteDanglingResourcesAfter: 20
        delete_dangling_resources_after: 20 # not needed after #14745
    Dispatch:
      PrivateKey: |
        -----BEGIN RSA PRIVATE KEY-----
        MIIEowIBAAKCAQEAqYm4XsQHm8sBSZFwUX5VeW1OkGsfoNzcGPG2nzzYRhNhClYZ
        0ABHhUk82HkaC/8l6d/jpYTf42HrK42nNQ0r0Yzs7qw8yZMQioK4Yk+kFyVLF78E
        GRG4pGAWXFs6pUchs/lm8fo9zcda4R3XeqgI+NO+nEERXmdRJa1FhI+Za3/S/+CV
        mg+6O00wZz2+vKmDPptGN4MCKmQOCKsMJts7wSZGyVcTtdNv7jjfr6yPAIOIL8X7
        ...
        JIBvlVfcHb1IHMA9YG7ZQjrMRmx2Xj3ce4RVPgUGHh8ra7gvLjd72/Tpf0doNClN
        ti/hAoGBAMW5D3LhU05LXWmOqpeT4VDgqk4MrTBcstVe7KdVjwzHrVHCAmI927vI
        pjpphWzpC9m3x4OsTNf8m+g6H7f3IiQS0aiFNtduXYlcuT5FHS2fSATTzg5PBon9
        1E6BudOve+WyFyBs7hFWAqWFBdWujAl4Qk5Ek09U2ilFEPE7RTgJ
        -----END RSA PRIVATE KEY-----
      StaleLockTimeout: 1m
      PollInterval: 10s
      ProbeInterval: 10s
      MaxProbesPerSecond: 10
    InstanceTypes:
      x1lg:
        ProviderType: x1.large
        VCPUs: 16
        RAM: 128G
        Scratch: 128G
        Price: 1.23
    ManagementToken: "example-secret-management-token"
    NodeProfiles:
      apiserver: # references ARVADOS_NODE_PROFILE in environment file (see below).
        arvados-dispatch-cloud:
          Listen: ":9005"
</code></pre>

Create the host configuration file @/etc/arvados/environment@.

<pre>
ARVADOS_NODE_PROFILE=apiserver
</pre>

h2. Stop crunch-dispatch-slurm

Stop and disable the crunch-dispatch-slurm service, and uninstall the package to make sure it doesn't start after the next reboot/upgrade.

<pre>
# systemctl stop crunch-dispatch-slurm
# systemctl disable crunch-dispatch-slurm
# apt-get remove crunch-dispatch-slurm
</pre>

Containers that have already been locked and submitted to SLURM will make their way through the SLURM queue, but newly queued containers will be left for arvados-dispatch-cloud to run.

h2. Install arvados-dispatch-cloud

<pre>
# apt-get install arvados-dispatch-cloud
</pre>

h2. Verify the service is running

<pre>
$ token="example-secret-management-token"
$ curl -H "Authorization: Bearer $token" http://localhost:9005/metrics
</pre>

h2. Verify the service is functional

Watch the dispatcher's logs while you run an Arvados container:

<pre>
# journalctl -ocat -fu arvados-dispatch-cloud
</pre>
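
The metrics check from "Verify the service is running" can also be scripted, which may help right after installation while the service is still starting. The following is a minimal sketch, not part of Arvados: @wait_for_metrics@ is a hypothetical helper name, the retry count and delay are arbitrary defaults, and it assumes the endpoint answers with HTTP 200 once the dispatcher accepts the management token.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Arvados): poll the dispatcher's
# metrics endpoint until it answers with HTTP 200, or give up.
# Usage: wait_for_metrics URL TOKEN [ATTEMPTS] [DELAY_SECONDS]
wait_for_metrics() {
    url=$1
    token=$2
    attempts=${3:-10}   # assumed default, adjust as needed
    delay=${4:-3}       # assumed default, in seconds
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s: quiet; -o /dev/null: discard the body;
        # -w '%{http_code}': print only the HTTP status code.
        status=$(curl -s -o /dev/null -w '%{http_code}' \
            -H "Authorization: Bearer $token" "$url")
        if [ "$status" = "200" ]; then
            echo "metrics endpoint is up (attempt $i)"
            return 0
        fi
        echo "attempt $i: got HTTP $status, retrying in ${delay}s" >&2
        sleep "$delay"
        i=$((i + 1))
    done
    echo "metrics endpoint did not return 200 after $attempts attempts" >&2
    return 1
}
```

With the example values used on this page, it could be called as @wait_for_metrics http://localhost:9005/metrics "example-secret-management-token"@. A non-zero exit status suggests checking the service logs with @journalctl@ before going further.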