Cluster configuration » History » Version 17
Tom Clegg, 01/17/2019 03:39 PM
h1. Cluster configuration

We are (2018) consolidating configuration from per-microservice yaml/json/ini files into a single cluster configuration document that is used by all components.
* Long term: system nodes automatically keep their configs synchronized (using something like consul).
* Short term: sysadmin uses tools like puppet and terraform to ensure /etc/arvados/config.yml is identical on all system nodes.
* Hosts without config files (e.g., hosts outside the cluster) can retrieve the config document from the API server.

h2. Discovery document

Previously, we copied selected config values from the API server config into the API discovery document so clients could see them. When clients can get the configuration document itself, this won't be needed. The discovery document should advertise APIs provided by the server, not cluster configuration.

h2. Secrets

Secrets like BlobSigningKey can be given literally in the config file (convenient for dev/test, consul-template, etc.) or indirectly using a secret backend. Anticipated backends:
* <code class="yaml">BlobSigningKey: foobar</code> ⇒ the secret is literally <code>foobar</code>
* <code class="yaml">BlobSigningKey: "vault:foobar"</code> ⇒ the secret can be obtained from vault using the vault key "foobar"
* <code class="yaml">BlobSigningKey: "file:/foobar"</code> ⇒ the secret can be read from the local file @/foobar@
* <code class="yaml">BlobSigningKey: "env:FOOBAR"</code> ⇒ the secret can be read from the environment variable @FOOBAR@

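The prefix convention listed above could be resolved along the following lines. This is a minimal sketch, not the Arvados implementation: the function name @resolve_secret@, the @environ@ parameter, the trailing-newline handling for files, and the error behaviour are all assumptions made for illustration. The vault backend is left unimplemented because the source does not say how vault would be queried.

```python
import os


def resolve_secret(value, environ=None):
    """Return the secret that a config value refers to.

    Hypothetical helper: the prefixes follow the list of anticipated
    backends, but the name, signature, and error handling are assumptions.
    """
    environ = os.environ if environ is None else environ
    if value.startswith("file:"):
        path = value[len("file:"):]
        with open(path, encoding="utf-8") as f:
            # Assumption: a trailing newline is not part of the secret.
            return f.read().rstrip("\n")
    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in environ:
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]
    if value.startswith("vault:"):
        # A real implementation would query vault here; no client is assumed.
        raise NotImplementedError("vault backend is not part of this sketch")
    # No recognized prefix: the value is the secret itself.
    return value
```

One consequence of this scheme is that a literal secret beginning with @vault:@, @file:@, or @env:@ would be treated as an indirect reference. The source does not describe an escaping rule for that case.
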
h2. Example config file

(Format not yet frozen!)

<pre><code class="yaml">
Clusters:
  xyzzy:
    ManagementToken: eec1999ccb6d75840a2c09bc70b6d3cbc990744e
    BlobSigningKey: ungu355able
    BlobSignatureTTL: 172800
    SessionKey: 186005aa54cab1ca95a3738e6e954e0a35a96d3d13a8ea541f4156e8d067b4f3
    PostgreSQL:
      ConnectionPool: 32 # max concurrent connections per arvados server daemon
      Connection:
        # All parameters here are passed to the PG client library in a connection string;
        # see https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
        Host: localhost
        Port: 5432
        User: arvados
        Password: s3cr3t
        DBName: arvados_production
        client_encoding: utf8
        fallback_application_name: arvados
    HTTPRequestTimeout: 5m
    Defaults:
      CollectionReplication: 2
      TrashLifetime: 2w
    UserActivation:
      ActivateNewUsers: true
      AutoAdminUser: root@example.com
      UserProfileNotificationAddress: notify@example.com
      NewUserNotificationRecipients: {}
      NewInactiveUserNotificationRecipients: {}
    RequestLimits:
      MaxRequestLogParamsSize: 2KB
      MaxRequestSize: 128MiB
      MaxIndexDatabaseRead: 128MiB
      MaxItemsPerResponse: 1000
      MultiClusterRequestConcurrency: 4
    LogLevel: info
    CloudVMs:
      BootProbeCommand: "docker ps -q"
      SSHPort: 22
      SyncInterval: 1m # how often to get list of active instances from cloud provider
      TimeoutIdle: 1m # shutdown if idle longer than this
      TimeoutBooting: 10m # shutdown if exists longer than this without running BootProbeCommand successfully
      TimeoutProbe: 2m # shutdown if (after booting) communication fails longer than this, even if ctrs are running
      TimeoutShutdown: 1m # shutdown again if node still exists this long after shutdown
      Driver: Amazon
      DriverParameters:
        Region: us-east-1
        APITimeout: 20s
        AWSAccessKeyID: abcdef
        AWSSecretAccessKey: abcdefghijklmnopqrstuvwxyz
        ImageID: ami-0a01b48b88d14541e
        SubnetID: subnet-24f5ae62
        SecurityGroups: sg-3ec53e2a
    AuditLogs:
      MaxAge: 2w
      DeleteBatchSize: 100000
      UnloggedAttributes: {} # example: {"manifest_text": true}
    ContainerLogStream:
      BatchSize: 4KiB
      BatchTime: 1s
      ThrottlePeriod: 1m
      ThrottleThresholdSize: 64KiB
      ThrottleThresholdLines: 1024
      TruncateSize: 64MiB
      PartialLineThrottlePeriod: 5s
    Timers:
      TrashSweepInterval: 60s
      ContainerDispatchPollInterval: 10s
      APIRequestTimeout: 20s
    Scaling:
      MaxComputeNodes: 64
      EnablePreemptibleInstances: false
    DisableAPIMethods: {} # example: {"jobs.create": true}
    DockerImageFormats: {"v2": true}
    Crunch1:
      Enable: true
      CrunchJobWrapper: none
      CrunchJobUser: crunch
      CrunchRefreshTrigger: /tmp/crunch_refresh_trigger
      DefaultDockerImage: false
    NodeProfiles:
      # Key is a profile name; can be specified on service prog command line, defaults to $(hostname)
      keep:
        # Don’t run other services automatically -- only specified ones
        Default: {Disable: true}
        Keepstore: {Listen: ":25107"}
      apiserver:
        Default: {Disable: true}
        RailsAPI: {Listen: ":9000", TLS: true}
        Controller: {Listen: ":9100"}
        Websocket: {Listen: ":9101"}
        Health: {Listen: ":9199"}
      keep:
        Default: {Disable: true}
        KeepProxy: {Listen: ":9102"}
        KeepWeb: {Listen: ":9103"}
      "*":
        # This section used for a node whose profile name is not listed above
        Default: {Disable: false} # (this is the default behavior)
    Volumes:
      xyzzy-keep-0:
        Type: s3
        Region: us-east
        Bucket: xyzzy-keep-0
        # [rest of keepstore volume config goes here]
    WebRoutes:
      # “default” means route according to method/host/path (e.g., if host is a login shell, route there)
      xyzzy.arvadosapi.com: default
      # “collections” means always route to keep-web
      collections.xyzzy.arvadosapi.com: collections
      # leading * is a wildcard (longest match wins)
      "*--collections.xyzzy.arvadosapi.com": collections
      cloud.curoverse.com: workbench
      workbench.xyzzy.arvadosapi.com: workbench
      "*.xyzzy.arvadosapi.com": default
    InstanceTypes:
      m4.large:
        VCPUs: 2
        RAM: 8000000000
        Scratch: 31000000000
        Price: 0.1
      m4.large-1t:
        # same instance type as m4.large but our scripts attach more scratch
        ProviderType: m4.large
        VCPUs: 2
        RAM: 8000000000
        Scratch: 999000000000
        Price: 0.12
      m4.xlarge:
        VCPUs: 4
        RAM: 16000000000
        Scratch: 78000000000
        Price: 0.2
      m4.8xlarge:
        VCPUs: 40
        RAM: 160000000000
        Scratch: 156000000000
        Price: 2
      m4.16xlarge:
        VCPUs: 64
        RAM: 256000000000
        Scratch: 310000000000
        Price: 3.2
      c4.large:
        VCPUs: 2
        RAM: 3750000000
        Price: 0.1
      c4.8xlarge:
        VCPUs: 36
        RAM: 60000000000
        Price: 1.591
    RemoteClusters:
      xrrrr:
        Host: xrrrr.arvadosapi.com
        Proxy: true # proxy requests to xrrrr on behalf of our clients
        AuthProvider: true # users authenticated by xrrrr can use our cluster
</code></pre>
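
The WebRoutes comments say that a leading @*@ is a wildcard and that the longest match wins. One way to read that rule is sketched below. The function name @match_route@ is hypothetical, and three points are assumptions rather than anything the config format specifies: an exact host entry beats any wildcard, the @*@ may match an empty prefix, and "longest" is measured by the literal part of the pattern after the @*@.

```python
def match_route(host, routes):
    """Return the route target for host, or None if nothing matches.

    Assumptions: exact entries take priority over wildcards, and among
    wildcard patterns the one with the longest literal suffix wins.
    """
    if host in routes:
        return routes[host]
    best_suffix, best_target = None, None
    for pattern, target in routes.items():
        if not pattern.startswith("*"):
            continue
        suffix = pattern[1:]  # literal text that must end the host name
        if host.endswith(suffix) and (
            best_suffix is None or len(suffix) > len(best_suffix)
        ):
            best_suffix, best_target = suffix, target
    return best_target


# The WebRoutes section of the example config, as a plain dict.
ROUTES = {
    "xyzzy.arvadosapi.com": "default",
    "collections.xyzzy.arvadosapi.com": "collections",
    "*--collections.xyzzy.arvadosapi.com": "collections",
    "cloud.curoverse.com": "workbench",
    "workbench.xyzzy.arvadosapi.com": "workbench",
    "*.xyzzy.arvadosapi.com": "default",
}
```

Under these assumptions, @match_route("abc--collections.xyzzy.arvadosapi.com", ROUTES)@ matches both wildcard patterns and resolves to @collections@ because that pattern has the longer literal suffix, while @match_route("shell.xyzzy.arvadosapi.com", ROUTES)@ falls through to @default@.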