Idea #22859


Support service containers on a single host

Added by Peter Amstutz 11 months ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Start date: 06/25/2025
Due date:
Story points: -
Release relationship: Auto

Description

Challenge: we want to support the service container/published ports feature when using Tailscale. Tailscale has a nifty feature where it will request a TLS certificate for you, but only for a single hostname. As a result, we cannot use our virtual-host-based approach to reverse-proxy service containers.

Proposed strategy

The proposed alternative is to reserve a range of ports on the host for service containers.

When a service container declares a published port, controller establishes a mapping between one of the reserved ports and the backend container. Requests made to a particular port are routed to a particular container.

Proposed implementation

Configuration file gets a new option Containers.ServicePortRange, which is empty by default but defines a range of port numbers (e.g. "4100-4200" or [4100, 4200])
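A sketch of the proposed option in the cluster config (YAML layout assumed; per #note-25 the option was ultimately implemented as Services.ContainerWebServices.ExternalPortMin and ExternalPortMax instead):

```yaml
Clusters:
  xxxxx:
    Containers:
      # Empty by default; when set, reserves this port range on the
      # host for routing requests to service containers.
      ServicePortRange: "4100-4200"
```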

Nginx is configured to reverse-proxy that range of ports to controller.

Controller listens on all those ports (and/or nginx routes them all to controller with an appropriate header that indicates what TCP port the request actually came in on).
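The Nginx side of this could look roughly as follows (a sketch; the X-External-Port header name and the controller address are assumptions for illustration, while $server_port is a standard Nginx variable):

```nginx
server {
    # Nginx >= 1.15.10 accepts a port range in "listen" (see #note-25).
    listen 4100-4200 ssl;
    server_name machinename.tailnet-name.ts.net;

    location / {
        proxy_pass https://localhost:8000;
        # Tell controller which external TCP port the request came in on.
        proxy_set_header X-External-Port $server_port;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```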

Controller checks the inbound port. If the inbound port is in ServicePortRange, it consults the database.

The mapping of containers to ports is stored in the database. Add a new table called something like web_service_ports with columns container_uuid, container_port, and external_port. Use a unique index on the external_port column to ensure routing is unambiguous. When allocating an external port (inserting a row into the table), lock the whole table ("in exclusive mode") to prevent races that could otherwise cause the whole container update transaction to abort.
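In PostgreSQL terms, the table and the locked allocation might look like this (table and column names are from the paragraph above; the exact types and allocation query are assumptions):

```sql
CREATE TABLE web_service_ports (
    container_uuid varchar NOT NULL,
    container_port integer NOT NULL,
    external_port  integer NOT NULL UNIQUE  -- unique index keeps routing unambiguous
);

BEGIN;
-- Serialize allocations so concurrent container updates can't race
-- for the same port (and abort the whole update transaction).
LOCK TABLE web_service_ports IN EXCLUSIVE MODE;
INSERT INTO web_service_ports (container_uuid, container_port, external_port)
SELECT 'zzzzz-dz642-xxxxxxxxxxxxxxx', 80, p
  FROM generate_series(4100, 4200) AS p
 WHERE p NOT IN (SELECT external_port FROM web_service_ports)
 ORDER BY p
 LIMIT 1;
COMMIT;
```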

It is the responsibility of the container state change hooks to allocate and release ports in the database. Port mappings are only valid while a container is in "Running" state.

The external port where a client can contact the container needs to be included in the container's published_ports. I think we want to add a base_url field which has the correct base URL to contact the container, depending on whether it is using port mapping or virtual hosts. A starting_url field, which is the base URL joined with initial_path, would also be quite useful -- in addition to making life incrementally easier for Workbench, Controller needs this information for the "redirect to front page" behavior described below.

When performing the reverse proxy, set a cookie that has the uuid of the intended container, then check the cookie on each request. If the cookie does not match the mapped container, we respond with 301 redirect to the front page of the new app that includes a Clear-Site-Data response header to delete all cookies and local/session storage. (Note: the redirect might have to include the API token in order to re-set the API token cookie).

Tradeoffs

This is limited to a finite number of concurrent services. The single-hostname install is aimed at single-user/small-scale installations where this is an acceptable tradeoff, but even a limit of 100 or 1000 concurrent services is likely enough to satisfy the needs of a moderately large multi-user cluster.

A bigger problem is that cookies and local storage can leak across sessions. If I run a container service, stop it, and run a different container that appears on the same port, the second web app will have access to all the stateful storage from the previous service. The risk ranges from nothing happening at all, to web apps behaving strangely, to leaking sensitive information.

The cookie check and Clear-Site-Data redirect described under "Proposed implementation" is intended to mitigate this.


Subtasks 2 (0 open, 2 closed)

Task #22950: Review 22859-service-port-range (Resolved, Tom Clegg, 06/25/2025)
Task #22993: Review 22859-clear-site-data (Resolved, Tom Clegg, 07/09/2025)

Related issues 5 (3 open, 2 closed)

Related to Arvados - Idea #22427: Accessing Arvados services over Tailscale (New)
Related to Arvados - Feature #22861: Cache ContainerHTTPProxy port names, access levels, and permissions (New)
Related to Arvados Epics - Idea #17207: services running in containers (In Progress, 03/01/2025 - 08/31/2025)
Related to Arvados - Bug #23023: Update API docs to mention published_ports[external_port], [base_url], and [initial_path] (Resolved, Tom Clegg)
Related to Arvados - Bug #23025: Fix ContainerHTTPProxy ReusedPort integration tests to use builtin go http client instead of curl (Resolved, Tom Clegg)
Actions #1

Updated by Peter Amstutz 11 months ago

  • Position changed from -862704 to -862698
Actions #2

Updated by Peter Amstutz 11 months ago

  • Tracker changed from Feature to Idea
Actions #3

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz 11 months ago

  • Related to Idea #22427: Accessing Arvados services over Tailscale added
Actions #6

Updated by Tom Clegg 11 months ago

  • Description updated (diff)
Actions #7

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #8

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #9

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #10

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #11

Updated by Tom Clegg 11 months ago

  • Description updated (diff)
Actions #12

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #13

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #14

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #15

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #16

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #17

Updated by Peter Amstutz 11 months ago

  • Target version changed from Development 2025-05-28 to Development 2025-06-25
Actions #18

Updated by Peter Amstutz 10 months ago

  • Assigned To set to Tom Clegg
Actions #19

Updated by Peter Amstutz 10 months ago

  • Subtask #22950 added
Actions #20

Updated by Tom Clegg 10 months ago

  • Status changed from New to In Progress
Actions #21

Updated by Peter Amstutz 10 months ago

  • Release set to 79
Actions #23

Updated by Tom Clegg 10 months ago

  • Related to Feature #22861: Cache ContainerHTTPProxy port names, access levels, and permissions added
Actions #24

Updated by Tom Clegg 10 months ago

  • Related to Idea #17207: services running in containers added
Actions #25

Updated by Tom Clegg 10 months ago

  • All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
    • ✅ Add config entry for port range
      • ✨ Instead of Containers.ServicePortRange I made the config Services.ContainerWebServices.ExternalPortMin and ...Max
      • Arrays in config are discouraged, and Min/Max is more explicit
      • Having the configs right next to the URL they modify seems easier to manage
      • I did consider encoding it as "https://example.com:2000-3000/" (but Ruby can't load a config with that in it) or "https://example.com/#2000-3000" (but that's not better than having separate min/max knobs)
    • ✅ Dynamically assign ports (one per published_port entry) from the available range while the container is running, release when container is no longer running
    • ✅ Update the container's published_ports map when state changes to Running
      • add {"external_port": 1234} if a dynamic port is assigned
      • add {"base_url": "https://..."} and {"initial_url": "https://.../initial_path"} if the port will be reachable, i.e., a dynamic port is assigned or config enables wildcard behavior (so Workbench can use these values blindly without having to implement any config logic, append initial_path, etc.)
    • ✅ Route incoming requests on dynamic ports according to entries in container_ports
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • ❌ TODO: Clear-Site-Data will need a follow-up branch.
  • Code is tested and passing, both automated and manual, what manual testing was done is described.
    • ✅ Automated tests added.
  • New or changed UI/UX has gotten feedback from stakeholders.
    • n/a
  • Documentation has been updated.
    • ✅ Document nginx config (are there more places this deserves to be mentioned? is a comment in the template code good enough?)
    • ❌ TODO: the added fields in the published_ports map should be documented on the containers API page
  • Behaves appropriately at the intended scale (describe intended scale).
  • Considered backwards and forwards compatibility issues between client and server.
    • ✅ n/a
  • Follows our coding standards and GUI style guidelines.
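Based on the checklist above, a published_ports entry for container port 80 might look like this once the container reaches Running (all values hypothetical; access, label, and initial_path are the pre-existing fields, the rest are added by controller):

```json
{
  "80": {
    "access": "public",
    "label": "Web app",
    "initial_path": "/login",
    "external_port": 2001,
    "base_url": "https://machinename.tailnet-name.ts.net:2001/",
    "initial_url": "https://machinename.tailnet-name.ts.net:2001/login"
  }
}
```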

"listen 1000-2000;" works in Nginx 1.22.1, but the Nginx docs for the listen directive don't mention it at all, let alone which version started supporting it. Git history indicates support was added in 1.15.10, March 2019. The Nginx docs for the listen directive in a stream section do mention port range support starting in 1.15.10. So it seems this is just an omission in the Nginx documentation.

Actions #26

Updated by Brett Smith 9 months ago

Tom Clegg wrote in #note-22:

22859-service-port-range @ aa0695e259f1487330832c85c83a3375e1779f35 -- developer-run-tests: #4809

Just one documentation nit/question. In this comment in the example config:

## Depending on the number of ports in the range, you might also
## need to increase worker_connections in the events section of your
## Nginx configuration:

As an ops person, this is vague and hard to write configuration from. What relationship must hold true between the number of listening ports and worker_connections? What other factors should be considered? If you want to have a conversation about what's known and what's unknown, I would be happy to do that and see what we can do to improve this.

I'm thinking ahead a little bit to how we automate this in the Ansible installer. Even if we let the admin override the value, we're at least going to want a baseline algorithm to set worker_connections from the port range in configuration. I'm wondering what that algorithm should be. The comment doesn't give me much guidance.

Thanks.

Actions #27

Updated by Tom Clegg 9 months ago

Experimentally, worker_connections >= number of listen ports is required.

Reading https://nginx.org/en/docs/ngx_core_module.html#worker_connections and a few opinions elsewhere on the internet, I found
  • nobody has characterized this well enough to offer a formula
  • worker_connections = 512 is the default and that's too low for production
  • worker_connections = (number of listen ports) + (concurrent clients) might be enough
  • if worker_connections is too low for the number of concurrent clients, the clients queue up, so it's a performance issue but doesn't crash or reject requests
  • worker_rlimit_nofile >= worker_connections is advised
  • worker_rlimit_nofile >= (number of listen ports) + (concurrent clients * 3) is probably better -- 3 because each connection can potentially use a downstream socket, an upstream socket, and a cache file
  • the impact of oversizing these configs is greater potential DOS impact (nginx doesn't consume more resources just because the configs are high)

So maybe (number of listen ports) + (expected number of clients) is a good value for worker_connections?

And maybe worker_connections * 3 + 1024 is a good value for worker_rlimit_nofile, to account for other FDs for log files etc. Maybe double that, because running out of open FDs never seems to be a good indication of a real resource constraint?
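The back-of-envelope sizing above, sketched as code (this encodes the note's heuristic, not official Nginx guidance; the function name is made up for illustration):

```go
package main

import "fmt"

// suggestNginxLimits applies the heuristic above: worker_connections
// covers the listen ports plus expected concurrent clients; the FD
// limit allows ~3 FDs per connection plus 1024 of headroom for logs
// etc., then doubles the result.
func suggestNginxLimits(listenPorts, expectedClients int) (workerConnections, rlimitNofile int) {
	workerConnections = listenPorts + expectedClients
	rlimitNofile = 2 * (workerConnections*3 + 1024)
	return
}

func main() {
	wc, rl := suggestNginxLimits(1000, 100)
	fmt.Println(wc, rl) // 1100 8648
}
```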

Actions #28

Updated by Tom Clegg 9 months ago

(following discussion) I've added the short version of this, "add the number of additional ports to worker_connections and rlimit_nofile", to the docs.

Hopefully this gets the point across without providing underinformed/misleading Nginx-tuning recommendations.

22859-service-port-range @ e96d02ff56c24d23598cad6e669ecd9293b5d07b

Actions #29

Updated by Brett Smith 9 months ago

Tom Clegg wrote in #note-28:

22859-service-port-range @ e96d02ff56c24d23598cad6e669ecd9293b5d07b

Readability nit: it would be nicer if the example port range was 2000-2999, so then later we're just adding 1000 to worker_connections and worker_rlimit_nofile and the math is easier to follow.

Please make that change, or don't if it offends you, and merge. Thanks.

Actions #30

Updated by Brett Smith 9 months ago

  • Target version changed from Development 2025-06-25 to Development 2025-07-09
Actions #31

Updated by Brett Smith 9 months ago

  • Subtask #22993 added
Actions #32

Updated by Tom Clegg 9 months ago

Unfortunately, the Clear-Site-Data browser feature specifies that it is only capable of clearing cookies for the entire registered domain. If we serve Clear-Site-Data: "cookies" from https://machinename.tailnet-name.ts.net:2000/, then the browser will clear cookies for https://machinename.tailnet-name.ts.net:* and https://*.tailnet-name.ts.net:*. Especially given that the primary motivation for this whole feature is that a Tailscale-like service only allows a single hostname per host, this seems fatally unusable.

We can still use Clear-Site-Data to clear "cache" and "storage" (which apply only to the origin, not the entire registered domain), and use the regular Set-Cookie header to clear whatever cookies were sent in the request. This isn't complete in that some cookies won't be sent because their path attribute doesn't match the current path. But I expect it would cover most real-world situations.

Actions #33

Updated by Brett Smith 9 months ago

Tom Clegg wrote in #note-32:

We can still use Clear-Site-Data to clear "cache" and "storage" (which apply only to the origin, not the entire registered domain), and use the regular Set-Cookie header to clear whatever cookies were sent in the request. This isn't complete in that some cookies won't be sent because their path attribute doesn't match the current path. But I expect it would cover most real-world situations.

I agree something in this vein seems like our best option. Could we add an intermediate redirect to a dedicated "clear cookies" URL with a path of / to try to catch all the cookies? And then from there redirect to the new app?

Actions #34

Updated by Tom Clegg 9 months ago

Brett Smith wrote in #note-33:

Could we add an intermediate redirect to a dedicated "clear cookies" URL with a path of / to try to catch all the cookies? And then from there redirect to the new app?

AFAIK a cookie sent to "/" must have path=/ or no path attribute at all, and is therefore also sent to all other URLs, so I don't think we can catch anything more this way.

To get another idea out of the way: in principle we could look for Set-Cookie headers in all proxied responses and save the list of cookie paths/names somewhere, so later when reusing the port we can clear them all. This seems like a lot of effort and overhead, and still wouldn't catch everything, because cookies can also be set by client-side scripts.

Actions #35

Updated by Brett Smith 9 months ago

Got it, thanks. Yeah I agree I can't think of anything better than your plan then, go ahead.

Actions #36

Updated by Tom Clegg 9 months ago

22859-clear-site-data @ 439c1c0f3931007d6e2cacbed9b37be828239bee -- developer-run-tests: #4828
Actions #37

Updated by Tom Clegg 9 months ago

22859-clear-site-data @ b1322d8c968d1ab21edc3385279dba9e7b2ada53 -- developer-run-tests: #4829

cache.arvados.org timeout, retry remainder run-tests-remainder: #5356

  • All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
    • ✅ Set arvados_container_uuid cookie so we can detect when the client might have state leftover from previous container
    • ✅ When we detect that, send a Clear-Site-Data header, and explicitly expire all cookies the client sent (we can't use Clear-Site-Data for the cookie part, see #note-32)
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • n/a
  • Code is tested and passing, both automated and manual, what manual testing was done is described.
    • ✅ Automated unit tests, plus curl-based tests to confirm (at least one) client follows the clear-cookies-and-redirect process the way we expect
  • New or changed UI/UX has gotten feedback from stakeholders.
    • n/a
  • Documentation has been updated.
    • ❌ still TODO: added fields in the published_ports map should be documented on the containers API page
  • Behaves appropriately at the intended scale (describe intended scale).
    • n/a
  • Considered backwards and forwards compatibility issues between client and server.
    • n/a
  • Follows our coding standards and GUI style guidelines.

One wrinkle made the code a bit more complicated than I hoped. Sometimes we need to redirect to move a query token to a cookie to avoid showing it in the browser's Location bar, even if we're returning a lookup/authentication error. Previously, we just did the query-to-cookie redirect before the lookup/authentication code. But if we do that, and there isn't a lookup/authentication error, then we might do another redirect to clear site data. To avoid the double-redirect, I rearranged the code to defer the query-to-cookie redirect decision until (a) we discover we can't complete the request or (b) we decide whether we need a site-data redirect.

Actions #38

Updated by Brett Smith 9 months ago

Tom Clegg wrote in #note-37:

22859-clear-site-data @ b1322d8c968d1ab21edc3385279dba9e7b2ada53 -- developer-run-tests: #4829

This is good to merge. I have a few readability wishlist items but I would rather see the branch merged in a timely manner than see these addressed.

  • As before, personally I think maybeRedirect would be nicer defined on its own than as a closure. Yes, even given the number of arguments it'll need to take.
  • The curl tests make me sad. The main reason is that the output is completely unstructured, so all our assertions are just loose regexps tests against a big block of text. These are painful to debug when they fail and prone to false positives. On reading, it's not clear to me what we're getting from using curl instead of Go's HTTP library. I'm guessing there's a good reason though, so then my next wondering is whether we could run the output through an HTTP response parser and get structured data back.
  • The tiniest thing, I think the cookie.jar file would be better named cookies.txt, partly because that's the traditional name and partly because it's not a Java JAR.

Thanks.

Actions #39

Updated by Tom Clegg 9 months ago

The reasons for the curl tests are that it was easy to write (which is true partly because we have other tests that use curl for the sake of interoperability assurance) and it was easy to debug without writing a lot of assertions (if the expected thing doesn't happen, the log is there with all headers sent/received, redirect target, etc).

That said, I do agree it would be better to use the built-in Go client. It has the cookie and redirect functionality we need, and I don't think there's any reason why it's less effective than curl when it comes to establishing that real-world clients will interpret our cookies and redirects the way we expect.

For schedule's sake maybe I should merge this and follow up with that change?

Meanwhile, I've addressed the other two points:
  • maybeRedirect is still a closure, but now it's tiny, and the large chunk that actually implements the redirect is a separate func
  • cookies.txt

22859-clear-site-data @ 512b780b7c254ea0ca359180fdf12f821db6f58a -- developer-run-tests: #4830

Actions #40

Updated by Brett Smith 9 months ago

Tom Clegg wrote in #note-39:

For schedule's sake maybe I should merge this and follow up with that change?

Yes, please go ahead with that.

  • maybeRedirect is still a closure, but now it's tiny, and the large chunk that actually implements the redirect is a separate func

Yes, this is still a huge improvement and works for me, thank you for that.

Actions #41

Updated by Tom Clegg 9 months ago

  • Related to Bug #23023: Update API docs to mention published_ports[external_port], [base_url], and [initial_path] added
Actions #42

Updated by Tom Clegg 9 months ago

  • Related to Bug #23025: Fix ContainerHTTPProxy ReusedPort integration tests to use builtin go http client instead of curl added
Actions #44

Updated by Tom Clegg 9 months ago

  • Status changed from In Progress to Resolved