Story #13497
Closed: [API] Initial "arvados-controller" server that proxies API endpoints to Rails server
Added by Tom Clegg over 6 years ago. Updated about 6 years ago.
100%
Description
Background
This is the first step toward retiring the Rails API server (#9053). It unblocks:
- implementing new APIs in Go (without increasing the discovery/routing burden in every client/SDK)
- porting individual APIs to Go (without having to update all clients/SDKs each time, or proxy requests through Rails)
- federation routing (#13493)
- refactoring existing Go services as packages so they can be used in unit tests
Objective
This initial version changes the way requests are routed inside an API server node.
- Before: client → Nginx → arvados-api-server
- After: client → Nginx → arvados-controller → arvados-api-server
- Request and response headers are passed through blindly
- All requests are proxied to one single arvados-api-server (Rails) service at the configured address and port (typically localhost:8000)
Requirements
Load configuration from the cluster configuration document from #12260. There will be no arvados-controller config file.
Updated by Tom Clegg over 6 years ago
- Blocks Story #9053: Port API server to Go added
Updated by Tom Clegg over 6 years ago
- Blocks Feature #13493: Federated record retrieval added
Updated by Nico César over 6 years ago
API in Go, what a good time to be alive.
My idea of the "after" scenario is the following
client → Nginx → arvados-router → Nginx → arvados-api-server
because [Nginx → arvados-api-server] should be treated as a single unit, given how we use Passenger today. The first Nginx is there because I like SSL to be handled by Nginx. This can all be one Nginx instance; it is purely a matter of configuration. Integrating all this looks like an easy task for us. But...
Let's talk about why we need an arvados-router. We want to slowly shadow the legacy application's endpoints, meaning:
- Each deploy will have brave-new rules that include routing to the new services in Go and deprecating old API calls
- Initially this will be blank and progressively we'll be adding them. Progressively meaning each new version may or may not have new rules.
- Throughput should be good for our current needs and our future needs while this migration is happening (could be years)
- Debugging of bottlenecks and reading post-mortem logs have to be easy and fast
- Bursty software development will require a lot of chained changes to the shadowing rules within a few days; then priorities can change, leaving everything as-is for months. So the resulting architecture has to be stable.
This can be done in several ways:
- have a file /etc/nginx/conf.d/arvados-router.conf that takes care of the rules in a "client → Nginx → Nginx → Nginx → arvados-api-server" configuration (I'm repeating Nginx here to match my diagram above; in a realistic scenario it's going to be one Nginx.)
- use an HTTP router of some sort that is not built by us. Envoy comes to mind: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/http_routing. It is a good technology on other platforms such as Kubernetes.
- develop arvados-router from scratch, reading its configuration from /etc/arvados/router/config.yml, which is updated for each "state of the legacy's shadowing"
Everything here has a trade-off:
- The Nginx approach is easy to deploy initially, but we have to come up with a good way to update /etc/nginx/conf.d/arvados-router.conf in each version of the cluster bundle. It is very proven technology and we already have it in front of api-server. There is no need for an extra TCP connection going out and back, and there is almost no change in memory pressure.
- Envoy's feature list is good for any future expansion, but the complexity isn't minor.
- New development will require effort and time. As above, we have to come up with a good way to update /etc/arvados/router/config.yml in each version of the cluster bundle. The throughput it can handle is unknown (but most likely we'll be OK), and working out implications and debugging will most likely require an Arvados engineer.
How will federation routing (#13493) impact the functionality of this?
My take is to go with option number 1 (Nginx properly configured), and we can talk about how /etc/nginx/conf.d/arvados-router.conf gets created (it will have a different format from /etc/arvados/router/config.yml but very similar content). A quick idea here is an nginx-arvados-router.deb that contains only that file, taken out of the arvados repo, and reloads Nginx upon install.
Does it make sense?
Updated by Nico César over 6 years ago
After a talk with Tom: this is a good idea, but the name "-router" is misleading. Maybe Tom will come up with a better name. ;)
Updated by Nico César over 6 years ago
For lack of a better name I'll call it maestro for now (master/teacher in Spanish). Here are the ports to use:
client → [443/HTTPS]Nginx → [7900/HTTP]arvados-maestro → [8000/HTTP]Nginx → arvados-api-server
In the initial phase arvados-maestro will be a pass-through daemon doing no work at all, just to have it in place and measure:
- latency introduced.
- memory usage
- CPU / other resources used.
Initially, if this runs in the API VM, major network issues won't be a problem. As the microservice grows, we'll move it to a separate VM. I also want to see tests in a separate VM in the early stages, so we detect problems early: network (latency/outage) being the most notable, but also other environmental issues like NTP, DNS, etc.
This will have to be deployed in all clusters in this no-op mode, which will require adapting the current Nginx configurations via Puppet, plus creating the arvados-maestro package. I'll write some tickets about this when the time comes.
Updated by Tom Morris over 6 years ago
- Target version set to To Be Groomed
It seems like there must be a number of ready-built options that we could adopt here.
Do we have a list of candidates to evaluate?
Updated by Tom Clegg over 6 years ago
- Subject changed from [API] Initial "arvados-router" server that proxies API endpoints to Rails server to [API] Initial "arvados-controller" server that proxies API endpoints to Rails server
- Description updated (diff)
Renamed from "router" to "controller".
This component is a replacement for the Rails API server.
Updated by Tom Morris over 6 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points set to 2.0
Updated by Tom Clegg over 6 years ago
- Blocks Story #13574: [Controller] Update container priorities asynchronously added
Updated by Tom Morris over 6 years ago
- Assigned To set to Tom Clegg
- Target version changed from Arvados Future Sprints to 2018-06-20 Sprint
Updated by Tom Clegg over 6 years ago
- new package "arvados-server" (currently only has "version" and "controller" subcommands)
- new package "arvados-controller" (same binary as arvados-server, but comes with a systemd unit file, and installs the binary as /usr/bin/arvados-controller)
- run-tests.sh routes integration tests' API traffic to controller (through Nginx+TLS) instead of Rails server
- Outline upgrade/install process -- see Installing controller service
- Review/merge this branch
- Document (here/wiki) how to update a site to use the arvados-controller service
- Refine docs with feedback from ops
- Confirm the service works on some real-life clusters
- Update the upgrade/install docs on doc.arvados.org accordingly
- refuse to start if Rails API port cannot be found in config (currently controller starts up but responds {"errors":["missing port in address"]})
- edit: done in 1b5156270
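The "missing port in address" message is the error Go's net.SplitHostPort returns for an address without a port, so a startup check along the following lines would catch the problem before serving requests. This is a sketch under that assumption; checkUpstreamAddr is a hypothetical name and commit 1b5156270 may do it differently.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// checkUpstreamAddr (hypothetical) returns an error unless addr has the
// form host:port with a non-empty port.
func checkUpstreamAddr(addr string) error {
	_, port, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("invalid Rails API address %q: %w", addr, err)
	}
	if port == "" {
		return fmt.Errorf("invalid Rails API address %q: empty port", addr)
	}
	return nil
}

func main() {
	// A real service would call os.Exit(1) on the first failure; the demo
	// just prints the outcome for a good and a bad address.
	for _, addr := range []string{"localhost:8000", "localhost"} {
		if err := checkUpstreamAddr(addr); err != nil {
			fmt.Fprintln(os.Stderr, "refusing to start:", err)
			continue
		}
		fmt.Println("ok:", addr)
	}
}
```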
Updated by Tom Clegg over 6 years ago
- rename SystemNodes to NodeProfiles after discussion with Nico
- add support for ARVADOS_NODE_PROFILE=x in /etc/arvados/environment as a way to select a profile without changing hostname or editing systemd files
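A rough sketch of the profile selection, assuming the environment variable takes precedence and the hostname is the fallback (the fallback is inferred from "without changing hostname" and is not spelled out here). The function name selectProfile is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// selectProfile (hypothetical) picks a node profile name: a non-empty
// ARVADOS_NODE_PROFILE wins, otherwise the hostname is used (assumed).
// The lookups are passed in so the logic can be tested without touching
// the real environment.
func selectProfile(getenv func(string) string, hostname func() (string, error)) (string, error) {
	if p := getenv("ARVADOS_NODE_PROFILE"); p != "" {
		return p, nil
	}
	return hostname()
}

func main() {
	profile, err := selectProfile(os.Getenv, os.Hostname)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine node profile:", err)
		os.Exit(1)
	}
	fmt.Println("using node profile:", profile)
}
```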
Updated by Lucas Di Pentima over 6 years ago
Although this is a fairly large update, I'm not finding any obvious issues so I don't want to block this merge much longer. LGTM, thanks!
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-06-20 Sprint to 2018-07-03 Sprint
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-07-03 Sprint to 2018-07-18 Sprint
Updated by Tom Clegg over 6 years ago
- Fixes broken login/logout by propagating redirect responses back to client instead of following them.
- Preserves original Host header in proxy requests (otherwise Rails uses its internal address like http://localhost:8000/ in redirect targets).
Updated by Lucas Di Pentima over 6 years ago
There're some failing tests at: https://ci.curoverse.com/job/developer-run-tests/800/
I ran services/fuse tests locally without issues but sdk/python gave me errors about not finding "controller".
Updated by Tom Clegg over 6 years ago
Turns out there are lots of places where scheme/vhost can get munged by proxies and not properly unmunged after they get used by the upstream server to construct redirect targets...
13497-controller @ f9a05f61abdf33891b09d62205d009d1cae73d1b https://ci.curoverse.com/job/developer-run-tests/807/
Updated by Lucas Di Pentima over 6 years ago
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-07-18 Sprint to 2018-08-01 Sprint
Updated by Tom Clegg over 6 years ago
Some haphazardly chosen timing data from 4xphq
request id | API | controller timeTotal (s) | rails duration (ms) | delta (ms) |
req-1fwee7391mf691vhppw7 | GET /arvados/v1/virtual_machines/get_all_logins | 0.082272 | 72.94 | 9.3 |
req-yhp9nhblckp5zb3p5083 | GET /arvados/v1/jobs/queue | 0.180192 | 171.8 | 8.4 |
req-11q47gm1taefs4azwaac | GET /arvados/v1/containers | 0.022545 | 6.99 | 15.6 |
req-hmbw176kftg11mgsg0ex | POST /arvados/v1/collections | 1.242400 | 1233.46 | 8.9 |
Updated by Tom Clegg over 6 years ago
- adds "install controller" to install guide and upgrade notes
Updated by Lucas Di Pentima over 6 years ago
Docs @ c8a4dee5e52feed137ca3cb4c4a4e224efbb694f LGTM, thanks!
Updated by Tom Clegg over 6 years ago
- Status changed from In Progress to Resolved
Updated by Tom Clegg about 6 years ago
- Related to Bug #14383: [API] Java SDK double slash bug with arvados-controller added