Revision 8 (Tom Clegg, 10/25/2016 06:03 PM) → Revision 9/12 (Tom Clegg, 10/25/2016 07:11 PM)
h1. Websocket server
(draft)
{{toc}}
See also: [[Events API]]
h1. Messages
Each message is JSON-encoded as an object with exactly one key. The key indicates the message type, and the value contains the message content.
This allows clients and servers to decode messages efficiently: decode the first token to determine the message type, then (if the message content is relevant) decode the message payload into an appropriate data structure.
<pre><code class="javascript">
good: {"error":{"code":418,"text":"I'm a teapot"}}
bad: {"errorCode":418,"errorText":"I'm a teapot"}
</code></pre>
Clients must ignore any unrecognized keys they encounter in the payload. This allows the server to add features without breaking existing clients.
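A minimal Go sketch of this decoding pattern follows. The names @splitEnvelope@, @decodeError@ and @errorPayload@ are hypothetical, chosen for illustration only. The sketch keeps the payload as raw bytes until the message type is known, and relies on @encoding/json@ silently skipping keys that are not present in the target struct.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errorPayload mirrors the "error" example above. Keys that are not listed
// here are ignored by encoding/json when decoding.
type errorPayload struct {
	Code int    `json:"code"`
	Text string `json:"text"`
}

// splitEnvelope returns the single key (the message type) and the payload,
// which is left undecoded so the caller can skip irrelevant messages cheaply.
func splitEnvelope(data []byte) (string, json.RawMessage, error) {
	var env map[string]json.RawMessage
	if err := json.Unmarshal(data, &env); err != nil {
		return "", nil, err
	}
	if len(env) != 1 {
		return "", nil, errors.New("message must have exactly one key")
	}
	var key string
	var payload json.RawMessage
	for k, v := range env {
		key, payload = k, v
	}
	return key, payload, nil
}

// decodeError decodes the payload of an "error" message.
func decodeError(payload json.RawMessage) (errorPayload, error) {
	var e errorPayload
	err := json.Unmarshal(payload, &e)
	return e, err
}

func main() {
	// "hint" is an unrecognized key: it is dropped without an error.
	msg := []byte(`{"error":{"code":418,"text":"I'm a teapot","hint":"ignored"}}`)
	key, payload, err := splitEnvelope(msg)
	if err != nil {
		fmt.Println("bad message:", err)
		return
	}
	switch key {
	case "error":
		e, err := decodeError(payload)
		if err != nil {
			fmt.Println("bad error payload:", err)
			return
		}
		fmt.Printf("error %d: %s\n", e.Code, e.Text)
	default:
		fmt.Println("ignoring unrecognized message type:", key)
	}
}
```

Decoding into a map reads the whole message rather than stopping at the first token, but the payload itself is parsed into a data structure only when the message type is one the receiver cares about.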
h2. setAuth
After establishing a connection, and before subscribing to any streams, the client must supply an authorization token.
The server acknowledges successful authorization with an "auth" message.
<pre><code class="javascript">
client: {
"setAuth":{"token":"3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi"}
}
server: {
"auth":{"uuid":"zzzzz-gj3su-077z32aux8dg2s1"}
}
</code></pre>
Unsuccessful authorization results in an "authError" message.
<pre><code class="javascript">
client: {
"setAuth":{
"token":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}}
server: {
"authError":{
"errorText":"invalid or expired token"}}
</code></pre>
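The exchange above could be handled on the server side roughly as follows. This is a sketch under assumptions: @tokenLookup@ is a hypothetical hook standing in for whatever the server uses to validate tokens, and treating a malformed payload the same as a bad token is a choice made here, not something this page specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// setAuthPayload is the content of a "setAuth" message.
type setAuthPayload struct {
	Token string `json:"token"`
}

// tokenLookup is a hypothetical hook: it reports the UUID to return for a
// valid token, or ok=false if the token is invalid or expired.
type tokenLookup func(token string) (uuid string, ok bool)

// handleSetAuth returns the encoded reply for one "setAuth" payload:
// an "auth" message on success, an "authError" message otherwise.
func handleSetAuth(payload []byte, lookup tokenLookup) string {
	var req setAuthPayload
	if err := json.Unmarshal(payload, &req); err == nil {
		if uuid, ok := lookup(req.Token); ok {
			return encode("auth", map[string]string{"uuid": uuid})
		}
	}
	return encode("authError", map[string]string{"errorText": "invalid or expired token"})
}

// encode wraps content in a one-key envelope named after the message type.
func encode(msgType string, content map[string]string) string {
	buf, err := json.Marshal(map[string]map[string]string{msgType: content})
	if err != nil {
		panic(err) // not expected for maps of strings
	}
	return string(buf)
}

func main() {
	// demoLookup accepts exactly one token (the one from the example above).
	demoLookup := func(token string) (string, bool) {
		if token == "3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi" {
			return "zzzzz-gj3su-077z32aux8dg2s1", true
		}
		return "", false
	}
	fmt.Println(handleSetAuth([]byte(`{"token":"3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi"}`), demoLookup))
	fmt.Println(handleSetAuth([]byte(`{"token":"xxxxxxxx"}`), demoLookup))
}
```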
h2. subscribe
Subscribe to an event stream.
If the given ETag does not match the current ETag, the client has already missed one or more updates since the version it has cached, so the server should send an update event right away.
<pre><code class="javascript">
client: {
"subscribe":{
"uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
"etag":"9u32836jpz7i046sd84gu190h"}}
server: {
"event":{
"msgID":12345,
"type":"update",
"uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
"etag":"1wfdizt65l5w597jf5lojf8jm"}}
</code></pre>
When a client subscribes to a stream X, but is not authorized to read the object with UUID X (or there is no such object), the server sends an error message. This does not terminate the connection, nor does it affect any other streams.
<pre><code class="javascript">
client: {
"subscribe":{
"uuid":"zzzzz-tpzed-000000000000000",
"etag":"x"}}
server: {
"subscribeError":{
"uuid":"zzzzz-tpzed-000000000000000",
"errorText":"forbidden"}}
</code></pre>
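The three possible outcomes of a subscribe request (error, immediate update, or no reply) can be sketched in Go as below. The @etagLookup@ hook and the @subscribeReply@ function are hypothetical names, and passing the msgID in as a parameter is an assumption about how IDs would be assigned.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type subscribeRequest struct {
	UUID string `json:"uuid"`
	ETag string `json:"etag"`
}

type event struct {
	MsgID uint64 `json:"msgID"`
	Type  string `json:"type"`
	UUID  string `json:"uuid"`
	ETag  string `json:"etag"`
}

type subscribeError struct {
	UUID      string `json:"uuid"`
	ErrorText string `json:"errorText"`
}

// etagLookup is a hypothetical hook: it returns the object's current etag,
// or ok=false if the object does not exist or the client may not read it.
type etagLookup func(uuid string) (etag string, ok bool)

// subscribeReply decides what, if anything, to send right after a subscribe
// request. An empty msgType means there is nothing to send.
func subscribeReply(req subscribeRequest, lookup etagLookup, msgID uint64) (msgType string, content interface{}) {
	current, ok := lookup(req.UUID)
	switch {
	case !ok:
		// Only this stream fails; the connection and other streams are unaffected.
		return "subscribeError", subscribeError{UUID: req.UUID, ErrorText: "forbidden"}
	case current != req.ETag:
		// The client's cached version is stale: send an update immediately.
		return "event", event{MsgID: msgID, Type: "update", UUID: req.UUID, ETag: current}
	default:
		return "", nil
	}
}

func main() {
	lookup := func(uuid string) (string, bool) {
		if uuid == "zzzzz-4zz18-1g4g0vhpjn9wq7i" {
			return "1wfdizt65l5w597jf5lojf8jm", true
		}
		return "", false
	}
	req := subscribeRequest{UUID: "zzzzz-4zz18-1g4g0vhpjn9wq7i", ETag: "9u32836jpz7i046sd84gu190h"}
	if msgType, content := subscribeReply(req, lookup, 12345); msgType != "" {
		buf, err := json.Marshal(map[string]interface{}{msgType: content})
		if err != nil {
			fmt.Println("encode failed:", err)
			return
		}
		fmt.Println(string(buf))
	}
}
```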
h2. Container and job logging events
See [[Events API]] → "Non-state-changing events".
<pre><code class="javascript">
client: {
"subscribe":{
"uuid":"zzzzz-dz642-logscontainer03",
"etag":"2qtm62j6zb3nx5zud8b5v0ayl",
"select":["logs.event_type","logs.properties.text"]}}
server: {
"event":{
"msgID":12346,
"type":"log",
"uuid":"zzzzz-dz642-logscontainer03",
"etag":"2qtm62j6zb3nx5zud8b5v0ayl",
"log":{
"event_type":"stderr",
"properties":{
"text":"foo\n"}}}}
</code></pre>
h2. Update events
<pre><code class="javascript">
server: {
"event":{
"msgID":12345,
"type":"update",
"uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
"etag":"1wfdizt65l5w597jf5lojf8jm"}}
</code></pre>
h2. Create events
<pre><code class="javascript">
server: {
"event":{
"msgID":12345,
"type":"create",
"uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
"etag":"1wfdizt65l5w597jf5lojf8jm"}}
</code></pre>
h2. Delete events
<pre><code class="javascript">
server: {
"event":{
"msgID":12345,
"type":"delete",
"uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
"etag":"1wfdizt65l5w597jf5lojf8jm"}}
</code></pre>
The etag reflects the last state of the object before it was deleted.
TBD: Should the etag be omitted instead?
Note: The logs table (and the old websocket API) use(d) a different event type: "destroy".
h2. Missed events
Zero or more events for a single stream have been skipped:
<pre><code class="javascript">
server: {
"eventsMissed":{
"msgID":12347,
"uuid":"zzzzz-dz642-logscontainer03"}}
</code></pre>
Zero or more events on one or more of the subscribed streams have been skipped:
<pre><code class="javascript">
server: {
"eventsMissed":{
"msgID":12348}}
</code></pre>
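On the client side, the two forms of "eventsMissed" differ only in whether a uuid is present. The sketch below shows one way a client might work out which cached objects to treat as stale. @staleStreams@ is a hypothetical helper, and what the client does with the result (re-fetching, for example) is outside what this page specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eventsMissed is the content of an "eventsMissed" message. UUID is empty
// when the message applies to all subscribed streams.
type eventsMissed struct {
	MsgID uint64 `json:"msgID"`
	UUID  string `json:"uuid"`
}

// staleStreams returns the subscribed streams whose cached state can no
// longer be trusted after the given "eventsMissed" payload.
func staleStreams(payload []byte, subscribed []string) ([]string, error) {
	var m eventsMissed
	if err := json.Unmarshal(payload, &m); err != nil {
		return nil, err
	}
	if m.UUID == "" {
		// No uuid: any subscribed stream may have missed events.
		return append([]string(nil), subscribed...), nil
	}
	for _, s := range subscribed {
		if s == m.UUID {
			return []string{s}, nil
		}
	}
	// The message names a stream this client is not subscribed to.
	return nil, nil
}

func main() {
	subscribed := []string{"zzzzz-dz642-logscontainer03", "zzzzz-4zz18-1g4g0vhpjn9wq7i"}
	for _, payload := range []string{
		`{"msgID":12347,"uuid":"zzzzz-dz642-logscontainer03"}`,
		`{"msgID":12348}`,
	} {
		stale, err := staleStreams([]byte(payload), subscribed)
		if err != nil {
			fmt.Println("bad payload:", err)
			continue
		}
		fmt.Println("stale streams:", stale)
	}
}
```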
h1. Server implementation
h2. Architecture
Go server with a goroutine serving each connection.
One goroutine receives incoming events and assigns msgID numbers.
Each connection has an outgoing event queue. Leave room to resize a connection's outgoing queue dynamically, provided no subscriptions are active, so that privileged clients can request bigger queues.
Common events should be serialized once and distributed to all connections. This avoids serializing each event N times, and allows outgoing queues to share a single message buffer for a given event.
If practical, when a connection's outgoing queue fills up, send a "missed events" signal and discard all buffered events (and, of course, any incoming events that arrive while the buffer is full). After a "missed events" signal the client has to assume its cache is out of date anyway, so recovery from a temporary backlog should be faster if we skip as many events as we can.
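A rough Go sketch of the queueing ideas in this section follows: each event is serialized once, every connection's queue holds a reference to the same buffer, and a full queue is replaced by a single "eventsMissed" message. The names @outQueue@ and @broadcast@ are hypothetical, and reusing the msgID of the event that overflowed the queue for the signal is an assumption. The per-connection goroutine that drains the queue and writes to the socket is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// outQueue is one connection's outgoing queue. Entries are encoded messages;
// the same []byte may be shared by many queues.
type outQueue struct {
	mu    sync.Mutex
	limit int
	buf   [][]byte
}

func newOutQueue(limit int) *outQueue {
	return &outQueue{limit: limit}
}

// push appends msg, or, if the queue is full, discards everything buffered
// (and msg itself) and leaves a single "eventsMissed" signal in its place.
func (q *outQueue) push(msgID uint64, msg []byte) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.buf) < q.limit {
		q.buf = append(q.buf, msg)
		return
	}
	// Assumption: the signal carries the msgID of the event that overflowed.
	signal := []byte(fmt.Sprintf(`{"eventsMissed":{"msgID":%d}}`, msgID))
	q.buf = append(q.buf[:0], signal)
}

// pop removes and returns the oldest queued message, if any.
func (q *outQueue) pop() ([]byte, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.buf) == 0 {
		return nil, false
	}
	msg := q.buf[0]
	q.buf = q.buf[1:]
	return msg, true
}

// broadcast serializes ev once and hands the same buffer to every queue.
func broadcast(queues []*outQueue, msgID uint64, ev interface{}) error {
	buf, err := json.Marshal(map[string]interface{}{"event": ev})
	if err != nil {
		return err
	}
	for _, q := range queues {
		q.push(msgID, buf)
	}
	return nil
}

func main() {
	fast, slow := newOutQueue(2), newOutQueue(2)
	for id := uint64(1); id <= 3; id++ {
		ev := map[string]interface{}{"msgID": id, "type": "update"}
		if err := broadcast([]*outQueue{fast, slow}, id, ev); err != nil {
			fmt.Println("encode failed:", err)
			return
		}
		fast.pop() // the fast connection keeps up; the slow one never reads
	}
	for {
		msg, ok := slow.pop()
		if !ok {
			break
		}
		fmt.Println("slow connection receives:", string(msg))
	}
}
```

In this run the slow connection ends up with one "eventsMissed" message instead of three queued events, while the fast connection is unaffected.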
h2. Logging
Print JSON-formatted log entries on stderr.
Print a log entry when a client connects.
Print a log entry when a client disconnects. Show counters for:
* Number of streams (UUIDs) added while connection was up
* Number of streams removed
* Number of events sent
* Number of bytes sent
* Total time spent waiting for Write() to return (or a better way to measure congestion?)
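One way to collect these counters and emit them as JSON is sketched below. The field names in the log entries, the @connStats@ type and the @nullWriter@ stand-in for a websocket connection are all assumptions made for illustration; timing each @Write()@ call is just the simple congestion measure suggested in the list above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"time"
)

// connStats holds the per-connection counters listed above.
type connStats struct {
	StreamsAdded   int           `json:"streamsAdded"`
	StreamsRemoved int           `json:"streamsRemoved"`
	EventsSent     int           `json:"eventsSent"`
	BytesSent      int64         `json:"bytesSent"`
	WriteWait      time.Duration `json:"writeWaitNanoseconds"`
}

// timedWrite sends one message and records how long Write() took.
func (s *connStats) timedWrite(w io.Writer, p []byte) error {
	t0 := time.Now()
	n, err := w.Write(p)
	s.WriteWait += time.Since(t0)
	s.BytesSent += int64(n)
	if err == nil {
		s.EventsSent++
	}
	return err
}

// logLine formats one JSON log entry. stats may be nil (e.g., on connect).
func logLine(msg, remoteAddr string, stats *connStats) string {
	entry := struct {
		Msg        string     `json:"msg"`
		RemoteAddr string     `json:"remoteAddr"`
		Stats      *connStats `json:"stats,omitempty"`
	}{msg, remoteAddr, stats}
	buf, err := json.Marshal(entry)
	if err != nil {
		panic(err) // not expected for these field types
	}
	return string(buf)
}

// nullWriter stands in for a websocket connection in this sketch.
type nullWriter struct{}

func (nullWriter) Write(p []byte) (int, error) { return len(p), nil }

func main() {
	addr := "192.0.2.1:4000"
	fmt.Fprintln(os.Stderr, logLine("connect", addr, nil))

	stats := &connStats{StreamsAdded: 1}
	if err := stats.timedWrite(nullWriter{}, []byte(`{"event":{"type":"update"}}`)); err != nil {
		fmt.Fprintln(os.Stderr, logLine("write error: "+err.Error(), addr, stats))
	}
	fmt.Fprintln(os.Stderr, logLine("disconnect", addr, stats))
}
```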
h2. Libraries
Websocket:
* https://godoc.org/golang.org/x/net/websocket
PostgreSQL:
* https://godoc.org/github.com/lib/pq via https://godoc.org/database/sql
* https://godoc.org/github.com/lib/pq#hdr-Notifications and https://godoc.org/github.com/lib/pq/listen_example
h1. Problems with old/current implementation
(Lessons to avoid re-learning next time...)
The Rails API server can function as a websocket server. Clients (notably Workbench, arv-mount, arv-ws) use it to listen for events without polling.
Problems with current implementation:
* Unreliable. See #9427, #8277
* Resource-heavy (one postgres connection per connected client, uses lots of memory)
* Logging is not very good
* Updates look like database records instead of API responses (e.g., computed fields are missing, collection manifest_text has no signatures)
* Offers an API for catching up on missed events after disconnecting/reconnecting, but this API (let alone the code) isn't enough to offer a "don't miss any events, don't send any events twice" guarantee. See #9388
#8460