Bug #9388
closed[Websockets] events are skipped
100%
Description
I observed websockets failing to send events that it clearly should have sent.
The issue is with how websockets is querying the database. It assumes monotonically increasing "id" fields.
While "id" is monotonically increasing, the "id" seems to be allocated at the start of record creation, not the end. So a log id '2' could show up before log id '1', if '2' commits before '1'.
The proposed fix is to perform an additional query which checks that there are no gaps in the 'id' sequence. If a gap is found, don't process any events after the gap until it is filled.
Updated by Peter Amstutz over 8 years ago
- Category set to API
- Assigned To set to Peter Amstutz
Updated by Peter Amstutz over 8 years ago
- Target version changed from 2016-06-08 sprint to 2016-06-22 sprint
Updated by Tom Clegg over 8 years ago
The after_save → after_commit change LGTM.
The "stall if any id is skipped" part looks like trouble, though.
To handle out-of-order commits without getting stuck, how about this:- If the ID in the NOTIFY payload is smaller than the next ID we expect, get the log entry using "where id = {payload-id}" and don't update last_log_id.
- Otherwise, do what we've been doing, i.e., get a batch of logs using "where id > {last_log_id}"
IOW, last_log_id means "can't get batches of logs with id <= last_log_id, because we've already emitted some of them". We can still emit logs with smaller IDs -- because NOTIFY tells us each ID as it gets committed -- we just have to handle them individually, not in batches.
Updated by Tom Clegg over 8 years ago
Tom Clegg wrote:
- Otherwise, do what we've been doing, i.e., get a batch of logs using "where id > {last_log_id}"
Hm, I think this will cause duplicates: we'll get NOTIFY 1 and (possibly) fetch a batch 1,2,3 ... then we'll get NOTIFY 2, then NOTIFY 3.
Maybe we should just do the simple thing: NOTIFY {id} means emit log {id} to anyone with matching filters.
Updated by Tom Clegg over 8 years ago
Since batching queries does seem worthwhile, perhaps we can do it safely by collecting individual payload ids instead of trying to work with ranges?
batch = [] timer = nil forever { select { notification arrives with log_id: batch.append(log_id) if timer is nil { timer = new timer(some milliseconds) } timer expires: // process a batch of logs query_logs_table("where id in (?)", batch) (apply subscription/permission filters) send messages to client } }
Updated by Tom Clegg over 8 years ago
(Peter pointed out) "publish each event when the notify arrives" doesn't address the "start at event ID X" feature.
I'd argue that feature is fundamentally broken, given what we've discussed here. "The ID of the last log I received is X" is simply not enough information to know whether a log with ID=Y has been received (except the boring case of Y=X, obviously).
Ideas for getting closer to "don't miss any events, but don't get any duplicate events either":- (Server) Subscribe to notifications; fetch and send all existing logs with id>X and remember their IDs; as new notifications arrive, check the IDs against the initial batch to avoid sending duplicates (this avoids sending duplicates in a single connection, but doesn't address the issue of reconnecting without skips/dups).
- (Client) remember the last N log IDs received. When reconnecting, check for ID gaps, and ask the server to start sending events at the first gap (e.g., received 1,2,3,6,8,9 → request logs starting at 4). De-duplicate incoming logs against the list of known IDs.
To do "catch up on missed logs" reliably, we seem to need a timestamp or ID column that reflects the order the logs were committed (because that's the order we receive notifications).
PostgreSQL 9.5 has a track_commit_timestamp configuration option, which would let us call txid_current() inside the transaction and pg_xact_commit_timestamp(xid) after the commit. We might be able to store the transaction ID in the logs table, then reconstruct the commit order for a set of recent logs by calling pg_xact_commit_timestamp() on the transaction IDs we see in SELECT results.
With earlier versions we could have a thread (dare I say goroutine?) listening to NOTIFY events and assigning timestamps to them -- possibly saving the timestamps in the logs table -- and telling the websocket threads about the now-reliably-stamped logs (possibly via another NOTIFY?). Clients could safely filter on those timestamps because we could guarantee they stay in order.
We could also consider using a message queue system that makes this easier than Postgres...
Updated by Peter Amstutz over 8 years ago
New branch 9388-websocket-every-notify
8442: Process each notify individually instead attempting to batch them up.
Prior to this commit, websockets used to try to send log events out in batches,
by getting all logs with an id greater than last log that was sent.
Unfortunately, under concurrent database writes, logs from uncommited
transactions may not appear in the query even if logs with larger ids do
appear. This results in the uncommitted log never being sent out because
subsequent batch sends would not consider logs prior to the last log id that
was sent (which, in this case, is higher than the log that was missed.)
This commit eliminates the batching behavior. Because NOTIFY includes the log
id of a specific record that was committed, consider only the log record with
that id and process events in the order that the NOTIFY events arrive. This
means events may be delivered out of numeric order (although they now more
closely reflect the "actual" order, e.g. the order that the events were
actually committed to the database).
"Catch ups" where the client has specified a last_log_id and needs to have past
logs replayed continue to be sent in batches.
Updated by Tom Clegg over 8 years ago
Pls rebase on master (9388-websocket-every-notify seems to be on an unmerged 8442 branch, says 8442 in commit message)
WDYT of the "check the IDs against the initial batch" idea in note-9? It seems like without that, we'll send two copies of events that arrive after we start listening for notifications but before we finish the "catch-up" phase.
Or maybe it's not a problem for clients to receive duplicates? In that case commit 40a4f73 (once extracted from 8442) LGTM.
Updated by Peter Amstutz over 8 years ago
Tom Clegg wrote:
The after_save → after_commit change LGTM.
After reviewing the documentation, this is unnecessary and possibly a little bit counter productive. If NOTIFY is issued in a transaction, it is held until after the transaction completes (and doesn't go out at all if the transaction is rolled back). It is possibly counter productive because if the NOTIFY is part of the transaction, it goes into the Postgres notify queue as part of finalizing the transaction, whereas if it comes after the transaction, there's an extra round trip between postgres and rails which means the NOTIFY goes out slightly after the transaction.
Updated by Tom Clegg over 8 years ago
Agreed, changing to after_commit is pointless: nobody gets notified until after the transaction anyway.
Updated by Peter Amstutz over 8 years ago
Tom Clegg wrote:
Pls rebase on master (9388-websocket-every-notify seems to be on an unmerged 8442 branch, says 8442 in commit message)
Whoops, I eff'd that up. (This whole issue came up because it affects cwl-runner getting critical notifications like when containers are finished, so I was bouncing between branches). Will fix.
WDYT of the "check the IDs against the initial batch" idea in note-9? It seems like without that, we'll send two copies of events that arrive after we start listening for notifications but before we finish the "catch-up" phase.
Or maybe it's not a problem for clients to receive duplicates? In that case commit 40a4f73 (once extracted from 8442) LGTM.
It's an edge case and I'm trying to avoid overcomplicating the fix, but I can think of situations where it could go wrong, so it's worth addressing.
Updated by Peter Amstutz over 8 years ago
@ 65721c0
Now records every id that was sent, so they only get sent once. This modifies existing behavior, if additional subscribe messages come in with a last_log_id set, previously it would replay all messages from that last_log_id. Now it will only replay the ones that were not previously sent. This is probably a better behavior overall as it makes it more practical to update dynamically update subscriptions.
Storing all sent message ids for a connection could eventually be a problem if there are hundreds of persistent collections receiving hundreds of thousands of logs, but scalability issues at that point will probably be better addressed by a rewrite/redesign of the event bus anyway.
Updated by Brett Smith over 8 years ago
Tom Clegg wrote:
PostgreSQL 9.5 has a track_commit_timestamp configuration option, which would let us call txid_current() inside the transaction and pg_xact_commit_timestamp(xid) after the commit. We might be able to store the transaction ID in the logs table, then reconstruct the commit order for a set of recent logs by calling pg_xact_commit_timestamp() on the transaction IDs we see in SELECT results.
I really don't want to introduce a dependency on an entire SQL database that's only six months old. It's not in any of our supported distributions, so it would add substantial work to a traditional install process. We're also starting to see more deployments where the database lives separate from the cluster ("use this database my IT department already runs," or "use this database in in the cloud"), and imposing very strict version requirements on those could really throw a wrench in their works.
Updated by Tom Clegg over 8 years ago
Peter Amstutz wrote:
Storing all sent message ids for a connection could eventually be a problem if there are hundreds of persistent collections receiving hundreds of thousands of logs, but scalability issues at that point will probably be better addressed by a rewrite/redesign of the event bus anyway.
The reason I suggested "check the IDs against the initial batch" (instead of against everything ever sent) is that it avoids this problem of using unbounded memory for every connection. Is there any reason to de-dup postgres notifications, too? Or can we just fix this with
if getting_initial_backlog add id to "sent" set else check "sent" set before sending end
Updated by Peter Amstutz over 8 years ago
Tom Clegg wrote:
The reason I suggested "check the IDs against the initial batch" (instead of against everything ever sent) is that it avoids this problem of using unbounded memory for every connection. Is there any reason to de-dup postgres notifications, too?
Note that it is still possible to replay the world by starting from id 1, which would fill up sent_ids
.
Changes:
- Only record ids sent during "catch up" in
sent_ids
- Test outgoing messages against
sent_ids
to avoid duplicates - Add
ws.notify_queue
to ensure that notifies occurring during "catch up" will get delivered in the correct order after catch up completes.
Note that a "subscribe" message which sets last_log_id
on a connection where messages have already been delivered may return duplicate events. This maintains the existing behavior.
Updated by Peter Amstutz over 8 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 8 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:2cd1c3ed705e639fb9e4ef067a32b278a6d3d4ee.