Bug #13630
Locking api_client_authorizations for update can deadlock ContainersController#update
Description
We've observed an apparent deadlock (though not detected by postgres) when a containers update request (ContainersController#update) calls assign_auth, which results in an update_attributes call on api_client_authorizations.
When this happened, the database ended up with a number of queries of the form:
UPDATE "api_client_authorizations" SET "expires_at" = $1, "updated_at" = $2 WHERE "api_client_authorizations"."id" = $3
All of these were waiting on row-level and/or tuple exclusive locks (i.e. none of those queries was actively running - they were ALL waiting). We let that go for over 45 minutes and saw every new query get stuck in the same situation, before we manually killed all of the waiting postgres processes to clear them out.
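(For anyone trying to confirm the same state: on PostgreSQL 9.6 or later, one generic way to check that the UPDATEs are parked behind locks rather than running, and to see which backends hold the conflicting locks, is to query pg_stat_activity with pg_blocking_pids(). This is a diagnostic sketch only, not part of the Arvados code.)

-- List every backend currently waiting on a lock, the query it is running,
-- and the pid(s) of the backend(s) holding the conflicting lock.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       query
  FROM pg_stat_activity
 WHERE cardinality(pg_blocking_pids(pid)) > 0;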
Partial stack trace (from New Relic) leading up to the stuck query:
…/active_record/mass_assignment_security/persistence.rb:65:in `block in update'
…/active_record/mass_assignment_security/persistence.rb:63:in `update'
…api-db/arvados-api/current/app/models/arvados_model.rb:125:in `update_attributes'
…dos-api-db/arvados-api/current/app/models/container.rb:492:in `assign_auth'
…model-4.2.10/lib/active_model/validations/callbacks.rb:113:in `run_validations!'
…ems/activemodel-4.2.10/lib/active_model/validations.rb:338:in `valid?'
…/active_record/mass_assignment_security/persistence.rb:77:in `block in update!'
…/active_record/mass_assignment_security/persistence.rb:75:in `update!'
…-api/current/app/controllers/application_controller.rb:119:in `update'
…nt/app/controllers/arvados/v1/containers_controller.rb:30:in `block in update'
…nt/app/controllers/arvados/v1/containers_controller.rb:27:in `update'
…-api/current/app/controllers/application_controller.rb:398:in `block in set_current_request_id'
…-api/current/app/controllers/application_controller.rb:397:in `set_current_request_id'
….0/gems/actionview-4.2.10/lib/action_view/rendering.rb:30:in `process'
…uby/2.3.0/gems/omniauth-1.4.3/lib/omniauth/strategy.rb:186:in `call!'
…uby/2.3.0/gems/omniauth-1.4.3/lib/omniauth/strategy.rb:164:in `call'
…ruby/2.3.0/gems/omniauth-1.4.3/lib/omniauth/builder.rb:63:in `call'
…or_bundle/ruby/2.3.0/gems/rack-1.6.10/lib/rack/etag.rb:24:in `call'
…ruby/2.3.0/gems/rack-1.6.10/lib/rack/conditionalget.rb:38:in `call'
…or_bundle/ruby/2.3.0/gems/rack-1.6.10/lib/rack/head.rb:13:in `call'
…2.3.0/gems/rack-1.6.10/lib/rack/session/abstract/id.rb:225:in `context'
…2.3.0/gems/rack-1.6.10/lib/rack/session/abstract/id.rb:220:in `call'
…ms/lograge-0.10.0/lib/lograge/rails_ext/rack/logger.rb:15:in `call_app'
…ms/request_store-1.4.1/lib/request_store/middleware.rb:19:in `call'
…ruby/2.3.0/gems/rack-1.6.10/lib/rack/methodoverride.rb:22:in `call'
…bundle/ruby/2.3.0/gems/rack-1.6.10/lib/rack/runtime.rb:18:in `call'
…undle/ruby/2.3.0/gems/rack-1.6.10/lib/rack/sendfile.rb:113:in `call'
…vados-api/current/app/middlewares/arvados_api_token.rb:63:in `call'
…uby/phusion_passenger/rack/thread_handler_extension.rb:97:in `process_request'
…by/phusion_passenger/request_handler/thread_handler.rb:157:in `accept_and_process_next_request'
…by/phusion_passenger/request_handler/thread_handler.rb:110:in `main_loop'
…/ruby/vendor_ruby/phusion_passenger/request_handler.rb:415:in `block (3 levels) in start_threads'
/usr/lib/ruby/vendor_ruby/phusion_passenger/utils.rb:113:in `block in create_thread_and_abort_on_exception'
Updated by Joshua Randall over 6 years ago
We have mitigated this issue by adding a timeout on all postgres statements - there is no sense in the API server waiting for postgres statements for longer than the request timeout!
We modified the `initialize` method in `models/arvados_model.rb` so that it looks like this:
def initialize raw_params={}, *args
  ActiveRecord::Base.connection.execute('SET statement_timeout = 300000')
  super(self.class.permit_attribute_params(raw_params), *args)
end
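A server-side alternative (not what we deployed, just a sketch) is to set the same 5-minute timeout on the database role or database the API server connects as, so it applies to every connection without touching the Rails code; the name "arvados" below is only a placeholder for the actual role/database name.

-- Hypothetical: apply the same 5-minute statement timeout to every connection
-- made by the API server's database role ('arvados' is a placeholder name).
ALTER ROLE arvados SET statement_timeout = '300s';
-- or, for every connection to the database:
ALTER DATABASE arvados SET statement_timeout = '300s';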