Project

General

Profile

Actions

Bug #17989

closed

PySDK socket read timeout on big collection create calls

Added by Lucas Di Pentima over 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Start date:
09/07/2021
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

While working on #17948 I found an issue with our Python API client while making collection creation request with big payloads:

Traceback (most recent call last):
  File "tools/test-collection-create/test-collection-create.py", line 165, in <module>
    sys.exit(main())
  File "tools/test-collection-create/test-collection-create.py", line 155, in main
    "manifest_text": manifest
  File "/home/lucas/venv-arvados/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/lucas/venv-arvados/lib/python3.7/site-packages/googleapiclient/http.py", line 835, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/home/lucas/venv-arvados/lib/python3.7/site-packages/googleapiclient/http.py", line 179, in _retry_request
    raise exception
  File "/home/lucas/venv-arvados/lib/python3.7/site-packages/googleapiclient/http.py", line 162, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/home/lucas/venv-arvados/lib/python3.7/site-packages/arvados/api.py", line 111, in _intercept_http_request
    return self.orig_http_request(uri, method, headers=headers, **kwargs)
  File "/home/lucas/venv-arvados/lib/python3.7/site-packages/httplib2/__init__.py", line 1709, in request
    conn, authority, uri, request_uri, method, body, headers, redirections, cachekey,
  File "/home/lucas/venv-arvados/lib/python3.7/site-packages/httplib2/__init__.py", line 1424, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/lucas/venv-arvados/lib/python3.7/site-packages/httplib2/__init__.py", line 1376, in _conn_request
    response = conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1352, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 310, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 271, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1052, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 911, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

The collection got created but the client failed to wait for the response (it seems that only waits for around 12 seconds no matter which timeout setting is used).

For more information: https://dev.arvados.org/issues/17948#note-8


Subtasks 1 (0 open1 closed)

Task #18033: Review 17989-pysdk-timeoutResolvedPeter Amstutz09/07/2021

Actions

Related issues 1 (0 open1 closed)

Related to Arvados - Bug #17171: [sdk] unreasonable python SDK timeout when ARVADOS_API_HOST is reachableResolvedLucas Di Pentima04/15/2021

Actions
Actions #1

Updated by Lucas Di Pentima over 3 years ago

  • Assigned To set to Lucas Di Pentima
Actions #2

Updated by Lucas Di Pentima over 3 years ago

  • Target version changed from 2021-09-01 sprint to 2021-09-15 sprint
Actions #3

Updated by Lucas Di Pentima over 3 years ago

  • Status changed from New to In Progress
Actions #4

Updated by Lucas Di Pentima over 3 years ago

Updates at ca109d685 - branch 17989-pysdk-timeout
Test run: developer-run-tests: #2680

  • Fixes the arvados.api_from_config() call so that it uses the specified timeout param.
  • Adds test.
  • As a bonus, lowers the max manifest size limit on the test collection creation script to avoid "payload is too large" errors.
Actions #5

Updated by Peter Amstutz over 3 years ago

Lucas Di Pentima wrote:

Updates at ca109d685 - branch 17989-pysdk-timeout
Test run: developer-run-tests: #2680

  • Fixes the arvados.api_from_config() call so that it uses the specified timeout param.
  • Adds test.
  • As a bonus, lowers the max manifest size limit on the test collection creation script to avoid "payload is too large" errors.

That was a unfortunate oversight, but it looks like it was easy to find and fix.

On the subject of timeouts:

We already have feedback that a default timeout 10s is too short (e.g. arv-mount reportedly cannot load very large collections).

Timeouts are one of those things that don't matter until you start hitting them and you actually do need it to wait longer, then they matter a lot.

So I have a couple of follow up suggestions:

  • Raise the timeout to 20 or 30 seconds (let's see what Tom and Ward think)
  • increase the default timeout of arv-mount, and/or add a command line flag that lets you adjust the timeout and/or add an enviroment variable ARVADOS_TIMEOUT or something that lets you adjust the default timeout.
Actions #6

Updated by Peter Amstutz over 3 years ago

I should have said, the 17989-pysdk-timeout branch LGTM.

Actions #7

Updated by Peter Amstutz over 3 years ago

  • Related to Bug #17171: [sdk] unreasonable python SDK timeout when ARVADOS_API_HOST is reachable added
Actions #8

Updated by Peter Amstutz over 3 years ago

(copied from slack)

I don't suppose we have any better granularity?

Initial TCP connection/TLS handshake timeout and subsequent response timeout really ought to be different things.

This is one of the reasons we use pycurl for keepclient, because it gave us a more knobs to control exactly when to abandon a connection.

Actions #9

Updated by Peter Amstutz over 3 years ago

From discussion: let's revert the timeout to 300 seconds and investigate further if we can separate the connect/handshake timeout from the response timeout.

Actions #10

Updated by Lucas Di Pentima over 3 years ago

Updates at aab565926 - branch 17989-revert-pysdk-default-timeout
Test run: developer-run-tests: #2681

  • Reverts arvados.api()'s default timeout to 300 seconds.
  • Updates test.
Actions #11

Updated by Peter Amstutz over 3 years ago

Lucas Di Pentima wrote:

Updates at aab565926 - branch 17989-revert-pysdk-default-timeout
Test run: developer-run-tests: #2681

  • Reverts arvados.api()'s default timeout to 300 seconds.
  • Updates test.

LGTM

Actions #12

Updated by Lucas Di Pentima over 3 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados-private:commit:arvados|834f7c726f727e04250ed6b3eda287ca996e9c38.

Actions #13

Updated by Peter Amstutz over 3 years ago

  • Release set to 43
Actions #14

Updated by Lucas Di Pentima about 3 years ago

  • Release changed from 43 to 42
Actions

Also available in: Atom PDF