Bug #12684

open

[Python SDK] Retry on HTTP 5xx errors

Added by Tom Morris over 6 years ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -
Release:
Release relationship: Auto

Description

This sounds like what #3147 was intended to address, but it's apparently not working:

Traceback (most recent call last):
  File "./myg_runs.py", line 244, in <module>
    main()
  File "./myg_runs.py", line 230, in main
    dump_subprojects(stats, project, SKIP_PROJECTS)
  File "./myg_runs.py", line 210, in dump_subprojects
    dump_pipeline_instances(stats, sp)
  File "./myg_runs.py", line 182, in dump_pipeline_instances
    time = dump_pipeline_instance(stats, i)
  File "./myg_runs.py", line 167, in dump_pipeline_instance
    dump_jobs(batchid, sample, cwl_runner['job']['components'])
  File "./myg_runs.py", line 84, in dump_jobs
    jobs = api.jobs().list(filters=[['uuid','=',job_uuid]]).execute()
  File "/usr/lib/python2.7/dist-packages/oauth2client/util.py", line 140, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
arvados.errors.ApiError: <HttpError 502 when requesting https://e51c5.arvadosapi.com/arvados/v1/jobs?alt=json&filters=%5B%5B%22uuid%22%2C+%22%3D%22%2C+%22e51c5-8i9sb-b8od8nvombxq3h3%22%5D%5D returned "Bad Gateway">

Related issues

Related to Arvados - Bug #3147: [SDKs] Python clients should automatically retry failed API and Keep requests (including timeouts), in order to survive temporary outages like server restarts and network blips. (Resolved, Brett Smith, 08/22/2014)

Actions #1

Updated by Tom Morris over 6 years ago

  • Related to Bug #3147: [SDKs] Python clients should automatically retry failed API and Keep requests (including timeouts), in order to survive temporary outages like server restarts and network blips. added
Actions #2

Updated by Lucas Di Pentima over 6 years ago

It seems that the API client object already has a default retry value of 2 (https://github.com/curoverse/arvados/blob/master/sdk/python/arvados/api.py#L33), and the retry code may not be catching some exceptions:

https://github.com/curoverse/arvados/blob/master/sdk/python/arvados/api.py#L69-L101

Actions #3

Updated by Peter Amstutz over 6 years ago

  • Tracker changed from Feature to Bug
Actions #5

Updated by Tom Morris about 5 years ago

The current SDK defaults are 2 retries with an initial sleep period of 2 seconds and a multiplier of 2, which translates to 3 attempts over 6 seconds (at 0, 2, and 6 seconds, after sleeps of 2 and 4 seconds).

Although it doesn't look like we're using it, the Google API client library has retry support built in:

https://googleapis.github.io/google-api-python-client/docs/epy/googleapiclient.http-module.html#_retry_request
https://googleapis.github.io/google-api-python-client/docs/epy/googleapiclient.http-pysrc.html#HttpRequest.execute

but their algorithm is different: it randomizes the sleep time and uses a fixed base period and multiplier:

       sleep_time = rand() * 2 ** retry_num 
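
To make the difference concrete, here is a minimal sketch that computes the sleep durations under both schemes. The function names are made up for illustration, and it assumes that retry_num is 1 for the first retry and that rand() returns a value in [0, 1); neither detail is stated above.

```python
import random


def fixed_backoff_delays(retries=2, initial_delay=2.0, multiplier=2.0):
    """Sleep durations for a deterministic exponential backoff.

    Defaults mirror the SDK values quoted above (2 retries, 2 s, x2).
    """
    delays = []
    delay = initial_delay
    for _ in range(retries):
        delays.append(delay)
        delay *= multiplier
    return delays


def randomized_backoff_delays(retries=2, rng=random.random):
    """Sleep durations following sleep_time = rand() * 2 ** retry_num.

    Assumption: retry_num counts from 1 for the first retry.
    """
    return [rng() * 2 ** retry_num for retry_num in range(1, retries + 1)]


if __name__ == "__main__":
    print("fixed:     ", fixed_backoff_delays())
    print("randomized:", randomized_backoff_delays())
```

With the defaults, the fixed scheme always sleeps 2 and then 4 seconds, while the randomized scheme sleeps anywhere up to 2 and then up to 4 seconds, so on average it gives up sooner.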

The only indication that retries were attempted is a debug-level log message, so I suggest we upgrade it to warning level, as the Google API client library does. Without that, there's no way to tell whether the exception was raised on the final attempt (and so wasn't meant to be caught) or whether it's a retryable exception that, for some reason, isn't being caught while retries are still in progress.
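
As a rough illustration of the suggestion, the sketch below retries on HTTP 5xx and logs each retry at warning level. It is not the SDK's code: HttpStatusError and call_with_retries are hypothetical names, and treating every 5xx status (and nothing else) as retryable is an assumption made for brevity.

```python
import logging
import time

logger = logging.getLogger("retry_sketch")


class HttpStatusError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status):
        super().__init__("HTTP %d" % status)
        self.status = status


def call_with_retries(func, retries=2, initial_delay=2.0, multiplier=2.0,
                      sleep=time.sleep):
    """Call func(), retrying on 5xx errors with exponential backoff."""
    delay = initial_delay
    total_attempts = retries + 1
    for attempt in range(1, total_attempts + 1):
        try:
            return func()
        except HttpStatusError as err:
            retryable = 500 <= err.status < 600  # assumption: 5xx only
            if not retryable or attempt == total_attempts:
                raise  # non-retryable error, or the final attempt failed
            # Warning level, so the retry is visible without debug logging.
            logger.warning(
                "attempt %d of %d failed with HTTP %d; retrying in %.1f s",
                attempt, total_attempts, err.status, delay)
            sleep(delay)
            delay *= multiplier
```

With logging like this, a traceback that is not preceded by any warning lines would suggest that the exception never reached the retry loop at all, whereas two warnings followed by a traceback would mean the retries were simply exhausted.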

Actions #6

Updated by Peter Amstutz almost 3 years ago

  • Target version deleted (To Be Groomed)
Actions #7

Updated by Lucas Di Pentima over 1 year ago

  • Release set to 60